Overview
The CESNET-DeviceType24 dataset [1] is derived from the CESNET-TimeSeries24 dataset [2], which captures traffic from the CESNET network. It is designed for device type classification based on the time-series behavior of each device's traffic.
Dataset Metadata
| Property | Value |
|---|---|
| Type | Recreated dataset |
| Category | Time Series |
| Primary Task | Device Type Detection |
| Source Dataset | CESNET-TimeSeries24 |
| Time Span | 40 weeks (October 2023 – July 2024) |
| Annotated Devices | 82,504 labeled devices |
| Number of Classes | 3 (end-device, net-device, server) |
Source Dataset Characteristics
CESNET-TimeSeries24 Scope
The source CESNET-TimeSeries24 dataset provides time series at several levels of aggregation:
- Overall network traffic across the entire CESNET network
- 297 institutions with individual time series
- 610 institutional subnets tracked separately
- 270,000+ individual IP addresses monitored
This range supports comparative analysis of neural network models and lets researchers benchmark forecasting performance at multiple hierarchical levels of the network. Over its 40-week span, the dataset captures both long-term trends and fine-grained fluctuations.
Annotation Methodology
The annotation process employed a semi-automated approach:
- Initial Annotation: Based on prior knowledge of the CESNET3 network infrastructure and connected devices
- Extended Annotation: Unknown IP addresses were annotated using:
  - Reverse DNS lookups
  - Queries to the Shodan platform
- Limitations: This approach does not allow reliable annotation of all captured devices within the CESNET3 network
The dataset annotations are also published in the open-source tool CESNET TS-Zoo [3].
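The source does not describe the concrete rules used to turn a reverse DNS name into a class label. Purely as an illustration of the extended annotation step, the sketch below resolves an IP address with the standard library and applies a keyword heuristic to the hostname. The keyword lists, the `label_from_hostname` and `reverse_dns` helpers, and the rule of leaving ambiguous names unlabeled are all assumptions made for this example, not the authors' method.

```python
import re
import socket

# Hypothetical keyword hints per class; NOT the rules used by the dataset authors.
HINTS = {
    "server": ("mail", "www", "ns", "db"),
    "net-device": ("gw", "router", "switch", "fw"),
    "end-device": ("dhcp", "nat", "wifi", "pc"),
}


def reverse_dns(ip):
    """Return the PTR hostname for an IP address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def label_from_hostname(hostname):
    """Return a class only if exactly one class matches; otherwise None."""
    # Split on anything that is not a letter, so "mail1" yields the token "mail".
    tokens = set(re.split(r"[^a-z]+", hostname.lower()))
    matches = {cls for cls, words in HINTS.items() if tokens & set(words)}
    return matches.pop() if len(matches) == 1 else None


# Demonstration on made-up hostnames (no network access needed).
for name in ("mail1.example.org", "gw-brno.example.org", "www-gw.example.org"):
    print(name, "->", label_from_hostname(name))
```

Returning `None` for ambiguous or unmatched names mirrors the limitation stated above: a device that cannot be labeled reliably is better left out than guessed.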
Class Distribution
A total of 82,504 devices were reliably labeled and placed into three classes:
| Class | Device Count | Percentage | Description |
|---|---|---|---|
| end-device | 72,523 | 87.9% | User devices and NATs |
| server | 7,875 | 9.5% | Server infrastructure |
| net-device | 2,106 | 2.6% | Network equipment |
As expected, the majority class is end-device, which covers both user devices and Network Address Translation (NAT) systems.
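The imbalance between the classes matters for training and for the choice of metric. As a minimal sketch, the snippet below recomputes the percentages from the device counts in the table and derives inverse-frequency class weights. The weighting formula is one common heuristic for imbalanced data and is an assumption here, not something the dataset prescribes.

```python
# Device counts per class, taken from the class distribution table.
CLASS_COUNTS = {"end-device": 72_523, "server": 7_875, "net-device": 2_106}


def class_shares(counts):
    """Return each class's share of all labeled devices, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    A common heuristic for imbalanced classification, used here only
    as an illustration.
    """
    total = sum(counts.values())
    return {name: total / (len(counts) * n) for name, n in counts.items()}


shares = class_shares(CLASS_COUNTS)
weights = balanced_class_weights(CLASS_COUNTS)
for name in CLASS_COUNTS:
    print(f"{name:>10}: {shares[name]:5.1f}% of devices, weight {weights[name]:.2f}")
```

With these weights, an error on a net-device counts far more than an error on an end-device, which offsets the dominance of the majority class in the loss.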
Dataset Splits
The dataset is split chronologically, so models are always evaluated on data recorded after the training period. This allows assessment of generalization and stability on future data and exposes the data drift that is common in network monitoring:
| Split | Duration | Sample Count | Purpose |
|---|---|---|---|
| Training | First 26 weeks | 2,401,854 | Model training |
| Validation | Next 2 weeks | 184,758 | Hyperparameter tuning |
| Test | Final 12 weeks | 1,108,548 | Performance evaluation |
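The split boundaries can be expressed as a simple mapping from week index to split name. The sketch below assumes each sample carries a zero-based `week` field (0 to 39); that field name and the dictionary-based sample format are hypothetical and chosen only for illustration. It also checks that the sample counts in the table correspond to the same number of samples per week in every split.

```python
TRAIN_WEEKS, VAL_WEEKS, TEST_WEEKS = 26, 2, 12
TOTAL_WEEKS = TRAIN_WEEKS + VAL_WEEKS + TEST_WEEKS  # 40

# Sample counts and durations from the splits table.
SAMPLES = {"train": 2_401_854, "validation": 184_758, "test": 1_108_548}
WEEKS = {"train": TRAIN_WEEKS, "validation": VAL_WEEKS, "test": TEST_WEEKS}


def split_for_week(week):
    """Map a zero-based week index (0..39) to its split name."""
    if not 0 <= week < TOTAL_WEEKS:
        raise ValueError(f"week must be in [0, {TOTAL_WEEKS}), got {week}")
    if week < TRAIN_WEEKS:
        return "train"
    if week < TRAIN_WEEKS + VAL_WEEKS:
        return "validation"
    return "test"


def temporal_split(samples):
    """Group samples (dicts with a hypothetical 'week' key) by split."""
    splits = {"train": [], "validation": [], "test": []}
    for sample in samples:
        splits[split_for_week(sample["week"])].append(sample)
    return splits


# One toy sample per week, just to show the boundaries.
toy = [{"week": w, "device": "d0"} for w in range(TOTAL_WEEKS)]
splits = temporal_split(toy)
print({name: len(items) for name, items in splits.items()})
# → {'train': 26, 'validation': 2, 'test': 12}

# Samples per week implied by the table: identical in all three splits.
per_week = {name: SAMPLES[name] // WEEKS[name] for name in SAMPLES}
print(per_week)
```

Because the split depends only on time, no sample from a later week can leak into training, which is the property that makes the test results meaningful for drift analysis.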
Benefits of Long Test Period
The extended 12-week test set enables:
- Multi-week Performance Evaluation: Measure model performance separately for each of the 12 test weeks
- Data Drift Detection: Evaluate whether performance degradation occurs over time
- Stability Analysis: Determine model robustness to evolving network patterns
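One way to make use of the long test period is to score each test week separately and look for a downward trend. The sketch below computes macro-averaged F1 per week with the standard library only. The record format `(week, true_label, predicted_label)` and the choice of macro F1 as the metric are assumptions for this example; macro averaging is used because it weights the three classes equally despite the class imbalance.

```python
from collections import defaultdict

CLASSES = ("end-device", "net-device", "server")


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1; a class with no support and no
    predictions contributes 0.0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weekly_macro_f1(records):
    """records: iterable of (week, true_label, predicted_label) tuples."""
    by_week = defaultdict(lambda: ([], []))
    for week, true_label, predicted_label in records:
        by_week[week][0].append(true_label)
        by_week[week][1].append(predicted_label)
    return {week: macro_f1(*by_week[week]) for week in sorted(by_week)}


# Made-up predictions for two test weeks; week 29 contains one error.
records = [
    (28, "end-device", "end-device"),
    (28, "server", "server"),
    (28, "net-device", "net-device"),
    (29, "end-device", "end-device"),
    (29, "server", "end-device"),
    (29, "net-device", "net-device"),
]
for week, score in weekly_macro_f1(records).items():
    print(f"week {week}: macro F1 = {score:.3f}")
```

A steady decline of the weekly score over the 12 test weeks would point to data drift, whereas week-to-week noise around a stable level would suggest a robust model.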
How to Cite
@article{koumar2025cesnet,
  title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
  author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
  journal={Scientific Data},
  volume={12},
  number={1},
  pages={338},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
Download
[1] Mudruňka, K., Koumar, J., & Jeřábek, K. (2025). CESNET-DeviceType24: Dataset for Device Type Classification on ISP Network [Data set]. Zenodo.
DOI: 10.5281/zenodo.17542827
References
[2] Koumar, J., Hynek, K., Čejka, T., & Šiška, P. (2025). CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Scientific Data, 12(1), 338.
[3] Kureš, M., Koumar, J., & Hynek, K. (2025, October). CESNET TS-Zoo: A Library for Reproducible Analysis of Network Traffic Time Series. In 2025 21st International Conference on Network and Service Management (CNSM) (pp. 1-5). IEEE.