Overview

The CESNET-DeviceType24 dataset [1] is derived from the CESNET-TimeSeries24 dataset [2], which captures traffic from the CESNET network. It is designed for classifying device types from the time-series behavior of individual devices.

Dataset Metadata

Property            Value
Type                Recreated dataset
Category            Time Series
Primary Task        Device Type Detection
Source Dataset      CESNET-TimeSeries24
Time Span           40 weeks (October 2023 – July 2024)
Annotated Devices   82,504 labeled devices
Number of Classes   3 (end-device, net-device, server)

Source Dataset Characteristics

CESNET-TimeSeries24 Scope

The source CESNET-TimeSeries24 dataset provides time series at several levels of aggregation:

  • Overall network traffic across the entire CESNET network
  • 297 institutions with individual time series
  • 610 institutional subnets tracked separately
  • 270,000+ individual IP addresses monitored

This range lets researchers compare neural network models and benchmark forecasting performance at several hierarchical levels of the network. Because the data spans 40 weeks, it captures both long-term trends and fine-grained fluctuations.

Annotation Methodology

The annotation process employed a semi-automated approach:

  1. Initial Annotation: Based on prior knowledge of the CESNET3 network infrastructure and connected devices
  2. Extended Annotation: Unknown IP addresses were annotated using:
    • Reverse DNS lookups
    • Queries to the Shodan platform
  3. Limitations: This approach does not allow reliable annotation of all captured devices within the CESNET3 network
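
The reverse DNS step above can be sketched roughly as follows. This is a minimal illustration, not the authors' annotation pipeline: the keyword table, the `suggest_class` helper, and the stub resolver are all hypothetical, and the Shodan step is omitted. Only `socket.gethostbyaddr` is a real standard-library call. The resolver is injectable so the example runs without network access.

```python
import re
import socket

# Hypothetical hostname keywords, for illustration only.
# These are NOT the rules used by the dataset authors.
KEYWORDS = {
    "server": {"mail", "www", "ns", "db"},
    "net-device": {"gw", "router", "switch"},
}


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return resolver(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def suggest_class(hostname):
    """Suggest a class from hostname tokens, or None if nothing matches."""
    if hostname is None:
        return None
    # Split on dots and hyphens, then drop trailing digits ("mail1" -> "mail").
    tokens = {t.rstrip("0123456789") for t in re.split(r"[.-]", hostname.lower())}
    for device_class, words in KEYWORDS.items():
        if tokens & words:
            return device_class
    return None


def stub_resolver(ip):
    """Offline stand-in for socket.gethostbyaddr (hypothetical hostname)."""
    return ("gw-1.example.org", [], [ip])


hostname = reverse_dns("192.0.2.10", resolver=stub_resolver)
print(hostname, suggest_class(hostname))
```

A suggestion of `None` corresponds to a device that cannot be labeled by this step alone, which is consistent with the limitation noted above.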

The dataset annotations are also published in the open-source tool CESNET TS-Zoo [3].

Class Distribution

A total of 82,504 devices were reliably labeled and placed into three classes:

Class        Device Count   Percentage   Description
end-device   72,523         87.9%        User devices and NATs
server       7,875          9.5%         Server infrastructure
net-device   2,106          2.6%         Network equipment

As expected, the majority class is end-device, which covers both user devices and Network Address Translation (NAT) systems.
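
Because the classes are strongly imbalanced, it can help to recompute the shares and derive per-class weights for training. The sketch below uses only the device counts listed above. The inverse-frequency weighting is a common heuristic chosen here for illustration, not something the dataset prescribes.

```python
# Device counts per class, taken from the class distribution above.
class_counts = {"end-device": 72_523, "server": 7_875, "net-device": 2_106}

total = sum(class_counts.values())

# Share of each class in percent, rounded to one decimal place.
class_share = {
    name: round(100 * count / total, 1) for name, count in class_counts.items()
}

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# Assumption: this weighting scheme is illustrative, not part of the dataset.
n_classes = len(class_counts)
class_weights = {
    name: total / (n_classes * count) for name, count in class_counts.items()
}

for name in class_counts:
    print(f"{name:<11} share={class_share[name]:5.1f}%  weight={class_weights[name]:.2f}")
```

The weight formula matches the `balanced` heuristic used by scikit-learn's `compute_class_weight`, so the rarest class (net-device) receives the largest weight.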

Dataset Splits

The dataset uses temporal splitting to enable assessment of model generalization and stability on future data, addressing the common problem of data drift in network monitoring:

Split        Duration         Sample Count   Purpose
Training     First 26 weeks   2,401,854      Model training
Validation   Next 2 weeks     184,758        Hyperparameter tuning
Test         Final 12 weeks   1,108,548      Performance evaluation
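
The temporal split can be expressed as a simple mapping from week index to split name. The sketch below assumes each sample carries a zero-based week index (0 to 39); that indexing and the `split_for_week` helper are assumptions for illustration, not part of the published dataset. It also checks that the sample counts above are proportional to the split durations.

```python
TRAIN_WEEKS, VAL_WEEKS, TEST_WEEKS = 26, 2, 12
TOTAL_WEEKS = TRAIN_WEEKS + VAL_WEEKS + TEST_WEEKS  # 40


def split_for_week(week: int) -> str:
    """Map a zero-based week index to 'train', 'validation', or 'test'."""
    if not 0 <= week < TOTAL_WEEKS:
        raise ValueError(f"week must be in [0, {TOTAL_WEEKS}), got {week}")
    if week < TRAIN_WEEKS:
        return "train"
    if week < TRAIN_WEEKS + VAL_WEEKS:
        return "validation"
    return "test"


# Sample counts and durations from the splits listed above.
sample_counts = {"train": 2_401_854, "validation": 184_758, "test": 1_108_548}
split_weeks = {"train": TRAIN_WEEKS, "validation": VAL_WEEKS, "test": TEST_WEEKS}

# Every split works out to the same number of samples per week.
samples_per_week = {
    name: sample_counts[name] // split_weeks[name] for name in sample_counts
}
print(samples_per_week)
```

Each split contains 92,379 samples per week, so the three splits differ only in how many weeks they cover.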

Benefits of Long Test Period

The extended 12-week test set enables:

  • Multi-week Performance Evaluation: Assess model performance across multiple weeks
  • Data Drift Detection: Evaluate whether performance degradation occurs over time
  • Stability Analysis: Determine model robustness to evolving network patterns
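
One way to use the long test period is to score a model separately for each test week and watch for a downward trend. The sketch below is a minimal pure-Python version under stated assumptions: the record layout `(test_week, true_label, predicted_label)` and the toy records are hypothetical, and macro F1 is chosen here because the classes are imbalanced, not because the dataset mandates it.

```python
from collections import defaultdict

CLASSES = ("end-device", "net-device", "server")


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1; a class with no true or predicted samples scores 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weekly_macro_f1(records):
    """Group (test_week, true_label, predicted_label) records by week and score each week."""
    by_week = defaultdict(lambda: ([], []))
    for week, true_label, pred_label in records:
        by_week[week][0].append(true_label)
        by_week[week][1].append(pred_label)
    return {week: macro_f1(t, p) for week, (t, p) in sorted(by_week.items())}


# Toy records for illustration only; not real dataset output.
records = [
    (1, "end-device", "end-device"),
    (1, "server", "server"),
    (1, "net-device", "net-device"),
    (2, "end-device", "end-device"),
    (2, "server", "end-device"),
    (2, "net-device", "net-device"),
]

for week, score in weekly_macro_f1(records).items():
    print(f"test week {week}: macro F1 = {score:.3f}")
```

A steady decline in the weekly score across the 12 test weeks would suggest data drift, while a flat curve would suggest the model is stable. With scikit-learn available, `f1_score(y_true, y_pred, average="macro")` can replace the hand-written metric.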

How to Cite

@article{koumar2025cesnet,
  title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
  author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
  journal={Scientific Data},
  volume={12},
  number={1},
  pages={338},
  year={2025},
  publisher={Nature Publishing Group UK London}
}

Download

[1] Mudruňka, K., Koumar, J., & Jeřábek, K. (2025). CESNET-DeviceType24: Dataset for Device Type Classification on ISP Network [Data set]. Zenodo.
DOI: 10.5281/zenodo.17542827

References

[2] Koumar, J., Hynek, K., Čejka, T., & Šiška, P. (2025). CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Scientific Data, 12(1), 338.

[3] Kureš, M., Koumar, J., & Hynek, K. (2025, October). CESNET TS-Zoo: A Library for Reproducible Analysis of Network Traffic Time Series. In 2025 21st International Conference on Network and Service Management (CNSM) (pp. 1-5). IEEE.