Overview

The CESNET-QUIC22 dataset contains four consecutive weeks of QUIC traffic collected from 31 October to 27 November 2022. It comprises 153 million network flows, each annotated with the TLS SNI (Server Name Indication) domain of its connection. Extracting SNI domains from connection handshakes is more complicated in QUIC than in TLS over TCP because the QUIC handshake is obfuscated: the Initial packets that carry the ClientHello are encrypted and must be decrypted before the SNI can be read.

Dataset Metadata

Property                    Value
Type                        Original dataset
Category                    Flows
Primary Task                Service Classification
Total Flows                 153 million
Total Size (Uncompressed)   89 GB
Collection Period           31 October 2022 – 27 November 2022
Number of Classes           102 web service classes + 3 background classes

Dataset Characteristics

  • Annotation Method: Based on TLS SNI domains extracted from QUIC connection handshakes
  • Service Organization: Web services are organized into provider-based groups:
    • Google services: 27 distinct services
    • Meta services: 9 distinct services
  • Background Classes: 3 classes for non-web service traffic
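The SNI-based annotation described above can be illustrated with a minimal sketch. The suffix table, the service names, and the single "background" label are hypothetical placeholders chosen for illustration; they are not the mapping used to build the dataset, which defines 102 web service classes and 3 background classes.

```python
from typing import Dict, Optional

# Hypothetical suffix-to-service table (illustration only, not the dataset's mapping).
SERVICE_BY_SUFFIX: Dict[str, str] = {
    "youtube.com": "youtube",
    "mail.google.com": "gmail",
    "instagram.com": "instagram",
}

# Placeholder label; the dataset itself distinguishes three background classes.
BACKGROUND_LABEL = "background"


def label_for_sni(sni: Optional[str]) -> str:
    """Map an SNI domain to a service label by longest matching domain suffix."""
    if not sni:
        return BACKGROUND_LABEL
    # Normalize case and drop a trailing root dot.
    domain = sni.strip().rstrip(".").lower()
    labels = domain.split(".")
    # Try the full name first, then progressively shorter parent domains.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in SERVICE_BY_SUFFIX:
            return SERVICE_BY_SUFFIX[candidate]
    return BACKGROUND_LABEL


if __name__ == "__main__":
    for name in ("www.youtube.com", "mail.google.com", "example.org"):
        print(name, "->", label_for_sni(name))
```

Matching on whole domain labels rather than raw string suffixes keeps a name such as notyoutube.com from being assigned to the youtube.com entry.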

Dataset Statistics

The following table provides detailed per-week statistics for the CESNET-QUIC22 dataset:

Week Name               Uncompressed Size   Collection Period            Flow Count
W-2022-44               19 GB               31 Oct 2022 – 6 Nov 2022     32.6M
W-2022-45               25 GB               7 Nov 2022 – 13 Nov 2022     42.6M
W-2022-46               20 GB               14 Nov 2022 – 20 Nov 2022    33.7M
W-2022-47               25 GB               21 Nov 2022 – 27 Nov 2022    44.1M
CESNET-QUIC22 (total)   89 GB               31 Oct 2022 – 27 Nov 2022    153M
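As a quick sanity check, the per-week figures in the table above can be summed and compared with the reported totals. The following sketch only restates the table values; the dictionary layout and function names are illustrative choices, not part of any dataset tooling.

```python
# Per-week figures copied from the statistics table (sizes in GB, flows in millions).
WEEKS = {
    "W-2022-44": {"size_gb": 19, "flows_m": 32.6},
    "W-2022-45": {"size_gb": 25, "flows_m": 42.6},
    "W-2022-46": {"size_gb": 20, "flows_m": 33.7},
    "W-2022-47": {"size_gb": 25, "flows_m": 44.1},
}


def totals(weeks):
    """Return (total size in GB, total flows in millions rounded to one decimal)."""
    size = sum(w["size_gb"] for w in weeks.values())
    flows = sum(w["flows_m"] for w in weeks.values())
    return size, round(flows, 1)


def flow_share(weeks):
    """Return each week's share of all flows, in percent rounded to one decimal."""
    total = sum(w["flows_m"] for w in weeks.values())
    return {name: round(100 * w["flows_m"] / total, 1) for name, w in weeks.items()}


if __name__ == "__main__":
    print(totals(WEEKS))       # compare with the 89 GB and 153M totals in the table
    print(flow_share(WEEKS))
```

The weekly shares are uneven (weeks 45 and 47 each hold noticeably more flows than weeks 44 and 46), which is worth keeping in mind when comparing per-week metrics.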

Research Applications

The four-week time span of CESNET-QUIC22 enables research on several deployment-related issues:

  • Data Drift Analysis: Study temporal changes in QUIC traffic patterns
  • Classifier Performance Degradation: Evaluate how classifier accuracy evolves over time
  • Domain-Level Classification: Each flow includes the extracted SNI domain, enabling more challenging domain-level classification tasks
  • Large-Scale QUIC Analysis: To the best of our knowledge, no other public QUIC flow-level dataset of comparable size exists
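The data drift and classifier degradation studies listed above typically rely on a temporal split: train on the earliest week and evaluate on each later week separately. The sketch below shows one way to organize such an evaluation. The function names, the `predict` callable, and the in-memory `samples_by_week` structure are assumptions made for illustration; loading the actual flows is outside the scope of this example.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

WEEKS = ["W-2022-44", "W-2022-45", "W-2022-46", "W-2022-47"]

Sample = Tuple[Hashable, str]  # (features, label); the feature type is left open


def temporal_split(weeks: Sequence[str], n_train: int = 1) -> Tuple[List[str], List[str]]:
    """Use the earliest n_train weeks for training and the remaining weeks for testing."""
    if not 0 < n_train < len(weeks):
        raise ValueError("n_train must leave at least one week for testing")
    # Lexicographic order equals chronological order for these same-year week names.
    ordered = sorted(weeks)
    return ordered[:n_train], ordered[n_train:]


def accuracy_by_week(
    predict: Callable[[Hashable], str],
    samples_by_week: Dict[str, List[Sample]],
    test_weeks: Sequence[str],
) -> Dict[str, float]:
    """Compute accuracy separately for every test week to expose degradation over time."""
    result = {}
    for week in test_weeks:
        samples = samples_by_week[week]
        if not samples:
            raise ValueError(f"no samples for week {week}")
        correct = sum(1 for features, label in samples if predict(features) == label)
        result[week] = correct / len(samples)
    return result


if __name__ == "__main__":
    train_weeks, test_weeks = temporal_split(WEEKS)
    print("train:", train_weeks, "test:", test_weeks)
```

Reporting one accuracy value per test week, rather than a single pooled score, is what makes a gradual decline visible.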

Comparison with Other Datasets

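The closest comparable dataset is the UC Davis QUIC dataset [2], which contains only about 6,500 flows across five classes. With 153 million flows and 105 classes (102 web service classes plus 3 background classes), CESNET-QUIC22 is more than four orders of magnitude larger in flow count and covers far more services.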

How to Cite

@article{luxemburk2023cesnet,
  title={CESNET-QUIC22: A Large One-month QUIC Network Traffic Dataset from Backbone Lines},
  author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and Luka{\v{c}}ovi{\v{c}}, Andrej and {\v{S}}i{\v{s}}ka, Pavel},
  journal={Data in Brief},
  volume={46},
  pages={108888},
  year={2023},
  publisher={Elsevier}
}

Download

[1] Luxemburk, J., Hynek, K., Čejka, T., Lukačovič, A., & Šiška, P. (2023). CESNET-QUIC22: A Large One-month QUIC Network Traffic Dataset from Backbone Lines [Data set]. Zenodo.
Download link to be added

References

[2] UC Davis QUIC Dataset (6.5K flows, 5 classes)
Reference to be added