Overview
The CESNET-QUIC22 dataset contains four consecutive weeks of QUIC traffic collected from 31 October to 27 November 2022. It comprises 153 million network flows, annotated with service labels derived from TLS SNI (Server Name Indication) domains. Extracting SNI domains from connection handshakes is more complicated in QUIC than in TLS over TCP because QUIC obfuscates its handshake packets.
Dataset Metadata
| Property | Value |
|---|---|
| Type | Original dataset |
| Category | Flows |
| Primary Task | Service Classification |
| Total Flows | 153 million |
| Total Size (Uncompressed) | 89 GB |
| Collection Period | 31 October 2022 – 27 November 2022 |
| Number of Classes | 102 web service classes + 3 background classes |
Dataset Characteristics
- Annotation Method: Based on TLS SNI domains extracted from QUIC connection handshakes
- Service Organization: Web services are organized into provider-based groups:
  - Google services: 27 distinct services
  - Meta services: 9 distinct services
- Background Classes: 3 classes for non-web service traffic
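As a rough illustration of SNI-based annotation, the sketch below assigns a service label to a flow by matching its SNI domain against a table of domain suffixes. The rule table, the label names, and the `default-background` fallback are all hypothetical placeholders chosen for this example; they are not the dataset's actual annotation rules or class names.

```python
# Hypothetical suffix-to-service rules; NOT the dataset's real mapping.
SERVICE_RULES = {
    "youtube.com": "youtube",
    "googlevideo.com": "youtube",
    "instagram.com": "instagram",
}


def label_flow(sni: str, default: str = "default-background") -> str:
    """Return a service label for an SNI domain using suffix matching.

    The fallback label name is an assumption made for this sketch.
    """
    domain = sni.lower().rstrip(".")
    parts = domain.split(".")
    # Try the full domain first, then progressively shorter suffixes,
    # always cutting at a dot so "notyoutube.com" never matches "youtube.com".
    for i in range(len(parts)):
        suffix = ".".join(parts[i:])
        if suffix in SERVICE_RULES:
            return SERVICE_RULES[suffix]
    return default


print(label_flow("www.youtube.com"))
print(label_flow("example.org"))
```

Matching on whole dot-separated suffixes rather than raw string endings avoids mislabeling unrelated domains that merely share trailing characters with a known service.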
Dataset Statistics
The following table provides detailed per-week statistics for the CESNET-QUIC22 dataset:
| Week Name | Uncompressed Size | Collection Period | Flow Count |
|---|---|---|---|
| W-2022-44 | 19 GB | 31 Oct 2022 – 6 Nov 2022 | 32.6M |
| W-2022-45 | 25 GB | 7 Nov 2022 – 13 Nov 2022 | 42.6M |
| W-2022-46 | 20 GB | 14 Nov 2022 – 20 Nov 2022 | 33.7M |
| W-2022-47 | 25 GB | 21 Nov 2022 – 27 Nov 2022 | 44.1M |
| CESNET-QUIC22 (total) | 89 GB | 31 Oct 2022 – 27 Nov 2022 | 153M |
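The totals row can be cross-checked against the weekly rows with a few lines of arithmetic. The sketch below copies the values from the table above (sizes in GB, flow counts in millions) and also computes each week's share of the total flows; the variable names are illustrative only.

```python
# Per-week values copied from the table: (uncompressed size in GB, flows in millions).
weeks = {
    "W-2022-44": (19, 32.6),
    "W-2022-45": (25, 42.6),
    "W-2022-46": (20, 33.7),
    "W-2022-47": (25, 44.1),
}

total_size_gb = sum(size for size, _ in weeks.values())
# Round to one decimal to absorb floating-point noise in the sum.
total_flows_m = round(sum(flows for _, flows in weeks.values()), 1)

print(f"Total size: {total_size_gb} GB")
print(f"Total flows: {total_flows_m}M")

# Share of all flows contributed by each week.
for name, (_, flows) in weeks.items():
    print(f"{name}: {flows / total_flows_m:.1%} of flows")
```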
Research Applications
The four-week time span of the CESNET-QUIC22 dataset enables research on several deployment-related issues:
- Data Drift Analysis: Study temporal changes in QUIC traffic patterns
- Classifier Performance Degradation: Evaluate how classifier accuracy evolves over time
- Domain-Level Classification: Each flow includes the extracted SNI domain, enabling more challenging domain-level classification tasks
- Large-Scale QUIC Analysis: To the best of our knowledge, no other public QUIC flow-level dataset of comparable size exists
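One simple way to start a data drift analysis is to compare the class distribution of one week against another. The sketch below computes the total variation distance between two label distributions; the toy label lists are made up for illustration and do not reflect the dataset's contents, and total variation distance is only one of several possible drift measures.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Convert a sequence of class labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Toy labels standing in for two weeks of flows (hypothetical, not real data).
week_a = ["service-a"] * 3 + ["service-b"] * 1
week_b = ["service-a"] * 1 + ["service-b"] * 3

drift = total_variation(class_distribution(week_a), class_distribution(week_b))
print(f"Total variation distance between weeks: {drift:.2f}")
```

Applied to real flows, the same comparison could be repeated for each pair of consecutive weeks to see whether the service mix shifts over the collection period.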
Comparison with Other Datasets
The closest comparable dataset is the UC Davis QUIC dataset [2], which contains only 6,500 flows across five classes. With 153 million flows and 105 classes, CESNET-QUIC22 is several orders of magnitude larger and covers far more services.
How to Cite
@article{luxemburk2023cesnet,
title={CESNET-QUIC22: A Large One-month QUIC Network Traffic Dataset from Backbone Lines},
author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and Luka{\v{c}}ovi{\v{c}}, Andrej and {\v{S}}i{\v{s}}ka, Pavel},
journal={Data in Brief},
volume={46},
pages={108888},
year={2023},
publisher={Elsevier}
}
Download
[1] Luxemburk, J., Hynek, K., Čejka, T., Lukačovič, A., & Šiška, P. (2023). CESNET-QUIC22: A Large One-month QUIC Network Traffic Dataset from Backbone Lines [Data set]. Zenodo.
Download link to be added
References
[2] UC Davis QUIC Dataset (6.5K flows, 5 classes)
Reference to be added