Overview
The CESNET-TLS22 dataset captures two weeks of TLS-encrypted traffic from the CESNET3 network, containing 141.7 million flows across 191 web service classes. Flows are annotated by their TLS SNI (Server Name Indication) domains, and because the capture comes from a production ISP environment, the dataset reflects realistic traffic characteristics.
Dataset Metadata
| Property | Value |
|---|---|
| Type | Original dataset |
| Category | Flows |
| Primary Task | Service Classification |
| Total Flows | 141.7 million |
| Total Size (Uncompressed) | 42 GB |
| Collection Period | 4 October 2021 – 17 October 2021 |
| Number of Classes | 191 web service classes |
| Annotation Method | TLS SNI domains |
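As a rough illustration of SNI-based annotation, the sketch below maps an SNI hostname to a service label by matching it against known domains, trying the full hostname first and then each parent domain. The `SERVICE_DOMAINS` table, its example domains, and the `label_from_sni` helper are hypothetical and exist only for this example; they are not the dataset's actual annotation rules.

```python
from typing import Optional

# Hypothetical domain-to-service table (illustrative only; the real
# annotation rules of CESNET-TLS22 are not reproduced here).
SERVICE_DOMAINS = {
    "example-video.com": "example-video",
    "cdn.example-video.net": "example-video",
    "example-mail.org": "example-mail",
}


def label_from_sni(sni: str) -> Optional[str]:
    """Return the service label matching the SNI hostname, or None."""
    # Normalize case and drop a trailing root dot, if any.
    host = sni.strip().lower().rstrip(".")
    parts = host.split(".")
    # Try the full hostname, then progressively shorter parent domains.
    # Matching on whole labels avoids false hits such as
    # "notexample-video.com" matching "example-video.com".
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in SERVICE_DOMAINS:
            return SERVICE_DOMAINS[candidate]
    return None


if __name__ == "__main__":
    for name in ("static.example-video.com", "unknown.test"):
        print(name, "->", label_from_sni(name))
```

Flows whose SNI matches no known domain get `None` here; how such flows are treated in the actual dataset is not stated in this document.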
Dataset Characteristics
Realistic Traffic Capture
The CESNET3 network environment provides realistic traffic characteristics:
Diverse Client Platforms:
- Various web browsers (Chrome, Firefox, Safari, Edge, etc.)
- Multiple operating systems (Windows, macOS, Linux, mobile OS)
- Both desktop and mobile devices
Protocol Coverage:
- HTTP/1.1 traffic
- HTTP/2 traffic
- Mixed protocol scenarios
Organic Traffic Patterns:
- Natural user behaviors and interactions
- Diverse service settings and configurations
- Real-world usage patterns
Dataset Statistics
Detailed per-week statistics for the CESNET-TLS22 dataset:
| Week Name | Uncompressed Size | Collection Period | Flow Count |
|---|---|---|---|
| W-2021-40 | 22 GB | 4 Oct 2021 – 10 Oct 2021 | 73.2M |
| W-2021-41 | 20 GB | 11 Oct 2021 – 17 Oct 2021 | 68.5M |
| CESNET-TLS22 (total) | 42 GB | 4 Oct 2021 – 17 Oct 2021 | 141.7M |
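The totals row can be cross-checked against the per-week rows with a few lines of Python. The values are copied from the table above; the dictionary layout and variable names are only an assumption made for this sketch.

```python
# Per-week figures copied from the statistics table above.
weeks = {
    "W-2021-40": {"size_gb": 22, "flows_millions": 73.2},
    "W-2021-41": {"size_gb": 20, "flows_millions": 68.5},
}

# Sum sizes and flow counts; round the flow count to one decimal place
# to avoid floating-point noise in the comparison.
total_size_gb = sum(w["size_gb"] for w in weeks.values())
total_flows_millions = round(
    sum(w["flows_millions"] for w in weeks.values()), 1
)

print(total_size_gb, "GB")               # 42 GB
print(total_flows_millions, "M flows")   # 141.7 M flows
```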
Research Applications
The dataset was designed to support research in:
- Fine-grained Classification: Distinguishing between 191 different web services
- Open-world Classification: Rejecting anomalous and unknown samples instead of assigning them to a known class
- Encrypted Traffic Analysis: Classifying TLS-encrypted flows whose labels are derived from SNI domains
- Realistic Performance Evaluation: Testing on production network traffic
Open-World Setting
The CESNET-TLS22 dataset enables open-world classification scenarios in which:
- Classifiers must correctly label flows from known service classes
- Classifiers can reject anomalous samples
- Classifiers can flag flows from unknown or novel services
- Performance is evaluated on realistic traffic distributions
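One simple way to give a classifier a reject option is to threshold its top softmax probability: if the most likely known class is not confident enough, the sample is rejected as unknown. The sketch below shows this generic baseline only; it is not necessarily the method used in the cited paper, and the function names, class names, and the `threshold` value of 0.9 are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_reject(
    logits: Sequence[float],
    class_names: Sequence[str],
    threshold: float = 0.9,  # assumed value; tune on validation data
) -> Optional[str]:
    """Return the predicted class, or None to reject the sample."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Reject when the classifier is not confident in any known class.
    if probs[best] < threshold:
        return None
    return class_names[best]


if __name__ == "__main__":
    classes = ["service-a", "service-b", "service-c"]  # hypothetical labels
    print(predict_with_reject([10.0, 0.0, 0.0], classes))  # confident
    print(predict_with_reject([1.0, 1.0, 1.0], classes))   # rejected
```

Raising the threshold rejects more samples, trading coverage of known classes for fewer misclassified unknown flows.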
How to Cite
@article{luxemburk2023fine,
title={Fine-grained TLS Services Classification with Reject Option},
author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and Pe{\v{s}}ek, Jaroslav},
journal={Computer Networks},
volume={220},
pages={109467},
year={2023},
publisher={Elsevier}
}
Download
[1] Luxemburk, J., Hynek, K., Čejka, T., & Pešek, J. (2023). CESNET-TLS22: Two-Week TLS Network Traffic Dataset from Backbone Lines [Data set]. Zenodo.
Download link to be added