Overview
The CESNET-TLS-Year22 dataset is a year-long collection of TLS-encrypted traffic containing 507 million flows (after 1:10 sampling) across 180 web services. It is annotated with the same process as CESNET-TLS22 and CESNET-QUIC22, with labels derived from TLS SNI domains, which makes it a resource for studying traffic classification over extended time periods.
Dataset Metadata
| Property | Value |
|---|---|
| Type | Original dataset |
| Category | Flows |
| Primary Task | Service Classification |
| Total Flows | 507 million |
| Collection Period | One year (2022) |
| Number of Classes | 180 web service classes |
| Sampling Rate | 1:10 uniform sampling |
| Annotation Method | TLS SNI domains |
| Background Classes | None (web services only) |
Dataset Characteristics
Sampling Strategy
Due to the substantial volume of TLS traffic collected over a full year, uniform sampling was applied:
- Sampling Ratio: 1:10
- Purpose: Keep the dataset size manageable while preserving temporal patterns
- Method: Uniform random sampling across all time periods
- Preservation: Maintains a representative distribution of traffic patterns
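The exact sampling mechanism is not specified here. As a minimal sketch of one possible reading, the snippet below keeps each flow independently with probability 1/10, so every time period retains roughly the same share of its traffic. The function name `sample_flows`, the `(hour, flow_id)` tuple layout, and the fixed seed are assumptions made for illustration only.

```python
import random


def sample_flows(flows, ratio=10, seed=0):
    """Keep each flow independently with probability 1/ratio (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    keep_prob = 1.0 / ratio
    return [flow for flow in flows if rng.random() < keep_prob]


# Hypothetical input: (hour_of_day, flow_id) pairs spread evenly over 24 hours.
flows = [(i % 24, i) for i in range(100_000)]
sampled = sample_flows(flows)

# Roughly one tenth of the flows survive, and every hour is still represented.
print(len(sampled), len({hour for hour, _ in sampled}))
```

Because the keep decision does not depend on the timestamp, the relative traffic volume of each period is preserved in expectation.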
Included Data
Each flow record includes:
- Flow Features: Standard IP flow attributes (bytes, packets, duration, etc.)
- SNI Domain: Extracted Server Name Indication from TLS handshake
- Timestamp: Precise capture time for temporal analysis
- Service Label: Annotated web service class (180 classes)
Note: Unlike CESNET-QUIC22, this dataset contains no background traffic classes; every flow is labeled with a web service.
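To make the record layout concrete, here is a hedged sketch of how a single flow might be represented in Python. All field names (`bytes_total`, `packets_total`, `duration_s`, and so on) and the example values are hypothetical; the actual column names in the released files may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlowRecord:
    """Illustrative container for one flow; field names are assumptions."""
    timestamp: datetime   # capture time, used for temporal analysis
    sni_domain: str       # Server Name Indication from the TLS handshake
    service_label: str    # one of the 180 web service classes
    bytes_total: int      # standard flow attribute
    packets_total: int    # standard flow attribute
    duration_s: float     # standard flow attribute, in seconds


# Made-up example values, not taken from the dataset.
record = FlowRecord(
    timestamp=datetime(2022, 3, 14, 12, 30, 0),
    sni_domain="www.example.com",
    service_label="example-service",
    bytes_total=48_213,
    packets_total=57,
    duration_s=1.84,
)
print(record.service_label, record.timestamp.isoformat())
```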
Research Applications
The CESNET-TLS-Year22 dataset’s year-long span makes it particularly suitable for:
1. Traffic Classification Robustness
- Temporal Stability: Evaluate classifier performance over extended periods
- Seasonal Patterns: Study how traffic changes across months and seasons
- Evolving Behaviors: Assess impact of service updates and protocol changes
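A common way to study temporal stability is to train once and then report accuracy separately for each later month. The sketch below shows only the evaluation step. The record layout `(timestamp, features, label)`, the `first_packet_size` feature, and the toy threshold classifier are all assumptions for illustration, not part of the dataset or its tooling.

```python
from collections import defaultdict
from datetime import datetime


def accuracy_by_month(records, predict):
    """Return {"YYYY-MM": accuracy} for (timestamp, features, label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for timestamp, features, label in records:
        month = timestamp.strftime("%Y-%m")
        total[month] += 1
        correct[month] += int(predict(features) == label)
    return {month: correct[month] / total[month] for month in sorted(total)}


# Toy data: the June behavior of "svc-a" has drifted past the threshold.
records = [
    (datetime(2022, 1, 5), {"first_packet_size": 300}, "svc-a"),
    (datetime(2022, 1, 9), {"first_packet_size": 900}, "svc-b"),
    (datetime(2022, 6, 1), {"first_packet_size": 300}, "svc-a"),
    (datetime(2022, 6, 2), {"first_packet_size": 700}, "svc-a"),
]


def predict(features):
    """Hypothetical classifier fitted on January traffic only."""
    return "svc-a" if features["first_packet_size"] < 500 else "svc-b"


result = accuracy_by_month(records, predict)
print(result)  # {'2022-01': 1.0, '2022-06': 0.5}
```

A drop in the later months, as in this toy example, is the kind of signal that would indicate data drift.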
2. Model Retraining Strategies
- Retraining Frequency: Determine optimal model update intervals
- Incremental Learning: Test continuous learning approaches
- Transfer Learning: Evaluate knowledge transfer across time periods
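Comparing retraining frequencies usually starts from a schedule of training and evaluation windows. The following sketch generates sliding windows for a given retraining interval. The function name, the 14-day training window, the 28-day interval, and the calendar-year boundaries are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def retraining_schedule(start, end, train_days, retrain_every_days):
    """Yield (train_start, train_end, test_start, test_end); end dates are exclusive."""
    test_start = start + timedelta(days=train_days)
    while test_start < end:
        test_end = min(test_start + timedelta(days=retrain_every_days), end)
        # Train on the window immediately preceding the evaluation window.
        yield (test_start - timedelta(days=train_days), test_start, test_start, test_end)
        test_start = test_end


schedule = list(
    retraining_schedule(
        start=date(2022, 1, 1),
        end=date(2023, 1, 1),
        train_days=14,
        retrain_every_days=28,
    )
)
for train_start, train_end, test_start, test_end in schedule[:2]:
    print(f"train {train_start}..{train_end}  test {test_start}..{test_end}")
```

Running the same experiment with several values of `retrain_every_days` gives one way to compare update intervals against the accuracy they sustain.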
3. Incremental Class Learning
- New Class Addition: Add emerging services without full retraining
- Class Evolution: Handle changes in existing service behaviors
- Memory Efficiency: Minimize computational requirements for updates
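As a minimal illustration of adding a class without full retraining, the sketch below uses a nearest-class-mean classifier, a deliberately simple stand-in technique rather than a method attributed to the dataset authors. Each class is summarized by a running sum and count, so a new service only adds one more summary. The class name, the two-dimensional feature vectors, and the labels are hypothetical.

```python
import math


class NearestClassMean:
    """Classify by the closest class mean; classes can be added at any time."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of feature vectors
        self.counts = {}  # label -> number of vectors seen

    def partial_fit(self, vectors, label):
        """Update (or create) the summary for one class only."""
        for vector in vectors:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(vector)
                self.counts[label] = 0
            self.sums[label] = [s + x for s, x in zip(self.sums[label], vector)]
            self.counts[label] += 1

    def predict(self, vector):
        def distance(label):
            mean = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(mean, vector)
        return min(self.sums, key=distance)


clf = NearestClassMean()
clf.partial_fit([[1.0, 1.0], [1.2, 0.8]], "svc-a")
clf.partial_fit([[5.0, 5.0], [5.2, 4.8]], "svc-b")
before = clf.predict([9.0, 1.0])   # forced into one of the known classes

# A new service appears: only its own summary is added, nothing is retrained.
clf.partial_fit([[9.0, 1.0], [8.8, 1.2]], "svc-new")
after = clf.predict([9.0, 1.0])
print(before, after)  # svc-b svc-new
```

The memory cost grows with the number of classes rather than the number of flows, which is the property the bullet on memory efficiency refers to.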
4. Few-shot Learning
- Underrepresented Classes: Improve performance on rare services
- Emerging Applications: Classify new services with limited samples
- Cold-start Problem: Handle newly deployed web services
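Few-shot experiments typically start by drawing a small support set of k labeled flows per class and holding out the rest for evaluation. The sketch below shows one way to build such a split. The function name `k_shot_split`, the `(features, label)` record layout, and the toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def k_shot_split(records, k, seed=0):
    """Split (features, label) records into a k-per-class support set and a query set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    support, query = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        support.extend(items[:k])   # at most k examples per class
        query.extend(items[k:])     # everything else is held out
    return support, query


# Toy data: one well-represented service and one rare service.
records = [([i], "svc-a") for i in range(10)] + [([i], "svc-rare") for i in range(3)]
support, query = k_shot_split(records, k=5)
print(len(support), len(query))  # 8 5
```

Classes with fewer than k flows contribute all of their examples to the support set and none to the query set, which is worth accounting for when reporting per-class results.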
Comparison with Related Datasets
| Dataset | Duration | Flows | Classes | Background Traffic |
|---|---|---|---|---|
| CESNET-TLS22 | 2 weeks | 141.7M | 191 | No |
| CESNET-QUIC22 | 4 weeks | 153M | 102 + 3 background | Yes |
| CESNET-TLS-Year22 | 1 year | 507M | 180 | No |
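Because the three datasets cover very different durations, a per-week flow count can make the table easier to compare. The snippet below is simple arithmetic on the table values, approximating one year as 52 weeks. Note that the CESNET-TLS-Year22 figure is after 1:10 sampling; whether the other two datasets were sampled is not stated here.

```python
# (total flows, duration in weeks) taken from the comparison table;
# "1 year" is approximated as 52 weeks.
datasets = {
    "CESNET-TLS22": (141.7e6, 2),
    "CESNET-QUIC22": (153e6, 4),
    "CESNET-TLS-Year22": (507e6, 52),
}

flows_per_week = {name: flows / weeks for name, (flows, weeks) in datasets.items()}

for name, value in flows_per_week.items():
    print(f"{name}: {value / 1e6:.2f}M flows per week")
```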
How to Cite
@article{luxemburk2024long,
  title={Long-term Traffic Classification using Self-supervised Representation Learning},
  author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}}},
  journal={Computer Networks},
  year={2024},
  publisher={Elsevier}
}
Download
[1] Luxemburk, J., Hynek, K., & Čejka, T. (2024). CESNET-TLS-Year22: One-Year TLS Network Traffic Dataset [Data set]. Zenodo.
Download link to be added