CESNET-TLS-Year22

Overview

CESNET-TLS-Year22 is a year-long collection of TLS-encrypted traffic containing 507 million flows across 180 web services. It is annotated using the same process as CESNET-TLS22 and CESNET-QUIC22, which makes it well suited to studying traffic classification over extended time periods.

Dataset Metadata

Property	Value
Type	Original dataset
Category	Flows
Primary Task	Service Classification
Total Flows	507 million
Collection Period	One year (2022)
Number of Classes	180 web service classes
Sampling Rate	1:10 uniform sampling
Annotation Method	TLS SNI domains
Background Classes	None (web services only)

Dataset Characteristics

Sampling Strategy

Due to the substantial volume of TLS traffic collected over a full year, uniform sampling was applied:

Sampling Ratio: 1:10
Purpose: Make dataset size manageable while preserving temporal patterns
Method: Uniform random sampling across all time periods
Preservation: Maintains representative distribution of traffic patterns
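
The effect of 1:10 uniform sampling can be illustrated with a minimal sketch. It assumes each flow is kept independently with probability 1/10; the source does not say how the sampling was implemented, so the function name, the seed, and the synthetic flows below are illustrative only.

```python
import random

def uniform_sample(flows, ratio=10, seed=0):
    """Keep each flow independently with probability 1/ratio.

    Assumption: per-flow Bernoulli sampling. The dataset documentation
    only states that sampling was uniform at a 1:10 ratio.
    """
    rng = random.Random(seed)
    keep_prob = 1.0 / ratio
    return [flow for flow in flows if rng.random() < keep_prob]

# Synthetic stand-in for a year of traffic: 1,000 flows on each of 365 days.
flows = [(day, flow_id) for day in range(365) for flow_id in range(1000)]
sampled = uniform_sample(flows)

# Roughly one tenth of the flows survive, and every day is still covered.
kept_fraction = len(sampled) / len(flows)
days_covered = len({day for day, _ in sampled})
```

Because the keep decision does not depend on the time of capture, each day retains about the same share of its flows, which is why temporal patterns survive the reduction.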

Included Data

Each flow record includes:

Flow Features: Standard IP flow attributes (bytes, packets, duration, etc.)
SNI Domain: Extracted Server Name Indication from TLS handshake
Timestamp: Precise capture time for temporal analysis
Service Label: Annotated web service class (180 classes)

Note: Unlike CESNET-QUIC22, this dataset contains no background traffic classes; every flow is labeled with one of the 180 web service classes.
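
As a rough illustration of the record fields listed above and of SNI-based labeling, the following sketch models one flow and assigns a label by domain suffix. All field names, the rule table, and the example domains are hypothetical; the actual dataset schema and annotation rules are not given here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class FlowRecord:
    # Hypothetical field names; the real column names may differ.
    bytes_total: int
    packets_total: int
    duration_s: float
    sni_domain: str
    timestamp: datetime
    service_label: Optional[str] = None

# Hypothetical suffix-to-service rules standing in for the real annotation list.
SNI_RULES = {
    "video.example.com": "example-video",
    "mail.example.org": "example-mail",
}

def label_from_sni(sni_domain, rules=SNI_RULES):
    """Return the service whose domain matches the SNI exactly or as a parent domain."""
    domain = sni_domain.lower().rstrip(".")
    for suffix, service in rules.items():
        if domain == suffix or domain.endswith("." + suffix):
            return service
    return None  # no rule matched; such a flow would carry no service label

record = FlowRecord(
    bytes_total=48_213,
    packets_total=61,
    duration_s=2.4,
    sni_domain="cdn.video.example.com",
    timestamp=datetime(2022, 3, 14, 9, 30),
)
record.service_label = label_from_sni(record.sni_domain)
```

Matching on a leading dot before the suffix avoids false positives such as `notvideo.example.com` being treated as a subdomain of `video.example.com`.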

Research Applications

The CESNET-TLS-Year22 dataset’s year-long span makes it particularly suitable for:

1. Traffic Classification Robustness

Temporal Stability: Evaluate classifier performance over extended periods
Seasonal Patterns: Study how traffic changes across months and seasons
Evolving Behaviors: Assess impact of service updates and protocol changes
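
A temporal stability check can be sketched as per-month accuracy of a fixed classifier. The record layout, the toy threshold classifier, and the synthetic drift below are assumptions made for illustration, not part of the dataset or its tooling.

```python
from collections import defaultdict
from datetime import date

def accuracy_by_month(records, predict):
    """Compute accuracy per calendar month.

    records: iterable of (capture_date, features, true_label) tuples.
    predict: callable mapping features to a predicted label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for capture_date, features, label in records:
        month = capture_date.month
        total[month] += 1
        correct[month] += int(predict(features) == label)
    return {month: correct[month] / total[month] for month in sorted(total)}

# Toy classifier "trained" in January: small flows are service A, large ones B.
def predict(flow_bytes):
    return "A" if flow_bytes < 150 else "B"

# Synthetic drift: by July, service A produces larger flows than in January.
records = [
    (date(2022, 1, 10), 100, "A"),
    (date(2022, 1, 11), 200, "B"),
    (date(2022, 7, 10), 160, "A"),
    (date(2022, 7, 11), 200, "B"),
]
monthly = accuracy_by_month(records, predict)  # {1: 1.0, 7: 0.5}
```

A drop in later months, as in this toy case, is the kind of signal that a year-long dataset makes measurable.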

2. Model Retraining Strategies

Retraining Frequency: Determine optimal model update intervals
Incremental Learning: Test continuous learning approaches
Transfer Learning: Evaluate knowledge transfer across time periods
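
One way to compare retraining intervals is to generate sliding train/test windows and evaluate a model on each. The sketch below assumes a fixed-length training window that slides forward at every retraining point; the window lengths and the exact start and end dates are placeholders, not values taken from the dataset.

```python
from datetime import date, timedelta

def retraining_schedule(start, end, train_days, retrain_every_days):
    """Yield (train_start, cutoff, test_end) tuples.

    The model is trained on [train_start, cutoff) and evaluated on
    [cutoff, test_end); it is then retrained at test_end.
    """
    train_len = timedelta(days=train_days)
    step = timedelta(days=retrain_every_days)
    cutoff = start + train_len
    while cutoff < end:
        test_end = min(cutoff + step, end)
        yield cutoff - train_len, cutoff, test_end
        cutoff = test_end

# Placeholder dates: train on 14 days, retrain every 30 days across 2022.
schedule = list(retraining_schedule(date(2022, 1, 1), date(2023, 1, 1), 14, 30))
```

Running the same evaluation with different `retrain_every_days` values gives a direct comparison of accuracy against retraining cost.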

3. Incremental Class Learning

New Class Addition: Add emerging services without full retraining
Class Evolution: Handle changes in existing service behaviors
Memory Efficiency: Minimize computational requirements for updates
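
A class-incremental experiment is commonly set up by training on a base set of classes and then adding the remaining classes in steps. The sketch below only builds such a task sequence; the base size, the step size, and the placeholder class names are arbitrary assumptions rather than a protocol defined by the dataset.

```python
def class_incremental_tasks(labels, base_count, step):
    """Split the label set into a base task followed by incremental tasks.

    Classes are ordered alphabetically here for reproducibility; ordering by
    first appearance in time would be another reasonable choice.
    """
    classes = sorted(set(labels))
    tasks = [classes[:base_count]]
    for i in range(base_count, len(classes), step):
        tasks.append(classes[i:i + step])
    return tasks

# Placeholder names for the 180 service classes.
labels = [f"service_{i:03d}" for i in range(180)]
tasks = class_incremental_tasks(labels, base_count=100, step=20)
task_sizes = [len(task) for task in tasks]  # [100, 20, 20, 20, 20]
```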

4. Few-shot Learning

Underrepresented Classes: Improve performance on rare services
Emerging Applications: Classify new services with limited samples
Cold-start Problem: Handle newly deployed web services
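
For few-shot experiments, a k-shot split gives the learner only k labeled flows per class and holds out the rest for evaluation. This is a minimal sketch on synthetic samples; the function name, the class names, and the choice of k are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def k_shot_split(samples, k, seed=0):
    """Split (features, label) samples into a k-per-class support set and a query set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    support, query = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)          # pick the k support samples at random
        support.extend(items[:k])
        query.extend(items[k:])
    return support, query

# Synthetic imbalance: one rare service with 8 flows, one common with 100.
samples = [(i, "rare") for i in range(8)] + [(i, "common") for i in range(100)]
support, query = k_shot_split(samples, k=5)
support_counts = Counter(label for _, label in support)
```

Note that a class with fewer than k samples would end up entirely in the support set, leaving nothing to evaluate on; such classes need separate handling.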

Comparison with Related Datasets

Dataset	Duration	Flows	Classes	Background Traffic
CESNET-TLS22	2 weeks	141.7M	191	No
CESNET-QUIC22	4 weeks	153M	102 + 3 background	Yes
CESNET-TLS-Year22	1 year	507M	180	No

How to Cite

@article{luxemburk2024long,
  title={Long-term Traffic Classification using Self-supervised Representation Learning},
  author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}}},
  journal={Computer Networks},
  year={2024},
  publisher={Elsevier}
}

Download

[1] Luxemburk, J., Hynek, K., & Čejka, T. (2024). CESNET-TLS-Year22: One-Year TLS Network Traffic Dataset [Data set]. Zenodo.
Download link to be added
