Overview

The CESNET-TLS-Year22 dataset is an extensive year-long collection of TLS-encrypted traffic containing 507 million flows across 180 web services. The dataset is annotated using the same process as CESNET-TLS22 and CESNET-QUIC22, providing a comprehensive resource for studying traffic classification over extended time periods.

Dataset Metadata

Property             Value
Type                 Original dataset
Category             Flows
Primary Task         Service Classification
Total Flows          507 million
Collection Period    One year (2022)
Number of Classes    180 web service classes
Sampling Rate        1:10 uniform sampling
Annotation Method    TLS SNI domains
Background Classes   None (web services only)

Dataset Characteristics

Sampling Strategy

Due to the substantial volume of TLS traffic collected over a full year, uniform sampling was applied:

  • Sampling Ratio: 1:10
  • Purpose: Make dataset size manageable while preserving temporal patterns
  • Method: Uniform random sampling across all time periods
  • Preservation: Maintains representative distribution of traffic patterns
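
The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes each flow is kept independently with probability 1/10, and the function name `sample_flows` and the integer flow identifiers are hypothetical.

```python
import random


def sample_flows(flows, ratio=10, seed=0):
    """Keep each flow independently with probability 1/ratio.

    Assumption: per-flow random selection; the source only states that
    1:10 uniform sampling was applied across all time periods.
    """
    rng = random.Random(seed)
    keep_prob = 1.0 / ratio
    return [flow for flow in flows if rng.random() < keep_prob]


# Hypothetical flow identifiers standing in for real flow records.
flows = list(range(100_000))
sampled = sample_flows(flows)

# The kept fraction should be close to 0.1.
print(len(sampled) / len(flows))
```

Because every flow has the same chance of being kept regardless of when it was captured, the relative volume per day or per service is preserved in expectation, which is what makes later temporal analysis meaningful.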

Included Data

Each flow record includes:

  • Flow Features: Standard IP flow attributes (bytes, packets, duration, etc.)
  • SNI Domain: Extracted Server Name Indication from TLS handshake
  • Timestamp: Precise capture time for temporal analysis
  • Service Label: Annotated web service class (180 classes)

Note: Unlike CESNET-QUIC22, this dataset does not contain background traffic classes; every flow is labeled with one of the 180 web service classes.
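
To make the record layout concrete, the sketch below models one flow as a small data class. All field names and values are hypothetical placeholders chosen for illustration; the actual column names and the full feature set should be taken from the dataset itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical field names; the real schema may differ.
    timestamp: datetime   # capture time, used for temporal analysis
    bytes_total: int      # flow feature: transferred bytes
    packets_total: int    # flow feature: packet count
    duration_s: float     # flow feature: duration in seconds
    sni_domain: str       # Server Name Indication from the TLS handshake
    service: str          # annotated web service class


def week_of_year(record):
    """Return the ISO week number, a convenient key for weekly splits."""
    return record.timestamp.isocalendar()[1]


# Made-up example values, not taken from the dataset.
record = FlowRecord(
    timestamp=datetime(2022, 3, 14, 9, 30, tzinfo=timezone.utc),
    bytes_total=48_213,
    packets_total=57,
    duration_s=2.4,
    sni_domain="www.example.com",
    service="example-service",
)
print(week_of_year(record))  # 11
```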

Research Applications

The CESNET-TLS-Year22 dataset’s year-long span makes it particularly suitable for:

1. Traffic Classification Robustness

  • Temporal Stability: Evaluate classifier performance over extended periods
  • Seasonal Patterns: Study how traffic changes across months and seasons
  • Evolving Behaviors: Assess impact of service updates and protocol changes
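
A temporal stability check of this kind can be sketched by scoring a fixed classifier separately for each month. The example below is a toy under stated assumptions: `accuracy_by_month`, the byte-count threshold classifier, and the four samples are all hypothetical, and a real experiment would plug in a trained model and dataset flows.

```python
from collections import defaultdict


def accuracy_by_month(samples, predict):
    """Compute per-month accuracy.

    samples: iterable of (month, features, label) tuples.
    predict: callable mapping features to a predicted label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for month, features, label in samples:
        total[month] += 1
        correct[month] += int(predict(features) == label)
    return {month: correct[month] / total[month] for month in sorted(total)}


def threshold_classifier(total_bytes):
    # Toy stand-in for a model trained on early data.
    return "video" if total_bytes > 500_000 else "mail"


# Made-up samples: in month 6 the "video" service sends smaller flows,
# imitating a change in service behavior.
samples = [
    (1, 900_000, "video"), (1, 20_000, "mail"),
    (6, 400_000, "video"), (6, 25_000, "mail"),
]
print(accuracy_by_month(samples, threshold_classifier))  # {1: 1.0, 6: 0.5}
```

A drop in the later months, as in this toy output, is the kind of signal that would indicate the classifier no longer matches current traffic.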

2. Model Retraining Strategies

  • Retraining Frequency: Determine optimal model update intervals
  • Incremental Learning: Test continuous learning approaches
  • Transfer Learning: Evaluate knowledge transfer across time periods
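
One way to compare retraining frequencies is to generate the train and test windows for a given interval and evaluate a model on each pair. The sketch below assumes the year is indexed as 52 weeks and uses a sliding training window; the function name and parameters are hypothetical and not part of any dataset tooling.

```python
def retraining_schedule(n_weeks, train_weeks, retrain_every):
    """Yield (train_range, test_range) pairs of week indices.

    The model is retrained every `retrain_every` weeks on the most recent
    `train_weeks` weeks, then tested until the next retraining point.
    """
    start = 0
    while start + train_weeks < n_weeks:
        train_end = start + train_weeks
        test_end = min(train_end + retrain_every, n_weeks)
        yield range(start, train_end), range(train_end, test_end)
        start += retrain_every


# Example: 4-week training window, retraining every 12 weeks.
schedule = list(retraining_schedule(n_weeks=52, train_weeks=4, retrain_every=12))
for train, test in schedule:
    print(f"train weeks {train.start}-{train.stop - 1}, "
          f"test weeks {test.start}-{test.stop - 1}")
```

Running the same evaluation with several values of `retrain_every` gives an accuracy-versus-cost curve from which an update interval can be chosen.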

3. Incremental Class Learning

  • New Class Addition: Add emerging services without full retraining
  • Class Evolution: Handle changes in existing service behaviors
  • Memory Efficiency: Minimize computational requirements for updates
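
As a simple illustration of adding a class without full retraining, the sketch below uses a nearest class mean classifier, which stores only a running sum and count per class. This is a generic baseline chosen for brevity, not a method prescribed by the dataset authors, and the two-dimensional feature vectors and service names are made up.

```python
import math


class NearestClassMean:
    """Classify by the closest class mean; classes can be added at any time."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of samples seen

    def partial_fit(self, features, label):
        # A previously unseen label simply creates a new entry.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        def distance(label):
            mean = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(mean, features)

        return min(self.sums, key=distance)


model = NearestClassMean()
# Hypothetical features: (bytes, packets).
model.partial_fit([1_000.0, 10.0], "mail")
model.partial_fit([90_000.0, 80.0], "video")
print(model.predict([1_200.0, 12.0]))  # mail

# A new service appears later; existing class statistics are untouched.
model.partial_fit([200.0, 4.0], "chat")
print(model.predict([250.0, 5.0]))     # chat
```

Memory grows with the number of classes rather than the number of flows, which is the property of interest for update cost.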

4. Few-shot Learning

  • Underrepresented Classes: Improve performance on rare services
  • Emerging Applications: Classify new services with limited samples
  • Cold-start Problem: Handle newly deployed web services
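
Few-shot experiments are commonly organized as N-way K-shot episodes: a handful of labeled support samples per class plus held-out query samples. The sampler below is a minimal sketch of that setup; the function name, parameters, and toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labeled_flows, n_way, k_shot, n_query, seed=None):
    """Draw one episode as (support, query) lists of (features, label)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in labeled_flows:
        by_class[label].append(features)

    # Only classes with enough samples for both sets can take part.
    eligible = [c for c, items in by_class.items()
                if len(items) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query


# Toy data: 5 made-up services with 10 unique feature values each.
data = [(service * 100 + i, f"service-{service}")
        for service in range(5) for i in range(10)]
support, query = sample_episode(data, n_way=3, k_shot=2, n_query=4, seed=1)
print(len(support), len(query))  # 6 12
```

Setting `k_shot` to a small value imitates a newly deployed service for which only a few labeled flows exist.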

Comparison with Related Datasets

Dataset            Duration  Flows   Classes             Background Traffic
CESNET-TLS22       2 weeks   141.7M  191                 No
CESNET-QUIC22      4 weeks   153M    102 + 3 background  Yes
CESNET-TLS-Year22  1 year    507M    180                 No

How to Cite

@article{luxemburk2024long,
  title={Long-term Traffic Classification using Self-supervised Representation Learning},
  author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}}},
  journal={Computer Networks},
  year={2024},
  publisher={Elsevier}
}

Download

[1] Luxemburk, J., Hynek, K., & Čejka, T. (2024). CESNET-TLS-Year22: One-Year TLS Network Traffic Dataset [Data set]. Zenodo.
Download link to be added