Overview

The CESNET-TLS22 dataset captures two weeks of TLS-encrypted traffic from the CESNET3 network, containing 141.7 million flows across 191 web service classes. Flows are labeled using the domain carried in the TLS SNI (Server Name Indication) field, and because the capture comes from a production ISP environment, it reflects realistic traffic characteristics.

Dataset Metadata

Property                    Value
Type                        Original dataset
Category                    Flows
Primary Task                Service Classification
Total Flows                 141.7 million
Total Size (Uncompressed)   42 GB
Collection Period           4 October 2021 – 17 October 2021
Number of Classes           191 web service classes
Annotation Method           TLS SNI domains
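
As a rough illustration of SNI-based annotation, the sketch below maps a flow's SNI hostname to a service label by suffix matching. The rule table, the label names, and the function label_from_sni are hypothetical and chosen only for illustration; the dataset's actual domain-to-class rules are not reproduced here.

```python
# Hypothetical SNI-domain -> service-class rules (NOT the dataset's real mapping).
SNI_RULES = {
    "video.example": "video-service",
    "mail.example": "mail-service",
}


def label_from_sni(sni, rules=SNI_RULES):
    """Return the service label whose domain matches the SNI, or None."""
    # Normalize: trim whitespace, drop a trailing dot, compare case-insensitively.
    host = sni.strip().rstrip(".").lower()
    # Try longer domains first so the most specific rule wins.
    for domain in sorted(rules, key=len, reverse=True):
        if host == domain or host.endswith("." + domain):
            return rules[domain]
    return None  # no rule matched: the flow stays unlabeled


print(label_from_sni("cdn.video.example"))
print(label_from_sni("unrelated.invalid"))
```

Matching on a leading dot before the domain keeps a hostname such as notvideo.example from being mistaken for a subdomain of video.example.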

Dataset Characteristics

Realistic Traffic Capture

The CESNET3 network environment provides realistic traffic characteristics:

  • Diverse Client Platforms:
    • Various web browsers (Chrome, Firefox, Safari, Edge, etc.)
    • Multiple operating systems (Windows, macOS, Linux, mobile OS)
    • Both desktop and mobile devices
  • Protocol Coverage:
    • HTTP/1.1 traffic
    • HTTP/2 traffic
    • Mixed protocol scenarios
  • Organic Traffic Patterns:
    • Natural user behaviors and interactions
    • Diverse service settings and configurations
    • Real-world usage patterns

Dataset Statistics

Detailed per-week statistics for the CESNET-TLS22 dataset:

Week Name              Uncompressed Size   Collection Period           Flow Count
W-2021-40              22 GB               4 Oct 2021 – 10 Oct 2021    73.2M
W-2021-41              20 GB               11 Oct 2021 – 17 Oct 2021   68.5M
CESNET-TLS22 (total)   42 GB               4 Oct 2021 – 17 Oct 2021    141.7M
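
The per-week figures can be cross-checked against the reported totals with a few lines of arithmetic. This is a minimal sketch that hard-codes the numbers from the table above; the variable and function names are illustrative and not part of any dataset tooling.

```python
import math

# Per-week figures copied from the table (sizes in GB, flows in millions).
weeks = {
    "W-2021-40": {"size_gb": 22, "flows_m": 73.2},
    "W-2021-41": {"size_gb": 20, "flows_m": 68.5},
}
reported_total = {"size_gb": 42, "flows_m": 141.7}


def summarize(per_week):
    """Sum the per-week sizes and flow counts."""
    return {
        "size_gb": sum(w["size_gb"] for w in per_week.values()),
        "flows_m": sum(w["flows_m"] for w in per_week.values()),
    }


totals = summarize(weeks)

# Flow counts are rounded to 0.1M in the table, so compare with a tolerance.
totals_match = (
    totals["size_gb"] == reported_total["size_gb"]
    and math.isclose(totals["flows_m"], reported_total["flows_m"], abs_tol=0.05)
)

# Fraction of all flows contributed by each week.
shares = {name: w["flows_m"] / totals["flows_m"] for name, w in weeks.items()}

print("totals consistent:", totals_match)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of flows")
```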

Research Applications

The dataset was designed to support research in:

  • Fine-grained Classification: Distinguishing between 191 different web services
  • Open-world Classification: Building classifiers that can reject anomalous and unknown samples
  • Encrypted Traffic Analysis: Working with TLS-encrypted flows using SNI
  • Realistic Performance Evaluation: Testing on production network traffic

Open-World Setting

The CESNET-TLS22 dataset enables open-world classification scenarios where:

  • Classifiers must handle known service classes
  • Classifiers can reject anomalous samples
  • Classifiers can identify unknown/novel services
  • Performance is evaluated on realistic traffic distributions
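
One simple way to picture a reject option is a confidence threshold on softmax outputs: if the top class probability is too low, the sample is reported as unknown instead of being forced into a known class. The sketch below shows this common baseline only and should not be read as the method of the cited paper; the class names, the threshold value, and the function names are assumptions made for illustration.

```python
import math

UNKNOWN = "unknown"  # hypothetical label for rejected samples
CLASS_NAMES = ["service-a", "service-b", "service-c"]  # placeholder classes


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_reject(scores, class_names=CLASS_NAMES, threshold=0.9):
    """Return (label, confidence); the label is UNKNOWN below the threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return UNKNOWN, probs[best]
    return class_names[best], probs[best]


# A confident prediction is accepted; a near-uniform one is rejected.
print(classify_with_reject([8.0, 1.0, 0.5]))
print(classify_with_reject([1.0, 0.9, 0.8]))
```

The threshold trades coverage for precision: raising it rejects more samples, which lowers the chance of mislabeling an unknown service at the cost of also rejecting some known ones.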

How to Cite

@article{luxemburk2023fine,
  title={Fine-grained TLS Services Classification with Reject Option},
  author={Luxemburk, Jan and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and Pe{\v{s}}ek, Jaroslav},
  journal={Computer Networks},
  volume={220},
  pages={109467},
  year={2023},
  publisher={Elsevier}
}

Download

[1] Luxemburk, J., Hynek, K., Čejka, T., & Pešek, J. (2023). CESNET-TLS22: Two-Week TLS Network Traffic Dataset from Backbone Lines [Data set]. Zenodo.
Download link to be added