Overview

This collection of datasets [1] was re-created from 15 well-known published datasets and covers the most important traffic detection (binary) and classification (multiclass) tasks. Each dataset contains the NetTiSA flow feature vector, a novel extended IP flow format designed for universal, bandwidth-constrained, high-speed network traffic classification.

Dataset Metadata

Property | Value
Type | Recreated dataset
Category | Flows
Size (Compressed) |
Size (Decompressed) | 12.6 GB
Toolset |
Source Datasets | 15 published datasets

Supported Tasks

The datasets support the following network traffic analysis tasks:

  • Botnet detection and classification
  • Cryptomining detection
  • DNS malware detection
  • DNS over HTTPS (DoH) detection
  • DoS attack detection
  • HTTPS Bruteforce detection
  • Intrusion detection and classification
  • IoT malware classification
  • TOR traffic detection and classification
  • VPN traffic detection and classification

NetTiSA Flow Feature Vector

Overview

The NetTiSA (Network Time Series Analysed) flow is a novel extended IP flow format containing a universal bandwidth-constrained feature vector of 20 features. This compact representation enables effective traffic classification while minimizing telemetry bandwidth requirements.

Feature Groups

NetTiSA flow classification features are organized into three groups based on their computation method:

Feature Group | Description | Examples
Group 1: Classical Flow Features | Based on traditional bidirectional flow information | Number of transferred bytes, packet counts
Group 2: Time-Series Features | Statistical and temporal features derived from packet sequences | Statistical moments, inter-arrival times, payload patterns
Group 3: Derived Features | Computed from Groups 1 and 2 on the flow collector | Computed ratios, normalized values
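
As a rough illustration of how the three groups differ, the sketch below computes a few features of each kind for one toy flow. The packet tuples, the field names, and the particular features chosen here (byte and packet counts, mean and standard deviation of packet sizes, mean inter-arrival time, simple ratios) are assumptions made for this example only. They are not the actual NetTiSA feature definitions, which are given in the cited paper.

```python
from statistics import mean, pstdev

# Hypothetical packets of one bidirectional flow:
# (timestamp in seconds, payload size in bytes, direction)
# direction +1 = client to server, -1 = server to client
packets = [
    (0.00, 100, +1),
    (0.10, 300, -1),
    (0.30, 200, +1),
    (0.60, 400, -1),
]

def group1_classical(pkts):
    """Group 1 style: per-direction packet and byte counters."""
    return {
        "packets": sum(1 for _, _, d in pkts if d > 0),
        "packets_rev": sum(1 for _, _, d in pkts if d < 0),
        "bytes": sum(size for _, size, d in pkts if d > 0),
        "bytes_rev": sum(size for _, size, d in pkts if d < 0),
    }

def group2_time_series(pkts):
    """Group 2 style: statistics over the packet sequence."""
    sizes = [size for _, size, _ in pkts]
    times = [t for t, _, _ in pkts]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes),
        "iat_mean": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }

def group3_derived(g1, g2):
    """Group 3 style: ratios computed only from Group 1 and 2 values."""
    total_packets = g1["packets"] + g1["packets_rev"]
    total_bytes = g1["bytes"] + g1["bytes_rev"]
    duration = g2["duration"]
    return {
        "bytes_per_packet": total_bytes / total_packets if total_packets else 0.0,
        "packets_per_second": total_packets / duration if duration > 0 else 0.0,
        "sent_ratio": g1["bytes"] / total_bytes if total_bytes else 0.0,
    }

g1 = group1_classical(packets)
g2 = group2_time_series(packets)
g3 = group3_derived(g1, g2)
print(g1, g2, g3, sep="\n")
```

Note that `group3_derived` never touches the packet list: it needs only the values already present in the other two groups, which is what allows such features to be computed away from the probe.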

Key Advantages

  • Bandwidth-Constrained: The feature vector has only 20 features, which keeps telemetry overhead low
  • Universal Applicability: Effective across multiple traffic classification tasks
  • High-Speed Compatibility: Designed for high-speed network monitoring
  • Improved Performance: Group 3 features enhance classification without increasing bandwidth, because they are computed on the flow collector

Feature Computation

The three-tier feature architecture enables efficient distributed processing:

  1. Network Probe: Computes Groups 1 and 2 features from raw packets
  2. Flow Collector: Derives Group 3 features from received flow records
  3. Classification System: Uses all 20 features for traffic analysis

This design minimizes bandwidth between the probe and collector while maintaining classification accuracy.
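
The division of work between probe and collector can be sketched as follows. This is a minimal illustration under stated assumptions, not the NetTiSA implementation: the function names, the record fields, and the threshold rule standing in for a trained classifier are all hypothetical.

```python
def probe_export(packet_sizes, packet_times):
    """Probe side: reduce raw packets to a compact flow record.

    Only these fields cross the link to the collector (hypothetical fields).
    """
    count = len(packet_sizes)
    return {
        "packets": count,
        "bytes": sum(packet_sizes),
        "time_first": packet_times[0],
        "time_last": packet_times[-1],
        "size_mean": sum(packet_sizes) / count,
        "size_max": max(packet_sizes),
    }

def collector_enrich(record):
    """Collector side: add derived features using the received record only."""
    duration = record["time_last"] - record["time_first"]
    enriched = dict(record)  # keep the exported fields unchanged
    enriched["duration"] = duration
    enriched["bytes_per_second"] = record["bytes"] / duration if duration > 0 else 0.0
    enriched["packets_per_second"] = record["packets"] / duration if duration > 0 else 0.0
    return enriched

def classify(features):
    """Placeholder for a trained model: a fixed threshold, purely illustrative."""
    return "high-rate" if features["bytes_per_second"] > 1000 else "low-rate"

# Toy flow: four packets observed over two seconds.
record = probe_export([100, 300, 200, 400], [0.0, 0.5, 1.0, 2.0])
features = collector_enrich(record)
label = classify(features)

print(len(record), "fields exported by the probe")
print(len(features), "fields available to the classifier")
print("label:", label)
```

The exported record stays the same size no matter how many derived features the collector adds, so the feature vector seen by the classifier can grow without adding load on the probe-to-collector link.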

How to Cite

@article{KOUMAR2024110147,
  title = {NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification},
  journal = {Computer Networks},
  volume = {240},
  pages = {110147},
  year = {2024},
  issn = {1389-1286},
  doi = {10.1016/j.comnet.2023.110147},
  url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
  author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}

Download

[1] Josef Koumar, Karel Hynek, Jaroslav Pešek, & Tomáš Čejka. (2023). Network Traffic Datasets with Novel Extended IP Flow Called NetTiSA Flow [Data set]. Zenodo.
DOI: 10.5281/zenodo.8301043