Overview
This collection of datasets [1] was recreated from 15 well-known published datasets and covers the most important traffic detection (binary) and classification (multiclass) tasks. Each dataset provides the NetTiSA flow feature vector for its flows. NetTiSA is a novel extended IP flow format designed for universal, bandwidth-constrained, high-speed network traffic classification.
Dataset Metadata
| Property | Value |
|---|---|
| Type | Recreated dataset |
| Category | Flows |
| Size (Compressed) | — |
| Size (Decompressed) | 12.6 GB |
| Toolset | — |
| Source Datasets | 15 published datasets |
Supported Tasks
The datasets support the following network traffic analysis tasks:
- Botnet detection and classification
- Cryptomining detection
- DNS malware detection
- DNS over HTTPS (DoH) detection
- DoS attack detection
- HTTPS Bruteforce detection
- Intrusion detection and classification
- IoT malware classification
- TOR traffic detection and classification
- VPN traffic detection and classification
NetTiSA Flow Feature Vector
Overview
The NetTiSA (Network Time Series Analysed) flow is a novel extended IP flow format containing a universal bandwidth-constrained feature vector of 20 features. This compact representation enables effective traffic classification while minimizing telemetry bandwidth requirements.
Feature Groups
NetTiSA flow classification features are organized into three groups based on their computation method:
| Feature Group | Description | Examples |
|---|---|---|
| Group 1: Classical Flow Features | Based on traditional bidirectional flow information | Number of transferred bytes, packet counts |
| Group 2: Time-Series Features | Statistical and temporal features derived from packet sequences | Statistical moments, inter-arrival times, payload patterns |
| Group 3: Derived Features | Computed from Groups 1 and 2 on the flow collector | Computed ratios, normalized values |
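To make the Group 2 row more concrete, the following is a minimal sketch of statistics a probe could compute over one flow's packet sequence. The function name, the output keys, and the choice of statistics are assumptions made for illustration; they are not the NetTiSA feature definitions, which are given in the cited article.

```python
import math
from statistics import fmean, pstdev


def time_series_features(lengths, timestamps):
    """Illustrative statistics over one flow's packet sequence.

    lengths:    packet sizes in bytes
    timestamps: packet arrival times in seconds, sorted ascending
    All names here are hypothetical, chosen only for this sketch.
    """
    if not lengths or len(lengths) != len(timestamps):
        raise ValueError("need two non-empty sequences of equal length")

    # Inter-arrival times between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

    return {
        "mean_len": fmean(lengths),
        "std_len": pstdev(lengths),  # population standard deviation
        "rms_len": math.sqrt(fmean([x * x for x in lengths])),
        # A single-packet flow has no gaps; report 0.0 in that case.
        "mean_gap": fmean(gaps) if gaps else 0.0,
        "max_gap": max(gaps) if gaps else 0.0,
    }
```

For example, `time_series_features([100, 200, 300], [0.0, 1.0, 3.0])` summarizes a three-packet flow by its packet-size statistics and the gaps between arrivals.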
Key Advantages
- Bandwidth-Constrained: The feature vector has only 20 features, which keeps telemetry overhead low
- Universal Applicability: Effective across multiple traffic classification tasks
- High-Speed Compatibility: Designed for high-speed network monitoring
- Improved Performance: Group 3 features are computed on the flow collector, so they enhance classification without increasing the exported bandwidth
Feature Computation
The three-tier feature architecture enables efficient distributed processing:
- Network Probe: Computes Group 1 and Group 2 features from raw packets
- Flow Collector: Derives Group 3 features from received flow records
- Classification System: Uses all 20 features for traffic analysis
This design keeps the bandwidth between the probe and the collector low, because only Group 1 and Group 2 features are exported, while maintaining classification accuracy.
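The probe and collector split can be sketched as follows. This is an illustration under stated assumptions, not the NetTiSA implementation: the `ProbeRecord` fields and the derived values (duration, rates, a byte ratio) are hypothetical examples of quantities a collector can compute from an exported record without the probe sending any extra fields.

```python
from dataclasses import dataclass


@dataclass
class ProbeRecord:
    """Hypothetical flow record exported by the probe."""
    bytes_fwd: int
    bytes_rev: int
    packets_fwd: int
    packets_rev: int
    time_first: float  # flow start, seconds
    time_last: float   # flow end, seconds


def derive_collector_features(rec):
    """Compute example derived values on the collector side."""
    duration = rec.time_last - rec.time_first
    total_bytes = rec.bytes_fwd + rec.bytes_rev
    total_packets = rec.packets_fwd + rec.packets_rev

    # Zero-duration flows (e.g. a single packet) would divide by zero,
    # so their rates are reported as 0.0.
    bytes_per_sec = total_bytes / duration if duration > 0 else 0.0
    packets_per_sec = total_packets / duration if duration > 0 else 0.0
    fwd_byte_ratio = rec.bytes_fwd / total_bytes if total_bytes > 0 else 0.0

    return {
        "duration": duration,
        "bytes_per_sec": bytes_per_sec,
        "packets_per_sec": packets_per_sec,
        "fwd_byte_ratio": fwd_byte_ratio,
    }
```

Because the derived values depend only on fields already present in the record, adding or changing them affects the collector alone and leaves the export format untouched.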
How to Cite
@article{KOUMAR2024110147,
title = {NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification},
journal = {Computer Networks},
volume = {240},
pages = {110147},
year = {2024},
issn = {1389-1286},
doi = {10.1016/j.comnet.2023.110147},
url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}
Download
[1] Josef Koumar, Karel Hynek, Jaroslav Pešek, & Tomáš Čejka. (2023). Network Traffic Datasets with Novel Extended IP Flow Called NetTiSA Flow [Data set]. Zenodo.
DOI: 10.5281/zenodo.8301043