NetFlow Data Artifacts
In the paper 'Measurement Artifacts in NetFlow Data', we analyzed the presence and impact of measurement artifacts in NetFlow data from six flow exporters. The following Python scripts were used to generate data in which the artifacts, if present at all, are easy to identify:
- File:NF Artifacts m1 tcpflags.zip: Measurement script for analyzing flow record expiration triggered by TCP FIN/RST flags.
- File:NF Artifacts m2 bytecounters.zip: Measurement script for analyzing invalid byte counters in flow records.
- File:NF Artifacts m3 flowrecordexpiration.zip: Measurement script for analyzing flow record expiration behavior (based on active timeout, idle timeout and TCP flags).
All scripts rely on the Python library Scapy.
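To make the three expiration triggers named above (active timeout, idle timeout, TCP FIN/RST flags) concrete, here is a minimal sketch of how a flow exporter might split one packet stream into flow records. It is not taken from the measurement scripts and does not use Scapy. The names FlowRecord and expire_flows and the timeout values of 120 s and 15 s are assumptions chosen for illustration. The model is also simplified: timeouts are only checked when the next packet arrives, whereas real exporters scan their flow cache periodically, which is one place where artifacts can arise.

```python
from dataclasses import dataclass

# TCP flag bits as defined in the TCP header
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


@dataclass
class FlowRecord:
    """Hypothetical, simplified flow record for a single flow key."""
    start: float
    end: float
    packets: int
    reason: str  # why the record was expired


def expire_flows(packets, active_timeout=120.0, idle_timeout=15.0):
    """Split (timestamp, tcp_flags) pairs of ONE flow key into flow records.

    Assumes the packets are sorted by timestamp. The timeout defaults are
    illustrative values, not those of any particular exporter.
    """
    records = []
    start = end = None
    count = 0

    def export(reason):
        nonlocal start, end, count
        records.append(FlowRecord(start, end, count, reason))
        start = end = None
        count = 0

    for ts, flags in packets:
        if count:
            if ts - end >= idle_timeout:
                # No packet seen for at least idle_timeout seconds
                export("idle timeout")
            elif ts - start >= active_timeout:
                # Record has been open for at least active_timeout seconds
                export("active timeout")
        if not count:
            start = ts  # this packet opens a new record
        end = ts
        count += 1
        if flags & (FIN | RST):
            # FIN or RST signals the end of the TCP connection
            export("tcp flags")

    if count:
        export("end of input")
    return records


if __name__ == "__main__":
    demo = [(0.0, SYN), (1.0, ACK), (2.0, FIN | ACK), (3.0, ACK), (30.0, ACK)]
    for record in expire_flows(demo):
        print(record)
```

Comparing the records an ideal model like this predicts with the records a device actually exports is one way to spot deviations in expiration behavior.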
Reference to the paper:
Rick Hofstede, Idilio Drago, Anna Sperotto, Ramin Sadre, Aiko Pras. Measurement Artifacts in NetFlow Data. In: Proceedings of the Passive and Active Measurement conference (PAM 2013), 18-20 May 2013, Hong Kong, China (to appear)
Abstract of the paper:
Flows provide an aggregated view of network traffic by grouping streams of packets. The resulting scalability gain usually excuses the coarser data granularity, as long as the flow data reflects the actual network traffic faithfully. However, it is known that the flow export process may introduce artifacts in the exported data. This paper extends the set of known artifacts by explaining which implementation decisions are causing them. In addition, we verify the artifacts' presence in data from a set of widely-used devices. Our results show that the revealed artifacts are widely spread among different devices from various vendors. We believe that these results provide researchers and operators with important insights for developing robust analysis applications.