Fair sampling across network flow measurements
Conference Paper
Overview
Research
Identity
Additional Document Info
Other
View All
Overview
abstract
Sampling is crucial for controlling resource consumption by internet traffic flow measurements. Routers use Packet Sampled NetFlow, and completed flow records are sampled in the measurement infrastructure. Recent research, motivated by the need of service providers to accurately measure both small and large traffic subpopulations, has focused on distributing a packet sampling budget amongst subpopulations. But long timescales of hardware development and lower bandwidth costs motivate post-measurement analysis of complete flow records at collectors instead. Sampling in collector databases then manages data volumes, yielding general purpose summaries that are rapidly queried to trigger drill-down analysis on a time limited window of full data. These are sufficiently small to be archived. This paper addresses the problem of distributing a sampling budget over subpopulations of flow records. Estimation accuracy goals are met by fairly sharing the budget. We establish a correspondence between the type of accuracy goal, and the flavor of fair sharing used. A streaming Max-Min Fair Sampling algorithm fairly shares the sampling budget across subpopulations, with sampling as a mechanism to deallocate budget. This provides timely samples and is robust against uncertainties in configuration and demand. We illustrate using flow records from an access router of a large ISP, where rates over interface traffic subpopulations vary over several orders of magnitude. We detail an implementation whose computational cost is no worse than subpopulation-oblivious sampling. 2012 ACM.
name of conference
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems