Learn more, sample less: Control of volume and variance in network measurement
Additional Document Info
This paper deals with sampling objects from a large stream. Each object possesses a size, and the aim is to be able to estimate the total size of an arbitrary subset of objects whose composition is not known at the time of sampling. This problem is motivated from network measurements in which the objects are flow records exported by routers and the sizes are the number of packet or bytes reported in the record. Subsets of interest could be flows from a certain customer or flows from a worm attack. This paper introduces threshold sampling as a sampling scheme that optimally controls the expected volume of samples and the variance of estimators over any classification of flows. It provides algorithms for dynamic control of sample volumes and evaluates them on flow data gathered from a commercial Internet Protocol (IP) network. The algorithms are simple to implement and robust to variation in network conditions. The work reported here has been applied in the measurement infrastructure of the commercial IP network. To not have employed sampling would have entailed an order of magnitude greater capital expenditure to accommodate the measurement traffic and its processing. 2005 IEEE.