Sketching unaggregated data streams for subpopulation-size queries Conference Paper uri icon

abstract

  • IP packet streams consist of multiple interleaving IP flows. Statistical summaries of these streams, collected for different measurement periods, are used for characterization of traffic, billing, anomaly detection, inferring traffic demands, configuring packet filters and routing protocols, and more. While queries are posed over the set of flows, the summarization algorithmis applied to the stream of packets. Aggregation of traffic into flows before summarization requires storage of per-flow counters, which is often infeasible. Therefore, the summary has to be produced over the unaggregated stream. An important aggregate performed over a summary is to approximate the size of a subpopulation of flows that is specified a posteriori. For example, flows belonging to an application such as Web or DNS or flows that originate from a certain Autonomous System. We design efficient streaming algorithms that summarize unaggregated streams and provide corresponding unbiased estimators for subpopulation sizes. Our summaries outperform, in terms of estimates accuracy, those produced by packet sampling deployed by Cisco's sampled NetFlow, the most widely deployed such system. Performance of our best method, step sample-and-hold is close to that of summaries that can be obtainedfrom pre-aggregated traffic. Copyright 2007 ACM.

name of conference

  • Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

published proceedings

  • Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

author list (cited authors)

  • Cohen, E., Duffield, N., Kaplan, H., Lund, C., & Thorup, M.

citation count

  • 17

complete list of authors

  • Cohen, Edith||Duffield, Nick||Kaplan, Haim||Lund, Carsten||Thorup, Mikkel

publication date

  • January 2007