Stream Aggregation Through Order Sampling - Texas A&M University (TAMU) Scholar

abstract

© 2017 Copyright held by the owner/author(s). This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely keyed items, there is no current algorithm that realizes the benefits of order sampling in the context of stream aggregation over non-unique keys. The naive approach of order sampling items regardless of key and then aggregating the results is hopelessly inefficient. By contrast, our proposed algorithm uses a single persistent random variable across the lifetime of each key in the cache, and maintains unbiased estimates of the key aggregates that can be queried at any point in the stream. The basic approach can be supplemented with a Sample and Hold pre-sampling stage with a sampling rate adaptation controlled by PBA. This approach represents a considerable reduction in computational complexity compared with the state of the art in adapting Sample and Hold to operate with a fixed cache size. Concerning statistical properties, we prove that PBA provides unbiased estimates of the true aggregates. We analyze the computational complexity of PBA and its variants, and provide a detailed evaluation of its accuracy on synthetic and trace data. Weighted relative error is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive Sample and Hold; there is also substantial improvement for rank queries.
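
For readers unfamiliar with the order sampling that the abstract takes as its starting point, the following is a minimal sketch of classic priority sampling over a stream of uniquely keyed items. It is not the paper's PBA algorithm and does not handle non-unique keys. The function name `priority_sample`, its parameters, and the heap-based reservoir are illustrative assumptions rather than anything specified in this record. Each item of weight w draws a uniform random u in (0, 1] and receives priority w/u; the k items of highest priority are kept, and each kept item is assigned the estimate max(w, z), where z is the (k+1)-th highest priority.

```python
import heapq
import random


def priority_sample(stream, k, rng=None):
    """Illustrative priority sampling for (key, weight) pairs with unique keys.

    Assumes positive weights. Returns a dict mapping each sampled key to its
    weight estimate max(weight, z), where z is the (k+1)-th largest priority.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (priority, key, weight); holds at most k + 1 entries
    for key, weight in stream:
        u = 1.0 - rng.random()  # uniform in (0, 1], so the division is safe
        heapq.heappush(heap, (weight / u, key, weight))
        if len(heap) > k + 1:
            heapq.heappop(heap)  # discard the lowest-priority entry
    if len(heap) <= k:
        # Fewer than k + 1 items arrived: every item is kept with its exact weight.
        return {key: weight for _, key, weight in heap}
    z, _, _ = heapq.heappop(heap)  # threshold: smallest of the top k + 1 priorities
    return {key: max(weight, z) for _, key, weight in heap}
```

Summing the returned estimates over any subset of keys gives an estimate of that subset's total weight. A stream with repeated keys would need the per-key persistent random variable described in the abstract, which this sketch does not attempt.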

name of conference

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

authors

Duffield, Nick

published proceedings

CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT

author list (cited authors)

Duffield, N., Xu, Y., Xia, L., Ahmed, N. K., & Yu, M.

citation count

3

complete list of authors

Duffield, Nick||Xu, Yunhong||Xia, Liangzhen||Ahmed, Nesreen K||Yu, Minlan

publication date

November 2017

publisher

Association for Computing Machinery (ACM)

keywords

Aggregation
Heavy Hitters
Priority Sampling
Subset Sums

Digital Object Identifier (DOI)

10.1145/3132847.3133042

International Standard Book Number (ISBN) 13

9781450349185

start page

909

end page

918

volume

Part F131841

URL

http://dx.doi.org/10.1145/3132847.3133042

Stream Aggregation Through Order Sampling Conference Paper
