Sampling for Big Data: A Tutorial (Conference Paper)

abstract

  • One response to the proliferation of large datasets has been to develop ingenious ways to throw resources at the problem, using massive fault-tolerant storage architectures and parallel and graph computation models such as MapReduce, Pregel, and Giraph. However, not all environments can support this scale of resources, and not all queries need an exact response. This motivates the use of sampling to generate summary datasets that support rapid queries and prolong the useful life of the data in storage. To be effective, sampling must mediate the tensions between resource constraints, data characteristics, and the required query accuracy. The state of the art in sampling goes far beyond simple uniform selection of elements to maximize the usefulness of the resulting sample. This tutorial reviews progress in sample design for large datasets, including streaming and graph-structured data. Applications to sampling network traffic and social networks are discussed.
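    A classic building block for the streaming setting the abstract mentions is reservoir sampling (Algorithm R), which maintains a uniform sample of fixed size k over a stream of unknown length in one pass. The sketch below is illustrative only and not drawn from the tutorial itself; the function name and parameters are my own.

    ```python
    import random

    def reservoir_sample(stream, k, rng=None):
        """Illustrative sketch of Algorithm R: after processing n items,
        every item seen so far is in the reservoir with probability k/n."""
        rng = rng or random.Random()
        reservoir = []
        for n, item in enumerate(stream, start=1):
            if n <= k:
                # Fill the reservoir with the first k items.
                reservoir.append(item)
            else:
                # Replace a random slot with probability k/n.
                j = rng.randrange(n)  # uniform over [0, n)
                if j < k:
                    reservoir[j] = item
        return reservoir
    ```

    The tutorial's point is that such uniform schemes are only the starting point; weighted, priority-based, and graph-aware designs trade uniformity for better query accuracy under resource constraints.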

name of conference

  • Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

published proceedings

  • Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14)

author list (cited authors)

  • Cormode, G., & Duffield, N.

citation count

  • 36

complete list of authors

  • Cormode, Graham||Duffield, Nick

publication date

  • August 2014