On unbiased sampling for unstructured peer-to-peer networks
Conference Paper
Overview
Identity
Additional Document Info
Other
View All
Overview
abstract
This paper addresses the difficult problem of selecting representative samples of peer properties (eg degree, link bandwidth, number of files shared) in unstructured peer-to-peer systems. Due to the large size and dynamic nature of these systems, measuring the quantities of interest on every peer is often prohibitively expensive, while sampling provides a natural means for estimating system-wide behavior efficiently. However, commonly-used sampling techniques for measuring peer-to-peer systems tend to introduce considerable bias for two reasons. First, the dynamic nature of peers can bias results towards short-lived peers, much as naively sampling flows in a router can lead to bias towards short-lived flows. Second, the heterogeneous nature of the overlay topology can lead to bias towards high-degree peers.We present a detailed examination of the ways that the behavior of peer-to-peer systems can introduce bias and suggest the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples. We conduct an extensive simulation study to demonstrate that the proposed technique works well for a wide variety of common peer-to-peer network conditions. Using the Gnutella network, we empirically show that our implementation of the MRWB technique yields more accurate samples than relying on commonly-used sampling techniques. Furthermore, we provide insights into the causes of the observed differences. The tool we have developed, ion-sampler, selects peer addresses uniformly at random using the MRWB technique. These addresses may then be used as input to another measurement tool to collect data on a particular property. Copyright 2006 ACM.
name of conference
Proceedings of the 6th ACM SIGCOMM conference on Internet measurement