Distributed query sampling Conference Paper


  • We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its self-learning and self-configuring ability based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches. Copyright 2006 ACM.

author list (cited authors)

  • Caverlee, J., Liu, L., & Bae, J.

citation count

  • 10

editor list (cited editors)

  • Efthimiadis, E. N., Dumais, S. T., Hawking, D., & Järvelin, K.

publication date

  • January 2006