Distributed query sampling Conference Paper

abstract

  • We present an adaptive, quality-conscious distributed query-sampling framework for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase, the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its self-learning and self-configuring ability, which is based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches. Copyright 2006 ACM.
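
The two-phase process described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `DatabaseState`, `seed_phase`, and `schedule_round` are hypothetical, and the weighting rule (estimated size multiplied by the remaining quality deficit) is a stand-in, not one of the three quality-conscious schemes the paper introduces.

```python
from dataclasses import dataclass


@dataclass
class DatabaseState:
    """Hypothetical per-database bookkeeping (not taken from the paper)."""
    name: str
    est_size: float   # estimated number of documents, assumed to be supplied
    quality: float    # assumed sample-quality score in [0, 1]; 1 = well sampled
    sampled: int = 0  # documents sampled so far


def seed_phase(dbs, seed_docs):
    """Phase 1: give every database the same small seed sample."""
    for db in dbs:
        db.sampled += seed_docs


def schedule_round(dbs, round_budget):
    """Phase 2, one round: split the budget across databases.

    Assumed rule: weight = est_size * (1 - quality), so large databases
    whose samples are still poor receive more of the budget. Shares are
    rounded down, so the total may fall slightly short of round_budget.
    """
    weights = [db.est_size * (1.0 - db.quality) for db in dbs]
    total = sum(weights)
    if total == 0:
        # Nothing left to improve under this assumed rule.
        return {db.name: 0 for db in dbs}
    return {db.name: int(round_budget * w / total) for db, w in zip(dbs, weights)}
```

In a full loop, each round would sample the allocated number of documents, re-estimate `est_size` and `quality` from the enlarged samples, and call `schedule_round` again. That re-estimation step is where the abstract's adaptive scheduling would take place, and it is left out of this sketch.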

name of conference

  • Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

published proceedings

  • Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

author list (cited authors)

  • Caverlee, J., Liu, L., & Bae, J.

citation count

  • 11

complete list of authors

  • Caverlee, James; Liu, Ling; Bae, Joonsoo

editor list (cited editors)

  • Efthimiadis, E. N., Dumais, S. T., Hawking, D., & Järvelin, K.

publication date

  • January 2006