Content-based crowd retrieval on the real-time web

Conference Paper

abstract

  • In this paper, we propose and evaluate a novel content-driven crowd discovery algorithm that efficiently identifies newly formed communities of users on the real-time web. Short-lived crowds reflect the real-time interests of their members and provide a foundation for user-focused web monitoring. Three salient features of the algorithm are: (i) a prefix-tree-based locality-sensitive hashing approach for discovering crowds in high-volume, rapidly evolving social media; (ii) efficient user-profile updating that incorporates new user activities while fading older ones; and (iii) key-dimension identification, which focuses crowd detection on the most active portions of the real-time web. An extensive experimental study shows that crowd discovery is significantly more efficient than both a k-means clustering-based approach and a MapReduce-based implementation, while still producing high-quality crowds relative to an offline approach. Additionally, we find that expert crowds tend to be "stickier," lasting longer than crowds of typical users. © 2012 ACM.
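The abstract names mechanisms (i) and (ii) without specifying them. The following is a rough, hypothetical illustration of the general ideas only, not the authors' algorithm. It assumes bag-of-words user profiles, exponential fading by a constant `decay` factor, and random-hyperplane (SimHash-style) LSH signatures stored in a bit-prefix trie. Under this scheme, users whose signatures share a prefix become candidate crowd members. All function names (`update_profile`, `make_hyperplanes`, `signature`, `insert`, `candidates`) and parameters are illustrative assumptions.

```python
import random

# Hypothetical sketch: not the paper's method. Profiles are {term: weight} dicts.

def update_profile(profile, new_terms, decay=0.5):
    """Fade existing weights by `decay` (assumed exponential), then add new term counts."""
    updated = {term: weight * decay for term, weight in profile.items()}
    for term in new_terms:
        updated[term] = updated.get(term, 0.0) + 1.0
    return updated

def make_hyperplanes(vocab, n_bits, seed=0):
    """One random Gaussian weight per vocabulary term for each signature bit."""
    rng = random.Random(seed)
    return [{term: rng.gauss(0.0, 1.0) for term in vocab} for _ in range(n_bits)]

def signature(profile, hyperplanes):
    """Bit i is '1' when the profile lies on the non-negative side of hyperplane i."""
    bits = []
    for plane in hyperplanes:
        dot = sum(w * plane.get(t, 0.0) for t, w in profile.items())
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

def insert(trie, sig, user):
    """Walk or extend the prefix tree one bit at a time; store the user at the leaf."""
    node = trie
    for bit in sig:
        node = node.setdefault(bit, {})
    node.setdefault("users", []).append(user)

def candidates(trie, prefix):
    """Collect every user whose signature starts with `prefix`."""
    node = trie
    for bit in prefix:
        if bit not in node:
            return []
        node = node[bit]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.extend(current.get("users", []))
        stack.extend(child for key, child in current.items() if key != "users")
    return found

if __name__ == "__main__":
    vocab = ["election", "debate", "vote", "goal", "match"]
    planes = make_hyperplanes(vocab, n_bits=8, seed=42)
    profiles = {
        "alice": update_profile({}, ["election", "debate", "vote"]),
        "bob": update_profile({}, ["election", "debate", "vote"]),
        "carol": update_profile({}, ["goal", "match"]),
    }
    trie = {}
    for user, prof in profiles.items():
        insert(trie, signature(prof, planes), user)
    # Users sharing alice's first 4 signature bits are candidate crowd-mates.
    print(sorted(candidates(trie, signature(profiles["alice"], planes)[:4])))
```

A shorter query prefix widens the candidate set and a longer one narrows it. In this sketch, that is the only knob trading recall against precision. Identical profiles (alice and bob above) always land in the same bucket.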

name of conference

  • Proceedings of the 21st ACM international conference on Information and knowledge management

published proceedings

  • Proceedings of the 21st ACM international conference on Information and knowledge management

author list (cited authors)

  • Kamath, K. Y., & Caverlee, J.

citation count

  • 7

complete list of authors

  • Kamath, Krishna Y.; Caverlee, James

editor list (cited editors)

  • Chen, X., Lebanon, G., Wang, H., & Zaki, M. J.

publication date

  • January 2012