Content-based crowd retrieval on the real-time web - Texas A&M University (TAMU) Scholar

abstract

In this paper, we propose and evaluate a novel content-driven crowd discovery algorithm that can efficiently identify newly-formed communities of users from the real-time web. Short-lived crowds reflect the real-time interests of their constituents and provide a foundation for user-focused web monitoring. Three of the salient features of the algorithm are its: (i) prefix-tree based locality-sensitive hashing approach for discovering crowds from high-volume rapidly-evolving social media; (ii) efficient user profile updating for incorporating new user activities and fading older ones; and (iii) key dimension identification, so that crowd detection can be focused on the most active portions of the real-time web. Through extensive experimental study, we find significantly more efficient crowd discovery as compared to both a k-means clustering-based approach and a MapReduce-based implementation, while maintaining high-quality crowds as compared to an offline approach. Additionally, we find that expert crowds tend to be "stickier" and last longer in comparison to crowds of typical users. © 2012 ACM.

name of conference

Proceedings of the 21st ACM international conference on Information and knowledge management

authors

Caverlee, James

published proceedings

Proceedings of the 21st ACM international conference on Information and knowledge management

author list (cited authors)

Kamath, K. Y., & Caverlee, J.

citation count

7

complete list of authors

Kamath, Krishna Y||Caverlee, James

editor list (cited editors)

Chen, X., Lebanon, G., Wang, H., & Zaki, M. J.

publication date

January 2012

publisher

Association for Computing Machinery (ACM)

keywords

Networking and Information Technology R&D

Digital Object Identifier (DOI)

10.1145/2396761.2396789

International Standard Book Number (ISBN) 13

9781450311564

start page

195

end page

204

URL

http://dl.acm.org/citation.cfm?id=2396761

Content-based crowd retrieval on the real-time web Conference Paper
