A Parameterized Approach to Spam-Resilient Link Analysis of the Web Academic Article

abstract

  • Link-based analysis of the Web provides the basis for many important applications, such as Web search, Web-based data mining, and Web page categorization, that bring order to the massive amount of distributed Web content. Due to the overwhelming reliance on these important applications, there is a rise in efforts to manipulate (or spam) the link structure of the Web. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the set of critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering the influence of link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings in comparison with existing Web algorithms such as PageRank. © 2009 IEEE.

author list (cited authors)

  • Caverlee, J., Webb, S., Liu, L., & Rouse, W. B.

citation count

  • 9

publication date

  • October 2008