n324489SE Academic Article

abstract

  • Dynamic Web data sources on the Deep Web provide intuitive access to real-time information and large data repositories anywhere that Web access is available. Although recent studies suggest that the dynamic Web is larger and growing faster than the static Web, dynamic content is often ignored by existing search engine indexers owing to technical challenges inherent in searching dynamic sources. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular architecture for focused crawling of the Deep Web. Third, DynaBot incorporates algorithms for efficiently probing, discovering, and clustering Deep Web sources through SCD-based service analysis. Experimental results demonstrate DynaBot's effectiveness and suggest techniques for efficiently managing service discovery given the immense scale of the Deep Web.

published proceedings

  • International Journal of Web Services Research

author list (cited authors)

  • Rocco, D., Caverlee, J., Liu, L., & Critchlow, T.

publication date

  • January 1, 2007 11:11 AM