Service class driven dynamic data source discovery with DynaBot

abstract

Dynamic Web data sources on the Deep Web provide intuitive access to real-time information and large data repositories wherever Web access is available. Although recent studies suggest that the dynamic Web is larger and growing faster than the static Web, dynamic content is often ignored by existing search engine indexers owing to the technical challenges inherent in searching dynamic sources. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources. DynaBot has three unique characteristics. First, DynaBot uses a service class model implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular architecture for focused crawling of the Deep Web. Third, DynaBot incorporates algorithms for efficiently probing, discovering, and clustering Deep Web sources through SCD-based service analysis. Experimental results demonstrate DynaBot's effectiveness and suggest techniques for efficiently managing service discovery given the immense scale of the Deep Web.

authors

Caverlee, James

published proceedings

International Journal of Web Services Research

author list (cited authors)

Rocco, D., Caverlee, J., Liu, L., & Critchlow, T.

citation count

0

complete list of authors

Rocco, Daniel||Caverlee, James||Liu, Ling||Critchlow, Terence

publication date

July 2007

publisher

IGI Global Publisher

published in

International Journal of Web Services Research

keywords

Deep Web
Dynamic Web Data
Service Discovery
Web Crawling

Digital Object Identifier (DOI)

10.4018/jwsr.2007070102

start page

26

end page

48

volume

4

issue

3

URL

http://dx.doi.org/10.4018/jwsr.2007070102
