Gunanathan, Sudharsan (2008-08). SUPPORTING DOMAIN SPECIFIC WEB-BASED SEARCH USING HEURISTIC KNOWLEDGE EXTRACTION. Master's Thesis.

abstract

  • Modern search engines such as Google support domain-independent search over the vast information contained in web documents. However, domain-specific information access, such as finding less well-known people, locations, and events, is not performed efficiently unless users develop sophisticated query strategies. This thesis describes the design and development of an application to support one such domain-specific information activity: helping insurance (and related) companies identify weather and natural disaster damage so they can better assess when and where personnel will be needed. The approach combines information extraction with an interactive presentation of results. Previous domain-specific search engines extract information about papers, people, and courses using rule-based or learning-based techniques. However, they present the extracted information in a typical query-and-result-list interface and fail to address the need for interaction based on the extracted document features. The domain-specific web-based search application developed in this project combines information extraction with an interactive display of results to facilitate rapid information location. A heuristic evaluation was performed to determine whether the application met the design goals and to improve the design. The final application presents results in an unconventional but interactive tree-based display. It also offers options for user-specific results caching and for modifying the search and caching process. Using a bank of search heuristics developed for this purpose, its heuristic-based search process extracts information about the place, date, and damages associated with a specific disaster.

publication date

  • August 2008