Deriving Image-Text Document Surrogates to Optimize Cognition

abstract

The representation of information collections needs to be optimized for human cognition. Growing information collections play a crucial role in human experiences. While documents often include rich visual components, collections (including personal collections and those generated by search engines) are typically represented by lists of text-only surrogates. By concurrently invoking complementary components of human cognition, combined image-text surrogates will help people to more effectively see, understand, think about, and remember information collections. This research develops algorithmic methods that use the structural context of images in HTML documents to associate meaningful text with them and thus derive combined image-text surrogates. Our algorithm first recognizes which documents consist essentially of informative and multimedia content. Then, the algorithm recognizes the informative sub-trees within each such document, discards advertisements and navigation, and extracts images with contextual descriptions. Experimental results demonstrate the algorithm's efficacy. An implementation of the algorithm is provided in combinFormation, a creativity support tool for collection authoring. The resulting image-text surrogates enhance the experiences of users engaged in information discovery tasks, which involve finding and collecting information as part of developing new ideas. Copyright 2009 ACM.
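The abstract outlines three steps: decide whether a page is essentially informative content, prune advertisement and navigation sub-trees, and pair each remaining image with contextual text. This record does not describe the paper's actual features or thresholds, so the following Python sketch substitutes simple, hypothetical heuristics: link density (the share of text inside `<a>` elements) stands in for navigation detection, class/id tokens such as `ad` or `banner` stand in for advertisement detection, and the text of the image's nearest enclosing block serves as its contextual description. All names (`image_surrogates`, `TreeBuilder`, the tag sets, and the threshold parameters) are illustrative assumptions, not combinFormation's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical heuristics; the paper's actual features and thresholds are not given here.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input", "area", "source", "wbr"}
BLOCK_TAGS = {"p", "div", "td", "li", "figure", "section", "article", "body"}
NOISE_TAGS = {"nav", "script", "style", "header", "footer", "aside"}
AD_MARKERS = {"ad", "ads", "advert", "advertisement", "sponsor", "banner"}


@dataclass(eq=False)
class Node:
    tag: str
    attrs: dict
    parent: Node | None = field(default=None, repr=False)
    children: list = field(default_factory=list)  # Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), parent=self.stack[-1])
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get a matching end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() and self.stack[-1].tag not in ("script", "style"):
            self.stack[-1].children.append(data.strip())


def text_of(node: Node) -> str:
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def anchor_chars(node: Node) -> int:
    """Number of text characters that sit inside <a> elements."""
    return sum(
        len(text_of(c)) if c.tag == "a" else anchor_chars(c)
        for c in node.children
        if isinstance(c, Node)
    )


def link_density(node: Node) -> float:
    n = len(text_of(node))
    return anchor_chars(node) / n if n else 0.0  # text-free blocks count as content


def looks_like_ad(node: Node) -> bool:
    label = f"{node.attrs.get('class') or ''} {node.attrs.get('id') or ''}".lower()
    return bool(AD_MARKERS & set(re.split(r"[^a-z0-9]+", label)))


def nearest_block(node: Node) -> Node:
    node = node.parent
    while node.parent is not None and node.tag not in BLOCK_TAGS:
        node = node.parent
    return node


def image_surrogates(html: str, max_chars: int = 200,
                     max_block_density: float = 0.5,
                     max_page_density: float = 0.8) -> list[tuple[str, str]]:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Step 1 (sketch): skip pages that are mostly links, such as portals or indexes.
    if link_density(builder.root) > max_page_density:
        return []
    results = []

    def walk(node: Node) -> None:
        for c in node.children:
            if not isinstance(c, Node):
                continue
            if c.tag in NOISE_TAGS or looks_like_ad(c):
                continue  # Step 2: prune navigation and ad sub-trees entirely
            if c.tag == "img" and c.attrs.get("src"):
                block = nearest_block(c)
                if link_density(block) <= max_block_density:
                    # Step 3: pair the image with its block's text, else its alt text.
                    context = text_of(block) or c.attrs.get("alt") or ""
                    results.append((c.attrs["src"], context[:max_chars]))
            walk(c)

    walk(builder.root)
    return results
```

Given a page with a `<nav>` bar containing a logo, a `<div class="ad-banner">` image, a paragraph pairing `bridge.jpg` with a one-sentence caption, and a list item of links with an icon, this sketch keeps only the bridge image and its caption. Link density is used here because it is cheap and needs no training data; a learned classifier over richer structural features could replace any of the three heuristics without changing the overall pipeline shape.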

name of conference

Proceedings of the 9th ACM Symposium on Document Engineering

authors

Kerne, Andruid

published proceedings

DOCENG'09: PROCEEDINGS OF THE 2009 ACM SYMPOSIUM ON DOCUMENT ENGINEERING

author list (cited authors)

Koh, E., & Kerne, A.

citation count

0

complete list of authors

Koh, Eunyee||Kerne, Andruid

editor list (cited editors)

Borghoff, U. M., & Chidlovskii, B.

publication date

January 2009

publisher

Association for Computing Machinery (ACM)

keywords

Information Extraction
Search Representation
Surrogates

Digital Object Identifier (DOI)

10.1145/1600193.1600212

International Standard Book Number (ISBN) 13

978-1-60558-575-8

start page

84

end page

93

URL

http://www.informatik.uni-trier.de/~ley/db/conf/doceng/doceng2009.html
