Test Collection Management and Labeling System - Texas A&M University (TAMU) Scholar

abstract

To evaluate the performance of information retrieval and extraction algorithms, we need test collections. A test collection consists of a set of documents, a clearly defined problem that an algorithm is supposed to solve, and the answers that the algorithm should produce when executed on the documents. Defining the association between elements of the test collection and their answers is known as labeling. For mainstream information retrieval problems, publicly available test collections have been maintained for years. However, the scope of these problems, and thus of the associated test collections, is limited. In other cases, researchers need to build, label, and manage their own test collections, which can be a tedious and error-prone task. We built test collections of HTML documents for problems in which the answer that the algorithm supplies is a sub-tree of the DOM (Document Object Model). To lighten the burden of this task, we developed a test collection management and labeling system (TCMLS) that makes it easier to build test collections, apply them to validate algorithms, and potentially share them across the research community. Copyright 2009 ACM.
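The abstract describes labeling as associating each document with an answer that is a DOM sub-tree. As an illustration only, the following Python sketch shows one way such a labeled collection could be represented and used to score an extraction algorithm. It is not the TCMLS implementation, and every name in it (`TestCollection`, `LabeledDocument`, `node_path`, `pick_longest_div`) is hypothetical. It assumes that documents are well-formed XHTML (so the standard-library `xml.dom.minidom` parser can read them), that a label is stored as the child-index path from the root element to the answer sub-tree, and that an answer counts as correct only if it is exactly the labeled sub-tree.

```python
from dataclasses import dataclass, field
from xml.dom import minidom


def element_children(node):
    """Return only the element children of a DOM node (skips text/whitespace nodes)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def node_path(node):
    """Hypothetical label format: child-index path from the document element to `node`."""
    path = []
    # Walk upward until the parent is the Document node rather than an element.
    while node.parentNode is not None and node.parentNode.nodeType == node.ELEMENT_NODE:
        parent = node.parentNode
        path.append(element_children(parent).index(node))
        node = parent
    return tuple(reversed(path))


def text_of(node):
    """Concatenate all text contained in a DOM sub-tree."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(c) for c in node.childNodes)


@dataclass
class LabeledDocument:
    doc_id: str
    xhtml: str           # snapshot of the document that was labeled
    answer_path: tuple   # child-index path of the labeled DOM sub-tree


@dataclass
class TestCollection:
    problem: str                                   # description of the task
    documents: list = field(default_factory=list)  # list of LabeledDocument

    def label(self, doc_id, xhtml, answer_node):
        """Associate a document with the DOM sub-tree that is its correct answer."""
        self.documents.append(LabeledDocument(doc_id, xhtml, node_path(answer_node)))

    def evaluate(self, algorithm):
        """Fraction of documents for which `algorithm` returns exactly the labeled sub-tree."""
        if not self.documents:
            return 0.0
        hits = 0
        for d in self.documents:
            dom = minidom.parseString(d.xhtml)
            predicted = algorithm(dom)
            if predicted is not None and node_path(predicted) == d.answer_path:
                hits += 1
        return hits / len(self.documents)


def pick_longest_div(dom):
    """Toy extraction algorithm (hypothetical): choose the <div> containing the most text."""
    divs = dom.getElementsByTagName("div")
    return max(divs, key=lambda d: len(text_of(d)), default=None)


if __name__ == "__main__":
    page = "<html><body><div><p>nav</p></div><div><p>main story</p></div></body></html>"
    collection = TestCollection(problem="Find the main content block")
    dom = minidom.parseString(page)
    body = element_children(dom.documentElement)[0]
    collection.label("doc1", page, element_children(body)[1])  # label the second <div>
    print(collection.documents[0].answer_path)       # (0, 1)
    print(collection.evaluate(pick_longest_div))     # 1.0
```

Storing the document snapshot alongside the path keeps each label reproducible, since a path-based label would otherwise break if the live page's structure changed. Real-world HTML that is not well-formed would need a tolerant HTML parser rather than `minidom`.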

name of conference

Proceedings of the 9th ACM Symposium on Document Engineering

authors

Kerne, Andruid

published proceedings

DocEng '09: Proceedings of the 2009 ACM Symposium on Document Engineering

author list (cited authors)

Koh, E., Kerne, A., & Berry, S.

citation count

2

complete list of authors

Koh, Eunyee; Kerne, Andruid; Berry, Sarah

editor list (cited editors)

Borghoff, U. M., & Chidlovskii, B.

publication date

January 2009

publisher

Association for Computing Machinery (ACM)

keywords

Document Object Model
Test Collection
XML Schema

Digital Object Identifier (DOI)

10.1145/1600193.1600203

International Standard Book Number (ISBN) 13

978-1-60558-575-8

start page

39

end page

42

URL

http://www.informatik.uni-trier.de/~ley/db/conf/doceng/doceng2009.html

Test Collection Management and Labeling System Conference Paper