Concept Discovery for Pathology Reports using an N-gram Model. Academic Article

abstract

  • A large amount of valuable information is available in plain-text clinical reports, and new techniques and technologies are being applied to extract it. One of the leading systems in the cancer community is the Cancer Text Information Extraction System (caTIES), which was developed with caBIG-compliant data structures. caTIES embeds two key components for extracting data: MMTx and GATE. In this paper, an n-gram based framework is shown to be capable of discovering concepts from text reports. MetaMap is used to map medical terms to the National Cancer Institute (NCI) Metathesaurus and the Unified Medical Language System (UMLS) Metathesaurus in order to verify that they are legitimate medical data. The final concepts from our framework and from caTIES are weighted using our scoring model. The scores show that, on average, our framework scores higher than caTIES on 848 (36.9%) of the reports. Furthermore, 1388 (60.5%) of the reports perform similarly on both systems.
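
The n-gram idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `word_ngrams` and `discover_concepts` are hypothetical, and the small `KNOWN_TERMS` set is an assumed stand-in for the MetaMap lookup against the NCI and UMLS Metathesauri. The sketch only shows how word n-grams can be generated from a report and kept when they match a known medical term.

```python
import re
from collections import Counter


def word_ngrams(text, max_n=3):
    """Yield every word n-gram of length 1..max_n from the text."""
    # Lowercase and keep alphanumeric runs only (a simplifying assumption).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


# Hypothetical vocabulary standing in for a MetaMap / Metathesaurus lookup.
KNOWN_TERMS = {
    "invasive ductal carcinoma",
    "carcinoma",
    "left breast",
    "breast",
}


def discover_concepts(text, vocabulary=KNOWN_TERMS, max_n=3):
    """Count the n-grams in the text that appear in the vocabulary."""
    return Counter(g for g in word_ngrams(text, max_n) if g in vocabulary)


report = "Invasive ductal carcinoma of the left breast."
concepts = discover_concepts(report)
for term, count in sorted(concepts.items()):
    print(term, count)
```

In a real pipeline the set-membership test would be replaced by a call to a terminology service, and the matched concepts would then be passed to a scoring model such as the one the abstract mentions.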

published proceedings

  • Summit Transl Bioinform

author list (cited authors)

  • Yip, V., Mete, M., Topaloglu, U., & Kockara, S.

citation count

  • 6

complete list of authors

  • Yip, Vincent; Mete, Mutlu; Topaloglu, Umit; Kockara, Sinan

publication date

  • March 2010