Inference of Missing ICD 9 Codes Using Text Mining and Nearest Neighbor Techniques
Conference Paper
abstract
Missing data is a common characteristic of many databases. In electronic medical records, missing values in fields such as the ICD-9 (International Classification of Diseases, Ninth Revision) code [1] hinder the effective analysis of medical results, medical procedures, environmental conditions, and demographics. The accurate labeling of diseases in medical records is critical to all types of epidemiological analyses that leverage health system data. Methods that address this issue in health management systems would significantly enhance the data's potential in knowledge discovery applications. This paper describes the algorithms we developed to handle missing ICD-9 codes in medical datasets. Our approach involved developing a prediction model for ICD-9 codes based on other associated attributes, such as medical diagnosis, medical remarks, and patient statements. Text mining was performed on this unstructured data to extract key concepts in these fields, and nearest-neighbor-based techniques were used to predict the missing ICD-9 codes [2, 3]. © 2012 IEEE.
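The abstract does not say how key concepts are extracted or how neighbors are compared. The sketch below is only a rough illustration of the general idea, not the authors' algorithm. It substitutes plain bag-of-words TF-IDF vectors for the paper's concept extraction and fills each missing code by a cosine-similarity-weighted vote among the k nearest labeled records. The toy records, the stopword list, the helper names (`tokenize`, `build_idf`, `tfidf`, `cosine`, `impute_codes`), and `k=3` are all hypothetical. The ICD-9 codes serve only as class labels.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy records: free-text fields joined into one string, plus the
# ICD-9 code when known (None marks a missing code). Illustrative only.
RECORDS = [
    ("persistent cough and fever, chest x-ray shows pneumonia", "486"),
    ("fever with productive cough, diagnosed pneumonia", "486"),
    ("type 2 diabetes, elevated blood glucose, on metformin", "250.00"),
    ("diabetes follow-up, blood glucose poorly controlled", "250.00"),
    ("high blood pressure, essential hypertension, on lisinopril", "401.9"),
    ("patient reports cough and fever for three days", None),
    ("glucose elevated, history of type 2 diabetes", None),
]

# Assumed minimal stopword list; a real system would use a richer concept extractor.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "with", "on", "in", "for", "shows"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_idf(docs):
    """Smoothed inverse document frequency over tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def impute_codes(records, k=3):
    """Fill None codes by a similarity-weighted vote of the k nearest labeled records."""
    labeled = [(tokenize(text), code) for text, code in records if code is not None]
    idf = build_idf([tokens for tokens, _ in labeled])
    train = [(tfidf(tokens, idf), code) for tokens, code in labeled]
    result = []
    for text, code in records:
        if code is None:
            query = tfidf(tokenize(text), idf)
            scored = sorted(((cosine(query, vec), label) for vec, label in train),
                            key=lambda s: s[0], reverse=True)[:k]
            votes = defaultdict(float)
            for sim, label in scored:
                votes[label] += sim
            if votes:
                best = max(votes, key=votes.get)
                # Leave the code missing when no neighbor shares any term.
                code = best if votes[best] > 0 else None
        result.append((text, code))
    return result


if __name__ == "__main__":
    for text, code in impute_codes(RECORDS, k=3):
        print(code, "|", text)
```

With these toy records, the record mentioning cough and fever is assigned `486` and the one mentioning glucose and diabetes is assigned `250.00`, because only records with those codes share any terms with them. Weighting votes by similarity rather than counting neighbors equally keeps zero-similarity neighbors from influencing the result.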
name of conference
2012 45th Hawaii International Conference on System Sciences