Inference of Missing ICD 9 Codes Using Text Mining and Nearest Neighbor Techniques Conference Paper


  • Missing data is a common characteristic of many databases. In electronic medical records, missing data in fields like ICD 9 (International Classification of Diseases) [1] impacts the effective analysis of medical results, medical procedures, environmental conditions, and demographics. The accurate labeling of diseases in medical records is critical to all types of epidemiological analyses that leverage health system data. Methods that address this issue in health management systems would significantly enhance the data's potential in knowledge discovery applications. This paper describes the algorithms we developed to handle missing ICD 9 codes in medical datasets. Our approach involved developing a prediction model for the ICD 9 codes based on other associated attributes like medical diagnosis, medical remarks, and patient statements. Text mining was performed on this unstructured data to extract key concepts in these fields, and nearest-neighbor-based techniques were used to predict the missing ICD 9 codes [2, 3]. © 2012 IEEE.
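
The approach summarized in the abstract (extract key terms from free text, then predict a missing code from the most similar labeled records) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the text representation or the similarity measure, so TF-IDF weighting, cosine similarity, similarity-weighted voting, the stopword list, the toy records, and the names `LABELED` and `predict_code` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy records: (free-text diagnosis or remark, ICD 9 code).
# Illustrative only; not data from the paper.
LABELED = [
    ("patient reports chest pain and shortness of breath", "786.50"),
    ("acute chest pain radiating to left arm", "786.50"),
    ("persistent cough with fever and chills", "486"),
    ("fever productive cough pneumonia suspected", "486"),
    ("elevated blood sugar and frequent urination", "250.00"),
]

# Assumed minimal stopword list.
STOPWORDS = {"and", "of", "with", "to", "the", "a"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the labeled texts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the labeled texts are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_code(text, labeled, k=3):
    """Predict a code by similarity-weighted vote among the k nearest labeled records."""
    docs = [tokenize(t) for t, _ in labeled]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    query = tfidf(tokenize(text), idf)
    scored = sorted(
        ((cosine(query, vec), code) for vec, (_, code) in zip(vectors, labeled)),
        reverse=True,
    )
    votes = Counter()
    for sim, code in scored[:k]:
        votes[code] += sim
    return votes.most_common(1)[0][0]


print(predict_code("cough and fever for three days", LABELED))  # prints 486
```

Weighting each neighbor's vote by its similarity keeps unrelated records, which score zero, from outvoting a close match when `k` exceeds the number of relevant neighbors. A real system would also need a fallback for records whose text shares no terms with any labeled record.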

name of conference

  • 2012 45th Hawaii International Conference on System Sciences

published proceedings

  • 2012 45th Hawaii International Conference on System Sciences

author list (cited authors)

  • Erraguntla, M., Gopal, B., Ramachandran, S., & Mayer, R.

citation count

  • 16

complete list of authors

  • Erraguntla, Madhav; Gopal, Belita; Ramachandran, Satheesh; Mayer, Richard

publication date

  • January 2012