Optimal Bayesian Classification With Missing Values (academic article)


  • © 1991-2012 IEEE. Missing values can be an impediment to designing and applying classifiers. They are common in biomedical studies for various reasons, including missing tests or the complex profiling technologies used for different omics measurements in modern biomedicine. Many procedures have been proposed to impute missing values. This paper considers missing feature values in the context of optimal Bayesian classification, which selects a classifier that minimizes the expected error with respect to the posterior distribution governing an uncertainty class of feature-label distributions. The missing-value problem fits neatly into the overall framework of optimal Bayesian classification by marginalizing out the missing-value process from the feature-label distribution, and then updating the prior distribution of class-conditional parameters to posterior distributions using new observations. Generally, an optimal Bayesian classifier is defined via the effective class-conditional densities, which are averages of the parameterized feature-label distributions in the uncertainty class relative to the posterior distribution. Hence, once the posterior distribution incorporating the missing-value process is found, the optimal Bayesian classifier pertaining to the features with missing values can be derived from the corresponding effective class-conditional densities. This paper presents the general theory, derives a closed-form decision rule for the optimal Bayesian classifier in a Gaussian model with independent features, and utilizes Hamiltonian Monte Carlo for the Gaussian model with arbitrary covariance matrices. Superior performance is demonstrated on synthetic and real-world omics data in comparison with linear discriminant analysis, quadratic discriminant analysis, and support vector machines, each used in conjunction with Gibbs sampling imputation.
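    The idea of classifying through effective class-conditional densities can be illustrated with a minimal sketch. It is not the decision rule derived in the paper. It assumes independent Gaussian features, values missing completely at random, a normal-inverse-gamma prior on each feature's mean and variance, and known class prior probabilities. Under those assumptions the effective density of each feature is a Student-t posterior predictive, and missing entries are simply left out of both the posterior update and the decision score. All names and hyperparameters (`mu0`, `kappa0`, `alpha0`, `beta0`, `fit`, `predict`) are hypothetical.

```python
import math

def predictive_params(values, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Student-t posterior predictive (dof, location, squared scale) for one
    feature of one class, using only the observed (non-None) values."""
    obs = [v for v in values if v is not None]
    n = len(obs)
    xbar = sum(obs) / n if n else mu0
    ss = sum((v - xbar) ** 2 for v in obs)
    # Standard normal-inverse-gamma conjugate update.
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return 2.0 * alpha_n, mu_n, beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)

def log_student_t(x, dof, loc, scale2):
    """Log density of a location-scale Student-t distribution."""
    z2 = (x - loc) ** 2 / (dof * scale2)
    return (math.lgamma((dof + 1.0) / 2.0) - math.lgamma(dof / 2.0)
            - 0.5 * math.log(dof * math.pi * scale2)
            - (dof + 1.0) / 2.0 * math.log1p(z2))

def fit(train):
    """train maps a class label to a list of rows; None marks a missing value."""
    model = {}
    for label, rows in train.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = [predictive_params(col) for col in columns]
    return model

def predict(model, x, class_prior=None):
    """Return the label with the largest prior-weighted effective density,
    scoring only the features observed in the test point x."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        prior = class_prior[label] if class_prior else 1.0 / len(model)
        score = math.log(prior)
        for value, (dof, loc, scale2) in zip(x, params):
            if value is not None:  # missing features are marginalized out
                score += log_student_t(value, dof, loc, scale2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data, invented for illustration only.
train = {
    0: [[0.1, -0.2], [None, 0.3], [-0.3, None], [0.2, 0.1]],
    1: [[5.1, 4.9], [4.8, None], [None, 5.2], [5.0, 5.1]],
}
model = fit(train)
print(predict(model, [0.2, None]), predict(model, [None, 4.8]))
```

    Because the features are assumed independent, dropping a missing feature from the product is the same as integrating it out, so no imputation step is needed in this sketch. With arbitrary covariance matrices that factorization no longer holds, which is where the abstract turns to Hamiltonian Monte Carlo.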

author list (cited authors)

  • Dadaneh, S. Z., Dougherty, E. R., & Qian, X.

citation count

  • 7

publication date

  • June 2018