Missing-value estimation using linear and non-linear regression with Bayesian gene selection.


  • MOTIVATION: Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, values are frequently missing. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. RESULTS: We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. Measured by the normalized root-mean-square error, the results compare very favorably with currently used methods. AVAILABILITY: The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
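
The two-part structure described in the abstract (gene selection, then an estimation rule fitted with QR decomposition) can be illustrated with a minimal sketch. This is not the authors' implementation. The Bayesian variable selection step is replaced here by a simple correlation ranking, only the linear regression case is shown, and the NRMSE is normalized by the standard deviation of the true values, which is an assumption about the definition. All function names (`select_genes`, `qr_least_squares`, `estimate_missing`, `nrmse`) are hypothetical, and the sketch assumes the candidate genes themselves have no missing entries.

```python
import numpy as np

def select_genes(X, target, obs_cols, k):
    """Rank candidate genes by |correlation| with the target gene.

    Stand-in for the paper's Bayesian variable selection (assumption:
    a plain correlation ranking, chosen only to keep the sketch short).
    """
    y = X[target, obs_cols]
    scores = []
    for g in range(X.shape[0]):
        if g == target:
            scores.append(-1.0)  # never select the target itself
            continue
        r = np.corrcoef(X[g, obs_cols], y)[0, 1]
        scores.append(abs(r) if np.isfinite(r) else 0.0)
    return np.argsort(scores)[::-1][:k]

def qr_least_squares(A, y):
    """Solve min ||A b - y|| via the reduced QR decomposition A = QR."""
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ y)

def estimate_missing(X, target, missing_col, k=3):
    """Estimate X[target, missing_col] by linear regression on k selected genes.

    X is a genes-by-conditions matrix; the entry being estimated is never read.
    """
    obs_cols = [c for c in range(X.shape[1]) if c != missing_col]
    genes = select_genes(X, target, obs_cols, k)
    # Design matrix: intercept column plus the selected genes' observed values.
    A = np.column_stack([np.ones(len(obs_cols)), X[np.ix_(genes, obs_cols)].T])
    beta = qr_least_squares(A, X[target, obs_cols])
    x_new = np.concatenate([[1.0], X[genes, missing_col]])
    return float(x_new @ beta)

def nrmse(estimates, truths):
    """RMS error divided by the standard deviation of the true values (assumed normalization)."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(truths, dtype=float)
    return float(np.sqrt(np.mean((e - t) ** 2)) / np.std(t))
```

The sketch needs more observed conditions than regression coefficients (`k + 1`), otherwise the triangular factor `R` is singular and the solve step fails. A typical evaluation would hide known entries, estimate each with `estimate_missing`, and pass the estimates and hidden true values to `nrmse`.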

published proceedings

  • Bioinformatics

author list (cited authors)

  • Zhou, X., Wang, X., & Dougherty, E. R.

citation count

  • 56

complete list of authors

  • Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

publication date

  • November 2003