Binarization of microarray data on the basis of a mixture model.

Although gathered as continuous data, expression measurements from gene microarrays may be quantized before downstream analysis and modeling. This is especially true for modeling gene prediction and genetic regulatory networks. Coarse quantization results in lower computational requirements, lower data requirements for model inference, and easier conceptualization. This paper proposes a mixture model for binarization. For each gene, the model, composed of a sum of two distributions, is fit to expression data for that gene, and data points are binarized according to the model. The mixture model is based on the assumption of multiplicative up-regulation. The proposed method is compared with mean and median binarization by comparing classification performance based on the binary data from the different methods. Classification is performed for simulated data generated from a microarray model studied previously and for cancer data arising from two studies involving hereditary breast cancer and small, round blue-cell tumors of childhood.

Binarization of microarray data on the basis of a mixture model. Academic Article