Bayesian Structural Equation Modeling in Multiple Omics Data Integration with Application to Circadian Genes
Institutional Repository Document
Integrating multiple data sources is widely regarded as valuable because it can reveal functions of genomic expression that remain hidden in single-source analyses, and several studies have shown that multi-platform analyses are more powerful. In this study, we consider the omics profiles of circadian genes, namely copy number changes and RNA sequencing data, together with the survival response of the subjects. We develop a Bayesian structural equation model that couples linear regressions with a log-normal accelerated failure time regression to integrate information across these two platforms and predict subject survival. We place conjugate priors on the regression parameters and derive a Gibbs sampler from their full conditional distributions. An extensive simulation study shows that the integrative model fits the data better than its closest competitor. Analyses of glioblastoma and breast cancer data from TCGA, the largest genomics and transcriptomics database, support these findings. The method is implemented in the R package semmcmc, available on CRAN.
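To make the phrase "conjugate priors and a Gibbs sampler" concrete, the following is a minimal sketch of one building block only: a Gibbs sampler for a single Gaussian linear regression. It is not the authors' sampler and not the semmcmc API. The function name `gibbs_linear_regression`, the prior choices (a zero-mean normal prior on the coefficients with variance `tau2`, and an inverse-gamma prior on the error variance with parameters `a0`, `b0`), and all default values are assumptions made for illustration. For uncensored subjects, a log-normal accelerated failure time regression has this same form with the response taken as log survival time; censoring would need an extra imputation step that is not shown here.

```python
import numpy as np


def gibbs_linear_regression(X, y, n_iter=2000, burn_in=500,
                            tau2=100.0, a0=2.0, b0=2.0, seed=0):
    """Gibbs sampler for y = X @ beta + eps, eps ~ N(0, sigma2 * I).

    Illustrative priors (assumptions, not taken from the paper):
      beta   ~ N(0, tau2 * I)
      sigma2 ~ Inverse-Gamma(a0, b0)
    Returns the post-burn-in draws of beta and sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX = X.T @ X
    Xty = X.T @ y

    sigma2 = 1.0  # arbitrary starting value
    beta_draws = np.empty((n_iter - burn_in, p))
    sigma2_draws = np.empty(n_iter - burn_in)

    for it in range(n_iter):
        # beta | sigma2, y ~ N(m, V) with
        #   V = (X'X / sigma2 + I / tau2)^{-1},  m = V X'y / sigma2
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        m = V @ Xty / sigma2
        L = np.linalg.cholesky(V)
        beta = m + L @ rng.standard_normal(p)

        # sigma2 | beta, y ~ Inverse-Gamma(a0 + n/2, b0 + RSS/2)
        resid = y - X @ beta
        shape = a0 + 0.5 * n
        scale = b0 + 0.5 * (resid @ resid)
        # If G ~ Gamma(shape, 1), then scale / G ~ Inverse-Gamma(shape, scale)
        sigma2 = scale / rng.gamma(shape)

        if it >= burn_in:
            beta_draws[it - burn_in] = beta
            sigma2_draws[it - burn_in] = sigma2

    return beta_draws, sigma2_draws
```

Both full conditionals are available in closed form because the priors are conjugate, so each iteration needs only two direct draws. In a structural equation setting, several such regression blocks would be updated in turn within one sweep.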