Deep graph representations embed network information for robust disease marker identification. Academic Article

abstract

  • MOTIVATION: Accurate disease diagnosis and prognosis based on omics data rely on the effective identification of robust prognostic and diagnostic markers that reflect the states of the biological processes underlying the disease pathogenesis and progression. In this article, we present GCNCC, a Graph Convolutional Network-based approach for Clustering and Classification, that can identify highly effective and robust network-based disease markers. Based on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data to identify highly reproducible markers with consistently accurate prediction performance across independent datasets possibly from different platforms. GCNCC identifies these markers by clustering the nodes in the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network, followed by a supervised feature selection procedure that extracts clusters that are highly predictive of the disease state. RESULTS: By benchmarking GCNCC based on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility. AVAILABILITY AND IMPLEMENTATION: https://github.com/omarmaddouri/GCNCC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
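
The three stages the abstract describes (graph-convolutional embedding of the protein interaction network, clustering of the nodes in the latent space, and supervised selection of predictive clusters) can be illustrated on toy data. The following is only a rough sketch and not the authors' implementation, which is available at the repository linked above. It rests on several assumptions: the layer weights are random and untrained, node features are taken to be each gene's expression across samples, k-means and a two-sample t-statistic stand in for whichever clustering and feature selection procedures GCNCC actually uses, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} common in GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_embed(adj, features, weights):
    """Forward pass H <- ReLU(A_norm @ H @ W); the last layer is left linear.
    Assumption: weights are given (here random), not trained."""
    a_norm = normalized_adjacency(adj)
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, a stand-in for the paper's clustering step."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def rank_clusters(expr, labels, y):
    """Rank clusters by |t-statistic| of their mean expression between classes."""
    scores = {}
    for c in np.unique(labels):
        activity = expr[:, labels == c].mean(axis=1)   # one value per sample
        a, b = activity[y == 1], activity[y == 0]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        scores[int(c)] = abs(a.mean() - b.mean()) / (se + 1e-12)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (entirely synthetic): 8 genes in two 4-gene cliques joined by one edge.
rng = np.random.default_rng(0)
n_genes, n_samples = 8, 20
adj = np.zeros((n_genes, n_genes))
adj[:4, :4] = 1.0
adj[4:, 4:] = 1.0
np.fill_diagonal(adj, 0.0)
adj[3, 4] = adj[4, 3] = 1.0

y = np.array([0] * 10 + [1] * 10)               # disease labels per sample
expr = rng.normal(size=(n_samples, n_genes))    # samples x genes
expr[y == 1, :4] += 2.0                         # first clique is up-regulated

weights = [rng.normal(size=(n_samples, 8)), rng.normal(size=(8, 2))]
embedding = gcn_embed(adj, expr.T, weights)     # genes x 2 latent coordinates
labels = kmeans(embedding, k=2)                 # candidate marker clusters
ranking = rank_clusters(expr, labels, y)        # most discriminative first
print("cluster labels:", labels, "ranking:", ranking)
```

In this sketch each selected cluster yields one feature per sample (the mean expression of its member genes), which could then be passed to any downstream classifier.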

published proceedings

  • Bioinformatics

author list (cited authors)

  • Maddouri, O., Qian, X., & Yoon, B.

complete list of authors

  • Maddouri, Omar; Qian, Xiaoning; Yoon, Byung-Jun

editor list (cited editors)

  • Xu, J.

publication date

  • January 2022