Khunlertgit, Navadon (2016-12). Improvement of Reproducibility in Cancer Classification Based on Pathway Markers and Subnetwork Markers. Doctoral Dissertation.

abstract

  • Identification of robust biomarkers for cancer prognosis based on gene expression data is an important research problem in translational genomics. The high-dimensional, small-sample-size data setting makes the prediction of biomarkers very challenging. Early biomarkers were identified based solely on gene expression data; however, very few of them are shared among independent studies. To overcome this irreproducibility, integrative approaches have been proposed that identify better biomarkers by overlaying gene expression data with available biological knowledge and investigating genes at the modular level. These module-based markers jointly analyze the gene expression activities of closely associated genes, for example, genes that belong to a common biological pathway or genes whose protein products form a subnetwork module in a protein-protein interaction network. Several studies have shown that modular biomarkers lead to more accurate and reproducible prognostic predictions than single-gene markers and also provide a better understanding of disease mechanisms.

    We propose novel methods for identifying modular markers that can be used to predict breast cancer prognosis. First, to improve the identification of pathway markers, we propose using probabilistic pathway activity inference and relative expression analysis. Then, we propose a new method to identify subnetwork markers based on a message-passing clustering algorithm, and we further improve this method by incorporating topological attributes using association coefficients. Through extensive evaluations using multiple publicly available datasets, we demonstrate that all of the proposed methods can identify modular markers that are more reliable and reproducible across independent datasets than those identified by existing methods; hence, they have the potential to become more effective prognostic cancer classifiers.

publication date

  • December 2016