Supervised logistic principal component analysis for pathway based genome-wide association studies Conference Paper

abstract

  • Genome-Wide Association Studies (GWAS) have in the past focused on single-marker associations with traits of interest, which may fail to detect combined effects of Single-Nucleotide Polymorphisms (SNPs) or genes with weak individual effects, especially for complex diseases. Pathway-based approaches have recently become popular for addressing this problem: they evaluate the combined effects of genetic variants belonging to predefined pathways associated with particular biological functions. In this paper, we propose a Supervised Logistic Principal Component Analysis (SLPCA) method that explicitly models categorical SNP data and identifies supervised principal components with maximum combined effects of SNPs in pathways with respect to disease outcome. The first principal component scores obtained by LPCA on subsets of SNPs in pathways are defined as aggregated variables for pathway association analysis. We used genotype data simulated with HAP-SAMPLE to compare our proposed method with another pathway-based method that performs traditional supervised PCA on categorical SNP data. The results show that SLPCA outperforms this method because it models categorical SNP data explicitly, which leads to higher power for predicting disease-associated pathways in case-control studies. Our preliminary analysis of the Crohn's Disease (CD) genotype data from the Wellcome Trust Case Control Consortium (WTCCC) identified relevant pathways and shows the potential of SLPCA for association analysis in complex diseases. Copyright © 2012 ACM.
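For orientation, the comparison method mentioned in the abstract (traditional supervised PCA applied to SNP genotypes) can be sketched roughly as follows. This is a minimal illustration, not the authors' SLPCA and not their implementation: it assumes genotypes coded as minor-allele counts 0/1/2, uses absolute Pearson correlation with case-control status as the screening statistic, and treats the first principal component score of the retained SNPs as the aggregated pathway variable. The function name `supervised_pc1`, the `keep` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def supervised_pc1(genotypes, status, keep=10):
    """First principal component score of the SNPs most associated with status.

    genotypes: (n_samples, n_snps) array of minor-allele counts 0/1/2 (assumed coding).
    status:    (n_samples,) array of 0 (control) / 1 (case).
    keep:      number of top-ranked SNPs to retain (hypothetical tuning parameter).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(status, dtype=float)
    Xc = X - X.mean(axis=0)          # center each SNP column
    yc = y - y.mean()
    # Univariate screening: absolute Pearson correlation of each SNP with status.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.zeros(X.shape[1])      # constant SNPs keep a correlation of 0
    np.divide(Xc.T @ yc, denom, out=corr, where=denom > 0)
    selected = np.argsort(-np.abs(corr))[:keep]
    # First principal component of the selected, centered SNP columns.
    U, S, Vt = np.linalg.svd(Xc[:, selected], full_matrices=False)
    scores = U[:, 0] * S[0]
    return scores, selected

# Toy data (made up for illustration): 200 subjects, 30 SNPs in one pathway.
rng = np.random.default_rng(0)
n, p = 200, 30
status = rng.integers(0, 2, size=n)
genotypes = rng.binomial(2, 0.3, size=(n, p))
# Give the first five SNPs a higher minor-allele frequency in cases.
genotypes[:, :5] = rng.binomial(2, 0.3 + 0.2 * status[:, None], size=(n, 5))

scores, selected = supervised_pc1(genotypes, status, keep=5)
```

The resulting `scores` vector would then be tested for association with `status`, for example in a logistic regression. The abstract's point is that SLPCA replaces the linear PCA step shown here with a logistic model suited to categorical genotypes.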

author list (cited authors)

  • Lu, M., Huang, J. Z., & Qian, X.

citation count

  • 5

publication date

  • January 2012