Logistic Principal Component Analysis for Rare Variants in Gene-Environment Interaction Analysis (Conference Paper)


  • Low minor allele frequency (MAF) and weak individual effects make genome-wide association studies (GWAS) of rare variant single nucleotide polymorphisms (SNPs) difficult with conventional statistical methods. Collapsing is the most common way to enhance the detection of rare variant effects: the effects of rare variants belonging to the same functional gene are aggregated, and the aggregate is tested for association with a given trait. In this paper, we propose a novel MAF-based logistic principal component analysis (MLPCA) that derives aggregated statistics by explicitly modeling the correlations between rare variant SNP data, which are categorical. The aggregated statistics derived by MLPCA can then be tested as surrogate variables in regression models to detect gene-environment interactions from rare variants. In addition, MLPCA searches for the optimal linear combination, drawn from the best subset of rare variants chosen according to MAF, that has the maximum association with the trait. To compare the power of our method with that of four existing collapsing methods for gene-environment interaction association analysis, we applied all of these methods to the Genetic Analysis Workshop 17 (GAW17) simulation data. Our preliminary experimental results demonstrate that MLPCA has more power than the existing methods and can be further improved by introducing an appropriate sparsity penalty. © 2012 IEEE.
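
The collapsing idea the abstract starts from can be illustrated with a minimal sketch. This is not MLPCA and not any of the four methods the paper compares against; it is a generic burden-style baseline, written under several assumptions: genotypes are coded as 0/1/2 minor-allele counts, the MAF threshold of 0.01 is an arbitrary illustrative default, and all function names (`collapse_rare_variants`, `interaction_design`, `fit_logistic`) are hypothetical. The aggregated score is entered into a logistic regression together with an environment covariate and their product, so the coefficient on the product term plays the role of the gene-environment interaction effect.

```python
import numpy as np


def minor_allele_frequency(genotypes):
    """Per-SNP MAF from an (n_samples, n_snps) matrix.

    Assumption: entries are 0/1/2 counts of the minor allele.
    """
    return genotypes.mean(axis=0) / 2.0


def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Burden-style collapsing (illustrative, not MLPCA).

    Sums minor-allele counts over SNPs whose MAF is positive and
    below maf_threshold. Returns the per-sample score and the mask
    of SNPs that were treated as rare.
    """
    maf = minor_allele_frequency(genotypes)
    rare = (maf > 0) & (maf < maf_threshold)
    return genotypes[:, rare].sum(axis=1), rare


def interaction_design(burden, environment):
    """Design matrix: intercept, burden, environment, burden x environment."""
    burden = np.asarray(burden, dtype=float)
    environment = np.asarray(environment, dtype=float)
    return np.column_stack(
        [np.ones_like(burden), burden, environment, burden * environment]
    )


def fit_logistic(X, y, n_iter=50, jitter=1e-6):
    """Logistic regression by Newton-Raphson.

    X must already contain an intercept column. `jitter` is a small
    diagonal term added to the Hessian for numerical stability only;
    it is not included in the gradient, so it is not a penalty.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + jitter * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta
```

A typical use would be to call `collapse_rare_variants` on the SNPs of one gene, build the design with `interaction_design(score, environment)`, and inspect the last coefficient returned by `fit_logistic`. A method such as MLPCA would replace the simple sum with a learned linear combination; that step is not reproduced here.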

author list (cited authors)

  • Lu, M., Lee, H., Hadley, D., Huang, J. Z., & Qian, X.

citation count

  • 1

publication date

  • December 2012