Logistic Principal Component Analysis for Rare Variants in Gene-Environment Interaction Analysis
The low minor allele frequency (MAF) and weak individual effects of rare variants make genome-wide association studies (GWAS) of rare single nucleotide polymorphisms (SNPs) more difficult with conventional statistical methods. Collapsing, which aggregates the effects of rare variants belonging to the same gene, is the most common way to improve the detection of rare variant effects in association analyses with a given trait. In this paper, we propose a novel framework, MAF-based logistic principal component analysis (MLPCA), that derives aggregated statistics by explicitly modeling the correlation among rare variant SNPs, whose data are categorical. The aggregated statistics derived by MLPCA can then be tested as surrogate variables in regression models to detect gene-environment interactions involving rare variants. In addition, MLPCA searches for the optimal linear combination over the best subset of rare variants, selected according to MAF, that has the maximum association with the given trait. We compared the power of our MLPCA-based methods with that of four existing collapsing methods in gene-environment interaction association analysis, using both our simulated data set and the Genetic Analysis Workshop 17 (GAW17) data. Our experimental results demonstrate that MLPCA applied to two forms of genotype data representation achieves higher statistical power than the existing methods, and that its power can be further improved by introducing an appropriate sparsity penalty. The performance improvement of our MLPCA-based methods results from deriving aggregated statistics by explicitly modeling categorical SNP data and from searching for the most strongly associated subset of SNPs to collapse, which helps to better capture the combined effect of individual rare variants and their interaction with environmental factors.
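The abstract describes the pipeline only in words. The following is a minimal, hypothetical sketch of its three ingredients under stated assumptions, not the authors' implementation: (1) genotypes are recoded as binary carrier indicators (the abstract does not say which two representations were used); (2) a generic rank-k logistic PCA, fitted by majorization-minimization (a standard algorithm for this model, not necessarily the MLPCA estimator), supplies the first component score as the aggregated statistic; (3) a scan over made-up MAF cutoffs keeps the subset whose score gives the largest |t| for the G×E term in an ordinary least-squares model of a continuous trait. All function names, cutoffs and simulation parameters are illustrative, and no sparsity penalty is included.

```python
import numpy as np


def sigmoid(theta):
    # Numerically stable logistic function (no overflow for large |theta|).
    return 0.5 * (1.0 + np.tanh(0.5 * theta))


def logistic_pca(X, k=1, n_iter=200):
    """Rank-k logistic PCA of a binary matrix X (samples x SNPs).

    Models P(X_ij = 1) = sigmoid(mu_j + (A B^T)_ij). Because the Bernoulli
    log-likelihood has curvature at most 1/4, each majorization step fits an
    ordinary rank-k PCA (with column means) to the working matrix
    theta + 4 * (X - sigmoid(theta)); the likelihood never decreases.
    Returns sample scores A (n x k), loadings B (p x k) and offsets mu.
    """
    p_bar = np.clip(X.mean(axis=0), 1e-3, 1 - 1e-3)
    theta = np.tile(np.log(p_bar / (1 - p_bar)), (X.shape[0], 1))
    for _ in range(max(n_iter, 1)):
        Z = theta + 4.0 * (X - sigmoid(theta))  # working response
        mu = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
        scores, loadings = U[:, :k] * s[:k], Vt[:k].T
        theta = mu + scores @ loadings.T
    return scores, loadings, mu


def interaction_tstat(y, g, e):
    """OLS t-statistic of the G x E coefficient in y ~ 1 + E + G + E*G."""
    Z = np.column_stack([np.ones_like(y), e, g, e * g])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[3, 3])
    return beta[3] / se


def maf_subset_scan(G, y, e, cutoffs, k=1):
    """Hypothetical MAF scan: keep the cutoff whose first logistic-PCA score
    has the largest |t| for the G x E term.

    G holds minor-allele counts (0/1/2); SNPs are recoded as carrier
    indicators, which is an assumed representation.
    """
    maf = G.mean(axis=0) / 2.0
    X = (G > 0).astype(float)
    best = None
    for cutoff in cutoffs:
        keep = (maf > 0) & (maf <= cutoff)  # polymorphic SNPs at or below cutoff
        if keep.sum() < 2:
            continue  # need at least two SNPs to collapse
        scores, _, _ = logistic_pca(X[:, keep], k=k)
        t = interaction_tstat(y, scores[:, 0], e)
        if best is None or abs(t) > abs(best[1]):
            best = (cutoff, t, scores[:, 0])
    return best


# Toy simulation with made-up parameters (not the paper's design).
rng = np.random.default_rng(2024)
n, p = 600, 30
G = rng.binomial(2, rng.uniform(0.002, 0.05, size=p), size=(n, p))
e = rng.standard_normal(n)
burden = (G[:, :10] > 0).sum(axis=1)  # first 10 SNPs act as causal here
y = 0.3 * e + 0.2 * burden + 0.8 * e * burden + rng.standard_normal(n)

best_cutoff, best_t, best_score = maf_subset_scan(G, y, e, [0.01, 0.02, 0.03, 0.05])
print(f"MAF cutoff {best_cutoff}: G x E t = {best_t:.2f}")
```

The interaction t-statistic is unchanged in magnitude by any affine rescaling of the score, so the arbitrary sign and scale of a principal component do not matter here. Because the cutoff is chosen to maximize |t|, the reported statistic is optimistically biased; a valid p-value would have to account for that search (for example, by permutation), which this sketch does not do.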
author list (cited authors)
Lu, M., Lee, H., Hadley, D., Huang, J. Z., & Qian, X.