Leveraging Covariate and Structural Information for Efficient Large-Scale and High-Dimensional Inference
Grant
Overview
Abstract
The proliferation of big data is accompanied by a vast number of questions, in the form of hypothesis tests, which call for effective methods to conduct large-scale and high-dimensional inference. These methods must carry out statistical analysis on many study units simultaneously. Conventional simultaneous inference procedures often assume that hypotheses for different units are exchangeable. However, in many scientific applications, external covariate and structural information regarding the patterns of signals is available. Exploiting such side information efficiently and accurately will lead to improved statistical power, as well as enhanced interpretability of research results. The main thrust of this research is to advance statistical methodologies and theories for large-scale and high-dimensional inference, with a particular focus on integrating potentially useful external covariate and structural information into inferential procedures.
This research aims to develop innovative methodologies and theories to address several significant problems in large-scale and high-dimensional inference. In Project 1, the PI will introduce a new multiple testing procedure that can automatically select relevant covariates to improve the efficiency of inference when a large number of external covariates are available. In Project 2, the PI will develop a new multiple testing framework that can integrate various forms of structural information. Because prior information is seldom perfectly accurate, a particular focus will be on developing procedures that are robust to misspecified or imperfect prior information. In Project 3, the PI will propose new procedures for simultaneous inference in high-dimensional regressions with side information. The statistical tools will be used to identify skilled fund managers, assess the performance of climate field reconstructions, and analyze genomic data in an integrative way.
Methods and computer code developed will be made publicly available.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.