Data integration with high dimensionality. Academic Article


  • We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depending on how the predictor is measured. The goal is to select which predictors affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. There are marginal likelihoods for each experiment; we specify a pseudolikelihood combining the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion with unbounded true model size. The proposed method includes a Bayesian information criterion with appropriate penalty term as a special case. Simulations indicate that data integration can dramatically improve upon using only one data source.
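As a rough illustration of the idea in the abstract, the sketch below scores candidate predictor sets with a BIC-style criterion built from per-experiment marginal log-likelihoods. Everything specific here is an assumption rather than the authors' method: the abstract does not give the form of the pseudolikelihood or the penalty, so the sketch takes an unweighted sum of marginal log-likelihoods and the ordinary BIC penalty `n_params * log(n_obs)`. The function names and the toy log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def pseudo_information_criterion(marginal_logliks, n_params, n_obs):
    """BIC-style score: -2 * (sum of marginal log-likelihoods) + n_params * log(n_obs).

    Assumption: the pseudo-log-likelihood is the unweighted sum of the
    maximized marginal log-likelihoods, one per experiment.
    Assumption: the penalty is the ordinary BIC penalty.
    """
    pseudo_loglik = sum(marginal_logliks)
    return -2.0 * pseudo_loglik + n_params * math.log(n_obs)


def select_model(candidates, n_obs):
    """Return the name of the candidate with the smallest criterion value.

    `candidates` maps a model name to a pair (marginal_logliks, n_params).
    """
    return min(
        candidates,
        key=lambda name: pseudo_information_criterion(
            candidates[name][0], candidates[name][1], n_obs
        ),
    )


# Made-up maximized log-likelihoods for two experiments (illustration only).
toy_candidates = {
    "x1": ([-120.0, -95.0], 2),
    "x1+x2": ([-100.0, -80.0], 3),
    "x1+x2+x3": ([-99.5, -79.8], 4),
}
best = select_model(toy_candidates, n_obs=200)
print(best)
```

With these toy numbers, adding the second predictor improves the combined fit by far more than the penalty costs, while the third predictor does not, so the middle candidate is chosen. A criterion suited to a growing number of informative predictors may need a heavier penalty than the one assumed here.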

published proceedings

  • Biometrika

altmetric score

  • 3

author list (cited authors)

  • Gao, X., & Carroll, R. J.

citation count

  • 20

complete list of authors

  • Gao, Xin||Carroll, Raymond J

publication date

  • June 2017