On the impact of model selection on predictor identification and parameter inference.

abstract

We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors, and the impact of predictor selection on parameter inference, for practical sample sizes. We studied effect estimates obtained directly from the penalized methods (Algorithm 1) or by refitting the selected predictors with standard regression (Algorithm 2). For linear models, penalized linear regression, elastic net, smoothly clipped absolute deviation (SCAD), least angle regression and LASSO had low false negative (FN) predictor selection rates but false positive (FP) rates above 20% for all sample and effect sizes. Partial least squares regression had few FPs but many FNs. Only relaxo had low FP and FN rates. For logistic models, LASSO and penalized logistic regression had many FPs and few FNs for all sample and effect sizes. SCAD and adaptive logistic regression had low or moderate FP rates but many FNs. The coverage of 95% confidence intervals for predictors with null effects was approximately 100% for Algorithm 1 for all methods, and 95% for Algorithm 2 for large sample and effect sizes. Coverage was low only for penalized partial least squares (linear regression). For outcome-associated predictors, coverage was close to 95% for Algorithm 2 for large sample and effect sizes for all methods except penalized partial least squares and penalized logistic regression. Coverage was sub-nominal for Algorithm 1. In conclusion, many methods performed comparably, and while Algorithm 2 is preferred to Algorithm 1 for estimation, it yields valid inference only for large effect and sample sizes.
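The difference between the two estimation strategies and the FP/FN bookkeeping can be illustrated with a small simulation. This is a minimal, hypothetical sketch, not the authors' simulation code. It assumes a linear model without intercept, independent standard-normal predictors, LASSO fitted by plain coordinate descent with an arbitrarily chosen penalty `lam=0.1` (the study's tuning procedure is not reproduced), and naive Wald intervals for the refit that ignore the selection step. The function names `simulate`, `lasso_cd`, `ols` and `selection_errors` are illustrative only.

```python
import math
import random


def simulate(n, p, beta, sigma, seed):
    """Hypothetical design: X ~ N(0, 1) i.i.d., y = X beta + N(0, sigma^2) noise, no intercept."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sigma) for row in X]
    return X, y


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes x_j's own term.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / col_sq[j] - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] += delta
    return b


def ols(X, y, cols):
    """Unpenalized least-squares refit on the selected columns; returns (coef, std_errors)."""
    n, k = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    # Invert Z'Z by Gauss-Jordan elimination on the augmented matrix [Z'Z | I].
    A = [[sum(Z[i][a] * Z[i][c] for i in range(n)) for c in range(k)]
         + [1.0 if a == c else 0.0 for c in range(k)] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    zty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = [sum(inv[a][c] * zty[c] for c in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(coef[a] * Z[i][a] for a in range(k))) ** 2 for i in range(n))
    s2 = rss / (n - k)
    se = [math.sqrt(s2 * inv[a][a]) for a in range(k)]
    return coef, se


def selection_errors(selected, true_support):
    """False positives: selected nulls. False negatives: missed true predictors."""
    fp = sum(1 for j in selected if j not in true_support)
    fn = sum(1 for j in true_support if j not in selected)
    return fp, fn


if __name__ == "__main__":
    p, true_support = 10, {0, 1, 2}
    beta = [1.0 if j in true_support else 0.0 for j in range(p)]
    X, y = simulate(n=200, p=p, beta=beta, sigma=1.0, seed=1)
    b_lasso = lasso_cd(X, y, lam=0.1)  # lam chosen arbitrarily, not tuned as in the study
    selected = [j for j in range(p) if b_lasso[j] != 0.0]
    fp, fn = selection_errors(selected, true_support)
    coef, se = ols(X, y, selected)  # refit ignores that selection used the same data
    print("FP:", fp, "FN:", fn)
    for j, c, s in zip(selected, coef, se):
        print(f"x{j}: Algorithm 1 = {b_lasso[j]:.3f}, Algorithm 2 = {c:.3f} "
              f"(naive 95% CI {c - 1.96 * s:.3f} to {c + 1.96 * s:.3f})")
```

Repeating this over many seeds and recording how often each interval contains the true coefficient is one way to estimate empirical coverage of the kind summarized above. Because the refit treats the selected model as if it had been fixed in advance, its naive intervals can undercover when effects or samples are small; the sketch does not attempt intervals for the directly penalized (Algorithm 1) estimates.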

authors

Carroll, Raymond

published proceedings

Comput Stat

altmetric score

1

author list (cited authors)

Pfeiffer, R. M., Redd, A., & Carroll, R. J.

citation count

8

complete list of authors

Pfeiffer, Ruth M||Redd, Andrew||Carroll, Raymond J

publication date

June 2017

publisher

Springer Nature

keywords

Biased Estimates
Finite Sample Inference
Post-model Selection Inference
Shrinkage
Variable Selection

Digital Object Identifier (DOI)

10.1007/s00180-016-0690-2

start page

667

end page

690

volume

32

issue

2

URL

http://dx.doi.org/10.1007/s00180-016-0690-2
