Abstract:
|
This paper considers errors-in-variables models in a high-dimensional setting. When the covariates are measured without error, the lasso is often used for estimation in high-dimensional models; measurement error, however, can severely bias the parameter estimates and impair the lasso's ability to recover the true sparsity pattern. A new estimator, called SIMulation-SELection-EXtrapolation (SIMSELEX), is proposed. Central to the new estimator is the application of the simulation-extrapolation procedure (SIMEX) to the lasso, combined with a variable selection step that takes place after the simulation step but before the extrapolation step. The SIMSELEX estimator is shown to perform well in variable selection and to have significantly lower estimation error than naive estimators that ignore the measurement error. Furthermore, SIMSELEX can be applied in every errors-in-variables setting; this paper illustrates applications in linear regression, logistic regression, the Cox survival model, and a nonparametric model. The method is used to analyze a dataset that contains gene expression measurements of favorable histology Wilms tumors.
|
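The three stages named in the abstract (simulation, selection, extrapolation) can be illustrated with a minimal sketch for the linear regression case. This is not the paper's implementation; it rests on several assumptions that do not come from the abstract: the measurement error standard deviation `sigma_u` is taken as known, the lasso penalty `alpha` is fixed rather than tuned, the extrapolant is quadratic and evaluated at lambda = -1 as in standard SIMEX, and the selection rule (keep a variable if the lasso selects it in at least half of the replicates at every noise level) is a hypothetical stand-in for the paper's selection step. The names `lasso_cd` and `simselex_sketch` are illustrative.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Lasso via cyclic coordinate descent for
    (1/2n)||y - X b||^2 + alpha * ||b||_1 (no intercept; data assumed centered)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]      # put the updated coordinate back
    return beta

def simselex_sketch(W, y, sigma_u, alpha,
                    lams=(0.0, 0.5, 1.0, 1.5, 2.0), B=20, seed=0):
    """Illustrative simulation-selection-extrapolation for a linear model.
    W is the error-contaminated covariate matrix; sigma_u is assumed known."""
    rng = np.random.default_rng(seed)
    n, p = W.shape
    lams = np.asarray(lams, dtype=float)
    avg = np.zeros((len(lams), p))    # mean lasso coefficients per noise level
    freq = np.zeros((len(lams), p))   # how often each variable is selected

    # Simulation: add extra noise with variance lam * sigma_u^2, refit the lasso.
    for k, lam in enumerate(lams):
        fits = np.array([
            lasso_cd(W + np.sqrt(lam) * sigma_u * rng.standard_normal((n, p)),
                     y, alpha)
            for _ in range(B)
        ])
        avg[k] = fits.mean(axis=0)
        freq[k] = (fits != 0).mean(axis=0)

    # Selection (stand-in rule, an assumption): keep variables selected in at
    # least half of the replicates at every noise level.
    keep = freq.min(axis=0) >= 0.5

    # Extrapolation: quadratic fit in lam per kept variable, evaluated at lam = -1.
    est = np.zeros(p)
    if keep.any():
        coefs = np.polyfit(lams, avg[:, keep], 2)   # rows: lam^2, lam, constant
        est[keep] = coefs[0] - coefs[1] + coefs[2]
    return est
```

Calling `simselex_sketch(W, y, sigma_u=0.5, alpha=0.1)` returns a length-`p` coefficient vector in which variables dropped at the selection stage are exactly zero. The coordinate-descent lasso is included only to keep the sketch self-contained; any lasso solver could take its place.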