Abstract:
|
We consider predicting an outcome Y using a large number of covariates X. However, most of the data available for fitting the model contain only Y and W, a noisy surrogate for X; only on a small number of observations do we observe Y, X, and W together. We develop Ridge-type shrinkage methods that trade off bias and variance in a data-adaptive way, using information from both datasets to yield smaller prediction error. We also demonstrate how the problem can be treated in a fully Bayesian framework with different forms of adaptive shrinkage. Finally, we introduce the notion of a hyper-penalty for guiding the choice of tuning parameter when performing adaptive shrinkage. The high dimensionality of the problem, the large fraction of missing covariate information, and the fact that we seek a prediction model for Y|X (rather than Y|W) make this a non-standard statistical problem. The general idea of integrating and leveraging information from existing, diverse data sources to boost prediction has broad application in contemporary scientific studies.
|