Abstract:
|
We propose a new approach for estimating risk in the classical normal means problem. We construct an exactly unbiased estimator of the risk of any given estimator, without access to test data and without relying on any kind of resampling or sample splitting. A crucial aspect of our approach is that it is unbiased for a slightly harder version of the problem, in which the noise level in the data is assumed to be elevated. The key idea is to generate two auxiliary data sets from the training data by adding carefully constructed synthetic noise; the estimator is then trained on the first of these data sets and tested on the second. Under some conditions, this approach exactly recovers, as a special case in the limit as the amount of added noise vanishes, classical methodology in the statistics literature such as Mallows' Cp and, more generally, Stein's unbiased risk estimator. Through a bias-variance decomposition of our risk estimator, we quantify the order of its bias and variance as a function of the magnitude of the added noise. Finally, we show that simply averaging the estimated risk over multiple replications of the added noise is an effective way of controlling the variance.
|
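The abstract does not spell out the noise construction, so the sketch below is an illustration under one standard Gaussian coupling: given Y ~ N(theta, sigma^2 I) and an auxiliary draw omega ~ N(0, sigma^2 I), the pair Y + sqrt(alpha)*omega (training, at the elevated noise level (1+alpha)*sigma^2) and Y - omega/sqrt(alpha) (testing) is jointly Gaussian with zero cross-covariance, hence independent. All names, the toy shrinkage estimator g, and the parameter choices are hypothetical, not taken from the paper.

```python
import numpy as np

def coupled_risk_estimate(Y, g, sigma, alpha, B, rng):
    """Average B risk estimates, one per synthetic-noise draw (variance control)."""
    n = Y.shape[0]
    ests = np.empty(B)
    for b in range(B):
        omega = sigma * rng.normal(size=n)
        Y_train = Y + np.sqrt(alpha) * omega   # noise level (1 + alpha) * sigma^2
        Y_test = Y - omega / np.sqrt(alpha)    # independent of Y_train by construction
        # ||g(Y_train) - Y_test||^2 overshoots the risk E||g(Y_train) - theta||^2
        # by E||Y_test - theta||^2 = n * (1 + 1/alpha) * sigma^2, so subtract it.
        ests[b] = np.sum((g(Y_train) - Y_test) ** 2) - n * (1 + 1 / alpha) * sigma**2
    return ests.mean()

rng = np.random.default_rng(0)
n, sigma, alpha, c = 50, 1.0, 0.1, 0.8
theta = rng.normal(size=n)            # fixed "true" mean vector for the experiment
g = lambda y: c * y                   # toy linear shrinkage estimator (hypothetical)

# Unbiasedness check: for g(y) = c*y at noise level (1 + alpha) * sigma^2, the
# estimation risk E||g(Y_train) - theta||^2 has the closed form below.
true_risk = c**2 * n * (1 + alpha) * sigma**2 + (1 - c) ** 2 * np.sum(theta**2)
mc_mean = np.mean([
    coupled_risk_estimate(theta + sigma * rng.normal(size=n), g, sigma, alpha, 10, rng)
    for _ in range(1000)
])
print(f"true risk {true_risk:.2f}, Monte Carlo mean of estimates {mc_mean:.2f}")
```

Averaging over B draws of omega inside `coupled_risk_estimate` mirrors the abstract's final point: each draw gives an unbiased estimate, and averaging replications shrinks the variance contributed by the synthetic noise.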