Abstract:
|
Consider three important problems in statistical inference: constructing confidence intervals for (1) the error of a high-dimensional (p > n) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied problem of performing inference on the Euclidean norm of the signal in high-dimensional linear regression. We derive a novel procedure for this problem, which is asymptotically correct when the covariates are multivariate Gaussian. The procedure, called EigenPrism, is computationally fast and requires neither assumptions on coefficient sparsity nor knowledge of the noise level. We show that EigenPrism confidence intervals are nearly minimax in width and that, in practice and in finite samples, they achieve coverage in settings well beyond multivariate Gaussian covariates. We also apply EigenPrism to a genetic dataset with roughly 5000 subjects to estimate the heritability of a number of heart-disease-related traits.
|