Abstract:
We study linear subset regression in the context of the high-dimensional overall model $y = \theta'z + u$ with univariate response $y$ and a $d$-vector of random regressors $z$, independent of $u$, where the sample size $n$ may be much less than $d$. We consider simple linear submodels in which $y$ is regressed on a set of $p$ regressors given by $x = B'z$, for some $d \times p$ matrix $B$ with $p \leq n$. The corresponding simple model $y = \gamma'x + v$ can be justified by imposing restrictions on the unknown parameter $\theta$; otherwise, this simple model can be grossly misspecified. In this talk, we show that the OLS predictor obtained by fitting the simple linear model is typically close to the Bayes predictor $E[y|x]$, uniformly in $\theta \in \mathbb{R}^d$, provided only that $d$ is large. Moreover, we establish the asymptotic validity of the standard $F$-test on the surrogate parameter $\gamma$, which realizes the best linear fit of $y$ on $x$. On a technical level, we rely on recent results from Steinberger and Leeb (arXiv, 2014) on conditional moments of high-dimensional random vectors given low-dimensional projections; see also Leeb (AoS, 2013) and Hall and Li (AoS, 1993).
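The setup above can be sketched in a small simulation. Everything below is an illustrative assumption, not part of the talk: a Gaussian design with $\mathrm{Cov}(z) = I_d$ is chosen so that the surrogate parameter has the closed form $\gamma = (B'B)^{-1}B'\theta$, and the dimensions, scaling of $\theta$, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 1000, 2, 200                      # d regressors, p-dim submodel, n observations

theta = rng.normal(size=d) / np.sqrt(d)     # overall-model parameter (scaled so ||theta|| is about 1)
B = rng.normal(size=(d, p))                 # matrix defining the submodel regressors x = B'z

# Sample from the overall model y = theta'z + u, with z ~ N(0, I_d) and u ~ N(0, 1)
Z = rng.normal(size=(n, d))
y = Z @ theta + rng.normal(size=n)

# OLS fit of the (possibly misspecified) simple model y = gamma'x + v
X = Z @ B
gamma_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Population surrogate parameter: the best linear fit of y on x.
# With Cov(z) = I_d this is gamma = (B'B)^{-1} B' theta.
gamma = np.linalg.solve(B.T @ B, B.T @ theta)

print(np.max(np.abs(gamma_hat - gamma)))    # OLS recovers the surrogate parameter
```

Even though the simple model omits almost all of the $d$ regressors, the OLS estimate targets the surrogate parameter $\gamma$ rather than any sub-vector of $\theta$, which is the object the $F$-test in the abstract concerns.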
ASA Meetings Department