Abstract:
|
In high-dimensional regression, sparsity is often taken as a measure of the difficulty of the variable selection problem. As a complement to sparsity, we introduce effect-size heterogeneity for a finer-grained understanding of the tradeoff between type I and type II errors. Roughly speaking, a vector has higher effect-size heterogeneity than another vector (of the same sparsity) if its nonzero entries are more distinct in magnitude. We prove that, in a regime of linear sparsity, the false and true positive rates achieve the optimal tradeoff uniformly along the Lasso path when this new measure is at its maximum, that is, when all effect sizes have very different magnitudes; the worst-case tradeoff is achieved when it is at its minimum, that is, when all effect sizes are equal. Moreover, we show that when the effect-size heterogeneity is at its maximum, the Lasso path is optimal in terms of the rank of the first false variable. Metaphorically, these two findings suggest that variables with comparable effect sizes compete with each other along the Lasso path, which makes variable selection harder. Our proofs use techniques from approximate message passing theory and a novel argument for estimating the rank of the first false variable.
|