Abstract:
|
Consider a linear regression model y = Xβ + z. The Gram matrix (1/n)X'X is non-sparse but is approximately the sum of two components, a low-rank matrix and a sparse matrix, neither of which is known to us. We are interested in the Rare/Weak signal setting, where all but a small fraction of the entries of β are zero and the nonzero entries are individually small. The goal is to rank the variables so as to maximize the area under the ROC curve. We propose Factor-adjusted Covariate Assisted Ranking (FA-CAR), a two-step approach to variable ranking. In the FA-step, we use PCA to reduce the linear model to a new one where the Gram matrix is sparse. In the CAR-step, we rank variables by exploiting the local covariate structures. FA-CAR is easy to use and computationally fast, and it is effective in resolving "signal cancellation", a challenge we face in regression models. We compare the ROC curve of FA-CAR with those of some other ranking methods in numerical experiments. Under a Rare/Weak signal model, we derive the convergence rate of the minimum sure-screening model size of FA-CAR.
|