Abstract:
|
Consider a high-dimensional linear regression model Y = Xβ + σz with a sparse signal vector β. The goal is to identify the nonzero coordinates of β (i.e., variable selection). We are primarily interested in the regime where signals are both rare and weak, so that successful variable selection is challenging but still possible. We assume the Gram matrix G = X′X is sparse in the sense that each row has relatively few large entries. The sparsity of G naturally induces sparsity in the so-called Graph of Strong Dependence (GOSD). The key insight is that there is an interesting interplay between signal sparsity and graph sparsity: in a broad context, the signals decompose into many small components of GOSD that are disconnected from each other. We propose Graphlet Screening, a two-step Screen and Clean procedure for variable selection. The main methodological innovation is to use the graph structure of GOSD in both the screening and cleaning steps.
|
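The interplay between signal sparsity and graph sparsity can be illustrated with a small sketch. It is not the Graphlet Screening procedure itself, and it makes several assumptions not stated in the abstract: two variables are joined by an edge when the absolute value of their Gram-matrix entry is at least a hypothetical threshold `delta`, and the function names `gosd_adjacency` and `signal_components` are illustrative only. The code builds that thresholded graph and lists the connected components of the subgraph induced by a given signal support.

```python
import numpy as np


def gosd_adjacency(X, delta):
    """Boolean adjacency of a thresholded Gram graph.

    Assumption: nodes i != j are linked when |G[i, j]| >= delta,
    where G = X'X. The threshold delta is a hypothetical tuning value.
    """
    G = X.T @ X
    A = np.abs(G) >= delta
    np.fill_diagonal(A, False)  # no self-loops
    return A


def signal_components(A, support):
    """Connected components of the subgraph of A induced by `support`."""
    support = {int(j) for j in support}
    seen = set()
    components = []
    for start in sorted(support):
        if start in seen:
            continue
        # Depth-first search restricted to signal nodes.
        stack = [start]
        seen.add(start)
        comp = []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in np.flatnonzero(A[node]):
                nb = int(nb)
                if nb in support and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Toy design: only columns 0 and 1 are correlated.
X = np.eye(6)
X[0, 1] = 0.5
A = gosd_adjacency(X, delta=0.3)
print(signal_components(A, support=[0, 1, 4]))
```

In this toy design the three signals split into one component of size two and one isolated node. The abstract's point is that each such small component can be handled on its own in both the screening and cleaning steps.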