Abstract:
In many contexts, there is interest in selecting the most important variables from a very large collection, a problem commonly referred to as support recovery or as variable, feature or subset selection. There is an enormous literature proposing a rich variety of algorithms. In scientific applications, it is crucial to quantify uncertainty in variable selection by providing a measure of statistical significance for each variable, yet the overwhelming majority of algorithms fail to produce such measures. This has led the scientific literature to focus on independent screening methods, which examine each variable in isolation and obtain p-values measuring the significance of marginal associations. Bayesian methods provide an alternative, using marginal inclusion probabilities in place of p-values. Bayesian variable selection has advantages but is computationally impractical beyond small problems. We show that approximate message passing can be used to rapidly obtain accurate approximations to marginal inclusion probabilities in high-dimensional variable selection. Theoretical support is derived from information theory.
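The abstract does not spell out the algorithm, so the following is only a minimal sketch of the generic Bayes-AMP recipe under our own assumptions, not necessarily the authors' method. We assume a linear model y = Xβ + w with Gaussian noise, a design X with i.i.d. N(0, 1/n) entries, and a spike-and-slab prior β_j ~ (1 − ε)δ_0 + ε N(0, τ²) with ε and τ² known. Each AMP iteration forms pseudo-data r_j ≈ β_j + N(0, s²), applies the scalar posterior denoiser, and keeps the denoiser's posterior probability that β_j ≠ 0 as the approximate marginal inclusion probability. The function names `spike_slab_denoise` and `amp_inclusion_probs` are hypothetical.

```python
import math
import random


def spike_slab_denoise(r, s2, eps, tau2):
    """Scalar posterior for r = beta + N(0, s2), beta ~ (1-eps)*delta_0 + eps*N(0, tau2).

    Returns (P(beta != 0 | r), E[beta | r], d E[beta | r] / dr).
    """
    v = tau2 + s2
    # Log posterior odds that beta is nonzero, given r.
    log_odds = (math.log(eps / (1.0 - eps)) + 0.5 * math.log(s2 / v)
                + 0.5 * r * r * (1.0 / s2 - 1.0 / v))
    # Numerically stable logistic function.
    if log_odds >= 0:
        pi = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        pi = e / (1.0 + e)
    c = tau2 / v  # shrinkage factor of the slab component
    mean = pi * c * r
    # Derivative equals Var(beta | r) / s2 (Tweedie's formula).
    deriv = pi * c + pi * (1.0 - pi) * (c * r) ** 2 / s2
    return pi, mean, deriv


def amp_inclusion_probs(X, y, eps, tau2, iters=30):
    """AMP with a spike-and-slab denoiser (illustrative sketch).

    X is an n x p list of lists, y a length-n list. Returns approximate
    marginal inclusion probabilities P(beta_j != 0 | y), j = 1..p.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    z = list(y)
    probs = [eps] * p
    for _ in range(iters):
        # Effective noise variance of the pseudo-data, estimated from the residual.
        s2 = max(sum(zi * zi for zi in z) / n, 1e-12)
        # Pseudo-data r = beta + X^T z.
        r = [beta[j] + sum(X[i][j] * z[i] for i in range(n)) for j in range(p)]
        out = [spike_slab_denoise(rj, s2, eps, tau2) for rj in r]
        probs = [o[0] for o in out]
        beta = [o[1] for o in out]
        avg_deriv = sum(o[2] for o in out) / p
        # New residual with the Onsager correction (uses the previous z).
        xb = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        z = [y[i] - xb[i] + (p / n) * avg_deriv * z[i] for i in range(n)]
    return probs
```

Each iteration costs two matrix-vector products, which is what makes this kind of approximation fast compared with sampling the full posterior; the Onsager term is what keeps the pseudo-data approximately Gaussian around the true β_j.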