Abstract:
|
Motivated by the need to select important features from big neuroimaging data, we develop a new Bayesian feature screening approach in the generalized linear model (GLM) framework. We assign conjugate priors to the coefficients and obtain the analytical form of the marginal posterior density function. Under some mild regularity conditions, we show that the marginal posterior moments follow a mixture of normal distributions, one component of which is the standard normal distribution for unimportant variables. In light of this theoretical foundation, we develop a Bayesian variable screening algorithm for ultra-high dimensional data consisting of two steps. Step 1: compute a multivariate variable screening statistic based on marginal posterior moments. Step 2: perform a mixture model-based cluster analysis on the screening statistics to identify the unimportant variables. Step 1 has computational complexity linear in the number of predictors and is straightforward to parallelize. It is closely connected to sure independence screening (SIS) statistics and high-dimensional ordinary least-squares projection (HOLP) methods. Step 2 is an extensi
|