Abstract:
|
Variable selection is a pervasive problem in small-n-large-p settings. Incorporating group structure to improve variable selection has been widely studied. In this paper, we consider incorporating a multi-layer overlapping group structure to improve variable selection in the regression setting. For example, a biological pathway contains tens to hundreds of genes, and a gene can contain multiple experimentally measured features (such as its mRNA expression, copy number variation, and possibly the methylation levels of multiple sites). In addition to this hierarchical structure, the groups may overlap (e.g., two pathways may share genes). We propose a Bayesian hierarchical indicator model that can conveniently incorporate the multi-layer overlapping group structure in variable selection. We discuss properties of the proposed prior and prove selection consistency and asymptotic normality of the posterior median estimator. We apply the model to two simulations and a TCGA breast cancer example to demonstrate its superiority over existing methods. The results show improved prediction accuracy as well as better variable selection and model interpretation.
|