Keywords: interaction selection, non-Gaussian clustered data, penalized regression, sparsity, stagewise estimation
Stagewise estimation is a slow-brewing approach to model building that has recently experienced a revival owing to its computational efficiency, its flexibility, and its intrinsic connections with penalized estimation. Building upon generalized estimating equations, we propose general stagewise estimation approaches for selecting interaction terms in non-Gaussian/non-linear models for clustered data. The key task is to perform variable selection that maintains the interaction hierarchy. We develop two techniques to address this challenge. The first is a hierarchical lasso stagewise estimating equations approach, which is shown to correspond directly to hierarchical lasso penalized regression. The second is a stagewise active set approach, which enforces the variable hierarchy by restricting the selection in each stagewise estimation step to a properly growing active set. Simulation studies demonstrate the efficacy of the proposed approaches. We apply the proposed approaches to study the association between suicide-related hospitalization rates among 15--19 year olds in Connecticut and the characteristics of the school districts in which they reside.
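To make the active set idea concrete, the following is a minimal sketch and not the procedure proposed in this work. It assumes a Gaussian linear model with independent observations, so the estimating function reduces to the least-squares score, and it assumes a strong hierarchy rule in which an interaction becomes eligible only after both of its parent main effects have been selected. The function name `stagewise_active_set` and the parameters `step` and `n_steps` are hypothetical.

```python
import itertools
import numpy as np


def stagewise_active_set(X, y, step=0.01, n_steps=2000):
    """Forward stagewise least squares restricted to a growing active set.

    Illustrative assumption: Gaussian response, independence working
    correlation, strong hierarchy (both parents before an interaction).
    """
    n, p = X.shape
    pairs = list(itertools.combinations(range(p), 2))
    # Candidate columns: p main effects followed by all pairwise products.
    Z = np.hstack([X] + [X[:, [j]] * X[:, [k]] for j, k in pairs])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    beta = np.zeros(Z.shape[1])
    selected = np.zeros(Z.shape[1], dtype=bool)  # terms updated at least once
    resid = y - y.mean()
    for _ in range(n_steps):
        # Active set: all main effects, plus each interaction whose two
        # parent main effects have already been selected.
        allowed = np.ones(Z.shape[1], dtype=bool)
        for idx, (j, k) in enumerate(pairs):
            allowed[p + idx] = selected[j] and selected[k]
        # Least-squares score (estimating function) for every candidate.
        score = Z.T @ resid / n
        score[~allowed] = 0.0
        best = int(np.argmax(np.abs(score)))
        if abs(score[best]) < 1e-8:
            break
        # Take one small step on the single most promising allowed term.
        delta = step * np.sign(score[best])
        beta[best] += delta
        resid = resid - delta * Z[:, best]
        selected[best] = True
    return beta, selected, pairs
```

Because the eligibility check uses `selected`, which only ever gains entries, the active set grows monotonically and an interaction can never enter before its parents. Extending the sketch to clustered non-Gaussian responses would require replacing the least-squares score with a generalized estimating equation, which is not attempted here.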