Abstract:
|
Sifting through all possible interactions in modeling applications is a dangerous statistical endeavor. All too often, at least one of the interactions is found to improve the fit “significantly”, obligating us to interpret an opaque model with clinically dubious interactions. With this in mind, we illustrate the concept of ranked sparsity: a phenomenon that occurs naturally in the presence of interactions. In particular, when an expected disparity in the quality of information exists between different feature sets, most model selection methods fail because they implicitly presume that each predictor is equally likely to be informative. In practice, this presumption, combined with the sheer number of candidate interactions, grossly inflates the number of falsely discovered interactions, resulting in unnecessarily complicated models. Our paradigm motivates a higher degree of prior skepticism toward interactions and can be implemented with the sparsity-ranked lasso (SRL). We explore the performance of the SRL in a series of simulations, showing that the SRL is fast, accurate, and produces more transparent models (with fewer false interactions) than other state-of-the-art methods.
|