Keywords: model selection, feature selection, lasso, explainable machine learning
We developed a set of novel machine learning algorithms with the goal of producing transparent models (i.e., understandable by humans) while also flexibly accounting for nonlinearity and interactions. Our methods use a novel concept of ranked sparsity, the aim of which is to allow for flexibility and user control in varying the shade of opacity of black-box machine learning methods [1]. In this work, we put our new ranked sparsity algorithms (as implemented in our new open-source R package, sparseR) to the test in a predictive model bakeoff on a diverse set of simulated and real-world data sets from the Penn Machine Learning Benchmarks database, including both regression and classification problems. Specifically, we evaluate the extent to which our new human-centered algorithms can attain predictive accuracy that rivals popular black-box approaches such as neural networks, random forests, and SVMs, while also producing more interpretable and stable inferences regarding important predictors. Using out-of-bag error as a meta-outcome, we describe the properties of data sets in which human-centered approaches can perform as well as or better than black-box approaches. While black-box models undeniably prevail in some scenarios, these settings are neither ubiquitous nor perfectly defined. In fact, interpretable approaches predicted optimally, or within 5% of the optimal method, in a majority of real-world data sets. We provide a strong rationale for including human-centered transparent algorithms such as ours as a rule in predictive modeling applications. Finally, we qualitatively compare the resulting inferences obtained from our front-end transparent algorithm to "back-end" (post-ML) explainability-based approaches.
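To give a flavor of the ranked-sparsity idea, the sketch below fits a lasso whose per-feature penalty weights differ, so that a derived term (here, an interaction) is penalized more heavily than the main effects. This is a minimal illustration in numpy, not the sparseR implementation: the `weighted_lasso` solver, the penalty weights, and the toy data are all hypothetical stand-ins for the methodology described in [1].

```python
import numpy as np

def weighted_lasso(X, y, penalties, lam=0.1, n_iter=500):
    """Proximal-gradient (ISTA) solver for a lasso with per-feature
    penalty weights: minimize (1/2n)||y - Xb||^2 + lam * sum_j penalties[j]*|b_j|.
    Heavier weights on interaction/polynomial columns loosely mimic
    the ranked-sparsity principle of penalizing derived terms more."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n          # gradient of the squared-error term
        z = beta - step * grad                    # gradient step
        thresh = step * lam * penalties           # per-feature soft-threshold level
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta

# Toy data: outcome depends only on the main effect x1.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([x1, x2, x1 * x2])
y = 2.0 * x1 + 0.1 * rng.standard_normal(n)

# Penalize the interaction column three times as heavily as the main effects.
beta = weighted_lasso(X, y, penalties=np.array([1.0, 1.0, 3.0]), lam=0.1)
```

With these weights, the interaction coefficient must clear a higher bar to enter the model, so the fitted model favors the simpler, more interpretable main-effect explanation when the data support it.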
[1] Peterson, R.A., Cavanaugh, J.E. Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials. AStA Advances in Statistical Analysis (2022).