Abstract:
|
Predictive modeling in high-dimensional data often ignores interaction effects among predictors because of the analytical challenges involved. Methodological and computational advances have galvanized research on interaction detection. Statistical learning involves two types of interactions: regression-based multiplicative interactions and tree-based interactions. Current methods address one type or the other, but not both, since the former are discovered by hyperplanes while the latter are discovered by recursive bisections. Our aim is to investigate the prediction performance of classification models designed to detect interactions. We compare several interaction selection methods, namely the regularization path algorithm under the marginality principle (RAMP), random intersection trees, iterative random forests, and gradient boosting, on two real datasets covering balanced and imbalanced response cases. Our empirical results show that RAMP with the weak heredity rule outperforms the other methods in accuracy, sensitivity, specificity, and F1 score for both response cases. We extend this comparative study to carefully designed simulated datasets and present the simulation results.
|