All Times EDT
Virtual
Simultaneous Outlier Detection and Feature Selection Using Mixed-Integer Programming (309607)
Francesca Chiaromonte, Pennsylvania State University and Sant’Anna School of Advanced Studies
Giovanni Felici, IIASI CNR
Luca Insolia, Scuola Normale Superiore
*Ana Maria Kenney, Pennsylvania State University
Keywords: Sparse estimation, robust estimation, mixed integer programming, feature selection
Contemporary sciences are increasingly data-rich, but redundant features in a model can lead to unstable estimates, inference, and prediction. Feature selection methods attempt to avoid this by favoring sparse, interpretable models. However, the performance of many such approaches deteriorates in the presence of contaminated units (i.e., outliers), which often go unnoticed in practice. We investigate high-dimensional regression models contaminated by multiple mean-shift outliers affecting both the response and the design matrix. In contrast to existing approaches, which rely heavily on heuristics, we propose a discrete and optimal method that performs feature selection and outlier detection simultaneously using Mixed-Integer Programming. We prove several theoretical properties of this framework and demonstrate its superior performance against competing methods in an extensive simulation study and a real-data application.
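As a rough illustration of how a problem of this kind can be posed, the following sketch writes a mean-shift outlier model and a cardinality-constrained least-squares program with big-M constraints. The notation (n units, p features, the bounds M_beta and M_phi, and the budgets k_p and k_n) is assumed here for illustration and is not taken from the abstract; the authors' actual formulation may differ.

```latex
% Mean-shift outlier model (assumed notation): y is the n-vector of responses,
% X the n x p design matrix, beta the sparse coefficient vector,
% phi a sparse vector of unit-specific mean shifts (nonzero only for outliers).
\[
  y = X\beta + \phi + \varepsilon
\]

% One possible mixed-integer formulation: binary z_j switches feature j on,
% binary w_i flags unit i as an outlier; k_p and k_n cap how many of each
% may be active, and M_beta, M_phi are assumed big-M bounds.
\[
\begin{aligned}
  \min_{\beta,\,\phi,\,z,\,w}\quad & \lVert y - X\beta - \phi \rVert_2^2 \\
  \text{s.t.}\quad
    & -M_\beta z_j \le \beta_j \le M_\beta z_j, && j = 1,\dots,p, \\
    & -M_\phi w_i \le \phi_i \le M_\phi w_i,    && i = 1,\dots,n, \\
    & \textstyle\sum_{j=1}^{p} z_j \le k_p, \qquad \sum_{i=1}^{n} w_i \le k_n, \\
    & z \in \{0,1\}^p, \quad w \in \{0,1\}^n .
\end{aligned}
\]
```

In this sketch, setting w_i = 1 lets phi_i absorb the residual of unit i, which effectively removes that unit from the fit, while z_j = 0 forces beta_j = 0 and drops feature j. Solving for both sets of binaries at once is what makes selection and detection simultaneous rather than sequential.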