Online Program

All Times EDT

Friday, October 2
Fri, Oct 2, 1:40 PM - 2:55 PM
Virtual
Concurrent Session

Simultaneous Outlier Detection and Feature Selection Using Mixed-Integer Programming (309607)

Francesca Chiaromonte, Pennsylvania State University and Sant’Anna School of Advanced Studies 
Giovanni Felici, IASI-CNR
Luca Insolia, Scuola Normale Superiore 
*Ana Maria Kenney, Pennsylvania State University 

Keywords: Sparse estimation, robust estimation, mixed-integer programming, feature selection

Contemporary sciences are increasingly data rich, but redundant features in a model can lead to unstable estimation, inference, and prediction. Feature selection methods attempt to avoid this by favoring sparse, interpretable models. However, the performance of many such approaches deteriorates in the presence of contaminated units (i.e., outliers), which often go unnoticed in practice. We investigate high-dimensional regression models contaminated by multiple mean-shift outliers affecting both the response and the design matrix. In contrast to existing approaches, which rely heavily on heuristics, we propose a discrete, optimal method that performs feature selection and outlier detection simultaneously using Mixed-Integer Programming. We prove several theoretical properties of this framework and demonstrate its superior performance relative to competing methods in an extensive simulation study and a real-data application.
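The abstract does not state the optimization problem explicitly. One common way to write simultaneous feature selection and outlier detection (an assumption here, not necessarily the authors' exact formulation) uses the mean-shift outlier model y = Xβ + φ + ε, where a nonzero φ_i flags unit i as an outlier, and solves min ||y − Xβ − φ||² subject to ||β||_0 ≤ k_p and ||φ||_0 ≤ k_n. An MIP typically encodes the two L0 constraints with binary indicator variables. The sketch below does not use an MIP solver: it finds the same exact optimum for a toy problem by brute-force enumeration, in pure Python, with no intercept and only response-side mean shifts. All function names are hypothetical.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_rss(X, y, rows, cols):
    """Least-squares fit of y on the chosen columns, using only the chosen rows.

    Returns (coefficients, residual sum of squares).
    """
    if not cols:
        return [], sum(y[i] ** 2 for i in rows)
    # Normal equations restricted to the selected rows and columns.
    xtx = [[sum(X[i][a] * X[i][b] for i in rows) for b in cols] for a in cols]
    xty = [sum(X[i][a] * y[i] for i in rows) for a in cols]
    coef = solve_linear(xtx, xty)
    rss = sum((y[i] - sum(c * X[i][j] for c, j in zip(coef, cols))) ** 2 for i in rows)
    return coef, rss


def best_subset_with_outliers(X, y, k_p, k_n):
    """Exhaustively solve min ||y - X b - phi||^2 s.t. ||b||_0 <= k_p, ||phi||_0 <= k_n.

    For a fixed outlier set O, the optimal phi_i equals unit i's residual, so flagged
    units drop out of the loss and the problem reduces to OLS on the remaining units.
    Flagging exactly k_n units is enough, since dropping a unit never raises the RSS.
    Returns (rss, selected_columns, coefficients, flagged_units).
    """
    n, p = len(y), len(X[0])
    best = (float("inf"), None, None, None)
    for size in range(k_p + 1):
        for cols in combinations(range(p), size):
            for outliers in combinations(range(n), k_n):
                rows = [i for i in range(n) if i not in outliers]
                try:
                    coef, rss = ols_rss(X, y, rows, list(cols))
                except ValueError:
                    continue  # skip rank-deficient subproblems
                if rss < best[0]:
                    best = (rss, cols, coef, outliers)
    return best


# Toy data: y = 2 * x0 exactly, x1 is irrelevant, and unit 3 carries a mean shift.
X = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.0], [4.0, 2.0], [5.0, 1.0], [6.0, -0.5]]
y = [2.0, 4.0, 6.0, 30.0, 10.0, 12.0]
rss, cols, coef, outliers = best_subset_with_outliers(X, y, k_p=1, k_n=1)
print(cols, coef, outliers)
```

On this toy input the search selects feature 0 with coefficient 2.0 and flags unit 3. The number of candidate pairs grows combinatorially in p and n, so enumeration is only viable for tiny problems; an MIP solver targets the same optimum but prunes the search space with bounds instead of visiting every subset.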