Keywords: best subset selection, mixed integer linear programming, lasso, least absolute deviation, cross-validation
Regularization methods such as Lasso regression are widely used because they perform automated variable selection (by inducing sparsity) and parameter estimation simultaneously. This is especially desirable in high-dimensional settings, where exhaustive subset selection is not computationally feasible. Building upon recent literature, we use a mixed integer linear program (MILP) formulation that directly controls sparsity by bounding the number of selected features. In particular, we focus on the selection of an appropriate bound, which was not explicitly explored in previous studies. Performing k-fold cross-validation can be computationally expensive even with recent improvements in MILP solvers. We exploit the flexible nature of MILP to implement an inner cross-validation scheme and a simple stopping rule that allow us to choose the bound efficiently. Results across various simulation scenarios show that our approach is computationally viable and can compete with, and in some settings outperform, Lasso regression.
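To make the MILP formulation concrete, the following is a minimal sketch of best-subset least absolute deviation (LAD) regression with a cardinality bound, solved with SciPy's HiGHS interface. The big-M linking constraints, the constant `M`, and the helper name `lad_best_subset` are illustrative assumptions, not the paper's exact formulation; the paper's inner cross-validation scheme and stopping rule for choosing the bound `k` are not shown here.

```python
import numpy as np
from scipy.optimize import linprog

def lad_best_subset(X, y, k, M=100.0):
    """Illustrative best-subset LAD regression as a MILP (not the paper's
    exact formulation).

    minimize   sum_i t_i                   with t_i >= |y_i - x_i' beta|
    subject to -M z_j <= beta_j <= M z_j   (big-M links beta_j to binary z_j)
               sum_j z_j <= k              (at most k active features)
    Decision vector layout: [beta (p), t (n), z (p)].
    """
    n, p = X.shape
    nv = p + n + p
    # Objective: minimize the sum of absolute residuals t_i.
    c = np.r_[np.zeros(p), np.ones(n), np.zeros(p)]

    rows, rhs = [], []
    # |y_i - x_i' beta| <= t_i, split into two linear inequalities.
    for i in range(n):
        r1 = np.zeros(nv); r1[:p] = -X[i]; r1[p + i] = -1.0
        rows.append(r1); rhs.append(-y[i])        # y_i - x_i'beta <= t_i
        r2 = np.zeros(nv); r2[:p] = X[i]; r2[p + i] = -1.0
        rows.append(r2); rhs.append(y[i])         # x_i'beta - y_i <= t_i
    # Big-M sparsity link: beta_j <= M z_j and -beta_j <= M z_j.
    for j in range(p):
        r3 = np.zeros(nv); r3[j] = 1.0; r3[p + n + j] = -M
        rows.append(r3); rhs.append(0.0)
        r4 = np.zeros(nv); r4[j] = -1.0; r4[p + n + j] = -M
        rows.append(r4); rhs.append(0.0)
    # Cardinality bound: sum_j z_j <= k.
    r5 = np.zeros(nv); r5[p + n:] = 1.0
    rows.append(r5); rhs.append(float(k))

    bounds = [(-M, M)] * p + [(0, None)] * n + [(0, 1)] * p
    integrality = np.r_[np.zeros(p + n), np.ones(p)]  # only z_j is integer
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, integrality=integrality, method="highs")
    return res.x[:p], res.fun

# Usage on noiseless synthetic data with a 2-sparse truth:
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
beta_true = np.array([2.0, 0.0, -3.0, 0.0, 0.0])
y = X @ beta_true
beta_hat, obj = lad_best_subset(X, y, k=2)
```

In practice the bound `k` plays the role of the Lasso penalty parameter, and the abstract's point is precisely that selecting it by plain k-fold cross-validation requires solving many such MILPs, motivating the proposed inner cross-validation scheme and stopping rule.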