Online Program Home
  My Program

All Times EDT

Abstract Details

Activity Number: 444 - Recent Advances in Statistical Methodology for Big Data
Type: Contributed
Date/Time: Thursday, August 12, 2021 : 4:00 PM to 5:50 PM
Sponsor: IMS
Abstract #318956
Title: Fast and Scalable SPCA via Statistical Modeling Reductions
Author(s): Kayhan Behdin* and Rahul Mazumder
Companies: MIT and Massachusetts Institute of Technology
Keywords: Sparse PCA; Mixed Integer Programming; Spiked Covariance Model; Mixed Integer Quadratic Programming
Abstract:

We consider the Sparse Principal Component Analysis (SPCA) problem under the spiked covariance model. The SPCA problem can be reformulated as a Mixed Integer Problem (MIP) with optimal statistical properties. However, current MIP algorithms for SPCA cannot scale beyond datasets with 1000s of variables. In this paper, we propose a new MIP formulation of SPCA via statistical modeling and reductions. By utilizing properties of the spiked covariance model, we provide Mixed Integer Quadratic Program (MIQP) and Mixed Integer Second Order Cone Program (MISOCP) formulations for SPCA. We provide statistical guarantees for our MIQP formulation in terms of estimation error. In addition, we provide tailored cutting plane algorithms based on outer approximation. Our numerical experiments on synthetic and real datasets show that our algorithms scale to large scale problems with up to 10000 variables in tens of minutes.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program