Abstract:
|
We consider the Sparse Principal Component Analysis (SPCA) problem under the spiked covariance model. The SPCA problem can be reformulated as a Mixed Integer Problem (MIP) with optimal statistical properties. However, current MIP algorithms for SPCA cannot scale beyond datasets with 1000s of variables. In this paper, we propose a new MIP formulation of SPCA via statistical modeling and reductions. By utilizing properties of the spiked covariance model, we provide Mixed Integer Quadratic Program (MIQP) and Mixed Integer Second Order Cone Program (MISOCP) formulations for SPCA. We provide statistical guarantees for our MIQP formulation in terms of estimation error. In addition, we provide tailored cutting plane algorithms based on outer approximation. Our numerical experiments on synthetic and real datasets show that our algorithms scale to large scale problems with up to 10000 variables in tens of minutes.
|