Online Program Home
  My Program

All Times EDT

Abstract Details

Activity Number: 347 - Recent Advances in Clustering and Mixture Models Analysis
Type: Topic-Contributed
Date/Time: Thursday, August 12, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #317401
Title: Efficient Clustering for Stretched Mixtures: Landscape and Optimality
Author(s): Kaizheng Wang* and Yuling Yan and Mateo Diaz
Companies: Columbia University and Princeton University and Cornell University
Keywords: clustering; dimension reduction; unsupervised learning; landscape; non-convex optimization
Abstract:

This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking for an affine transform to turn the data into a one-dimensional point cloud concentrating around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program