Abstract Details

Activity Number: 659 - Recent Advances in Dimension Reduction and Clustering
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #304606
Title: Gaussian Mixture Clustering Using Relative Tests of Fit
Author(s): Purvasha Chakravarti* and Larry Wasserman and Sivaraman Balakrishnan
Companies: Carnegie Mellon University and Carnegie Mellon University and Carnegie Mellon University
Keywords: Clustering; Hypothesis Testing; Gaussian Mixtures

We consider clustering based on significance tests for Gaussian Mixture Models (GMMs). Our starting point is SigClust, a method developed by Liu, Hayes, Nobel and Marron (2008), which introduces a test based on the k-means objective (with k = 2) to decide whether the data should be split into two clusters. When applied recursively, this test yields a hierarchical clustering method equipped with a significance guarantee. We study the power of this approach in some examples and show that there are large regions of the parameter space where the power is low. We then introduce a new test based on the idea of relative fit. In contrast to prior work, we do not assume that the distribution is either Gaussian or a mixture of Gaussians. Rather, we develop a test for whether a mixture of Gaussians provides a better fit than a single Gaussian, without assuming that either model is correct. The test we propose has a simple critical value and provides provable error control. We show how our tests can be used for clustering both hierarchically and sequentially, in the manner of model selection.
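
To make the idea of a relative-fit test concrete, the following is a minimal sketch of one way such a test could be built. It is not the authors' procedure. The abstract does not describe the construction, so everything here is an assumption for illustration: the data are one-dimensional, the sample is split in half, both models are fitted on the first half (the mixture by a plain EM loop), and the held-out mean log-density difference is compared with a standard normal critical value. The function names (`fit_two_component_gmm`, `relative_fit_test`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_mixture_pdf(x, weights, mus, sigmas):
    """Log density of a Gaussian mixture at x (log-sum-exp for stability)."""
    terms = [math.log(w) + log_normal_pdf(x, m, s)
             for w, m, s in zip(weights, mus, sigmas)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_two_component_gmm(xs, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture with a plain EM loop."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    # Assumed initialisation: means of the lower and upper halves of the sample.
    mus = [mean(ordered[:half]), mean(ordered[half:])]
    sigmas = [stdev(xs), stdev(xs)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp0 = []
        for x in xs:
            a = math.log(weights[0]) + log_normal_pdf(x, mus[0], sigmas[0])
            b = math.log(weights[1]) + log_normal_pdf(x, mus[1], sigmas[1])
            top = max(a, b)
            r = math.exp(a - top) / (math.exp(a - top) + math.exp(b - top))
            # Clamp so that no component ever receives exactly zero weight.
            resp0.append(min(max(r, 1e-9), 1 - 1e-9))
        resp1 = [1 - r for r in resp0]
        # M-step: weighted updates of weight, mean and standard deviation.
        for k, resp in enumerate((resp0, resp1)):
            total = sum(resp)
            weights[k] = total / len(xs)
            mus[k] = sum(r * x for r, x in zip(resp, xs)) / total
            var = sum(r * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / total
            sigmas[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, mus, sigmas


def relative_fit_test(xs, alpha=0.05, seed=0):
    """Return (z, reject): does the mixture fit held-out data better than one Gaussian?"""
    rng = random.Random(seed)
    xs = list(xs)
    rng.shuffle(xs)
    train, held_out = xs[: len(xs) // 2], xs[len(xs) // 2:]
    # Fit both candidate models on the training half only.
    mu, sigma = mean(train), stdev(train)
    gmm = fit_two_component_gmm(train)
    # Per-point log-density differences on the held-out half.
    diffs = [log_mixture_pdf(x, *gmm) - log_normal_pdf(x, mu, sigma) for x in held_out]
    z = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    # One-sided test against a standard normal quantile.
    return z, z > NormalDist().inv_cdf(1 - alpha)


# Usage on simulated, well-separated bimodal data.
sim = random.Random(1)
bimodal = [sim.gauss(-3, 1) for _ in range(200)] + [sim.gauss(3, 1) for _ in range(200)]
z_stat, reject = relative_fit_test(bimodal)
print(f"z = {z_stat:.2f}, split into two clusters: {reject}")
```

Because the comparison is made on held-out data, the statistic asks only which fitted model describes new observations better, which matches the abstract's point that neither model needs to be correct. Applying such a test recursively to each resulting cluster would give a hierarchical procedure of the kind the abstract describes.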

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program