
All Times EDT

Abstract Details

Activity Number: 115 - Advances in Clustering and Classification
Type: Contributed
Date/Time: Monday, August 8, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323023
Title: Selective Inference for K-Means Clustering
Author(s): Yiqun Chen* and Daniela Witten
Companies: University of Washington, Seattle and University of Washington
Keywords: Post-selection inference; Unsupervised learning; Hypothesis testing; Selective inference; k-means clustering; Type I error

We consider the problem of testing for a difference in means between groups defined via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the groups were estimated from the same data that are then used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute finite-sample p-values that control the selective Type I error for clusters obtained using k-means clustering. We apply our proposal in simulation and to handwritten digits and single-cell RNA-sequencing data.
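
The Type I error inflation described above can be seen in a small simulation. The sketch below is illustrative only and is not the selective inference procedure proposed in the talk: it draws one-dimensional data from a single N(0, 1) population (so there are no true clusters), splits the data with a hand-rolled 2-means routine, and applies a classical two-sample z-test with known variance to the estimated groups. The function names, the sample size, the number of replicates, and the min/max initialization are all assumptions made for the example.

```python
import math
import numpy as np


def two_means_1d(x, n_iter=100):
    """Lloyd's algorithm with k=2 on 1-D data (illustrative helper).

    Centers start at the sample min and max, so neither cluster is empty.
    """
    centers = np.array([x.min(), x.max()], dtype=float)
    labels = None
    for _ in range(n_iter):
        # Assign each point to the nearer of the two centers.
        new_labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centers = np.array([x[labels == 0].mean(), x[labels == 1].mean()])
    return labels


def naive_z_pvalue(x, labels, sigma=1.0):
    """Classical two-sided z-test that ignores how the groups were chosen."""
    g0, g1 = x[labels == 0], x[labels == 1]
    se = sigma * math.sqrt(1.0 / len(g0) + 1.0 / len(g1))
    z = (g1.mean() - g0.mean()) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_naive_type1(n_sim=500, n=30, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)  # global null: one population, no clusters
        labels = two_means_1d(x)
        if naive_z_pvalue(x, labels) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    rate = simulate_naive_type1()
    print(f"Naive rejection rate under the null: {rate:.3f} (nominal level 0.05)")
```

Because k-means places the two groups on opposite sides of a split point, their sample means differ by construction, and the naive test rejects far more often than the nominal 5% even though no true difference exists. A selective test instead conditions on the clustering event, which is what restores control of the selective Type I error.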

Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program