Activity Number: 115 - Advances in Clustering and Classification
Type: Contributed
Date/Time: Monday, August 8, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323023
Title: Selective Inference for K-Means Clustering
Author(s): Yiqun Chen* and Daniela Witten
Companies: University of Washington, Seattle and University of Washington
Keywords: Post-selection inference; Unsupervised learning; Hypothesis testing; Selective inference; k-means clustering; Type I error
Abstract:
We consider the problem of testing for a difference in means between groups defined via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the groups were estimated on the same data used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute finite-sample p-values that control the selective Type I error for clusters obtained using k-means clustering. We apply our proposal in simulation and to handwritten digits and single-cell RNA-sequencing data.
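The Type I error inflation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' selective inference procedure; it only reproduces the problem that the procedure is meant to solve. It assumes one-dimensional data drawn from a single standard normal distribution (so there are no true clusters), a bare-bones k-means with k = 2, and a naive large-sample z-test as a stand-in for a classical two-sample test. The function names `two_means_1d`, `naive_p_value`, and `rejection_rate` are hypothetical.

```python
import math
import random
import statistics


def two_means_1d(xs, iters=50):
    """Lloyd's algorithm for k = 2 on 1-D data, started from the min and max."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        # Assign each point to the nearer of the two current centers.
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        new_c0, new_c1 = statistics.fmean(g0), statistics.fmean(g1)
        if (new_c0, new_c1) == (c0, c1):
            break  # centers stopped moving, so assignments are stable
        c0, c1 = new_c0, new_c1
    return g0, g1


def naive_p_value(g0, g1):
    """Two-sided z-test for equal means that ignores how the groups were formed."""
    se = math.sqrt(
        statistics.variance(g0) / len(g0) + statistics.variance(g1) / len(g1)
    )
    z = (statistics.fmean(g0) - statistics.fmean(g1)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # normal approximation


def rejection_rate(n_sims=200, n=100, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one population, no clusters
        g0, g1 = two_means_1d(xs)
        if naive_p_value(g0, g1) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("naive rejection rate under the null:", rejection_rate())
```

A valid test would reject on roughly a fraction alpha of these null datasets. The naive test rejects far more often, because k-means splits the sample exactly where the two group means differ most.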
Authors who are presenting talks have a * after their name.