JSM 2015 Online Program

Abstract Details

Activity Number: 190
Type: Contributed
Date/Time: Monday, August 10, 2015 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #317496
Title: Important Features PCA for High-Dimensional Clustering
Author(s): Wanjie Wang* and Jiashun Jin
Companies: The Wharton School and Carnegie Mellon University
Keywords: Big Data ; Feature selection ; Spectral Clustering ; Gene microarray ; Sparsity

We consider a clustering problem where we observe N feature vectors from K possible classes, each feature vector of length P. The class labels are unknown, and the main interest is to estimate them. We are primarily interested in the modern regime where P >> N, in which classical clustering methods face challenges. We propose Important Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a small fraction of features with the largest Kolmogorov-Smirnov (KS) scores, obtain the first (K − 1) left singular vectors of the post-selection normalized data matrix, and then estimate the labels by applying classical k-means to these singular vectors. The feature-selection threshold is set in a data-driven fashion by adapting the recent notion of Higher Criticism. As a result, IF-PCA is a tuning-free clustering method. We apply IF-PCA to 10 gene microarray data sets, where it shows competitive clustering performance.
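
The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`ks_scores`, `kmeans`, `if_pca_sketch`) and the parameter `n_keep` are hypothetical. A fixed number of retained features stands in for the Higher Criticism threshold, which is not reproduced here, and the KS score is taken to be the plain KS distance between each standardized feature and the standard normal distribution, which is an assumption about the scoring details.

```python
from math import erf, sqrt

import numpy as np


def ks_scores(W):
    """KS distance between each column of W and the N(0, 1) distribution."""
    n = W.shape[0]
    S = np.sort(W, axis=0)
    # Standard normal CDF evaluated at the sorted values of every column.
    cdf = 0.5 * (1.0 + np.vectorize(erf)(S / sqrt(2.0)))
    upper = np.arange(1, n + 1)[:, None] / n
    lower = np.arange(0, n)[:, None] / n
    return np.maximum((upper - cdf).max(axis=0), (cdf - lower).max(axis=0))


def kmeans(Z, K, n_iter=100):
    """Plain Lloyd k-means with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, K):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def if_pca_sketch(X, K, n_keep):
    """X is an (N, P) array whose rows are feature vectors.

    n_keep is a hypothetical stand-in for the data-driven
    Higher Criticism threshold mentioned in the abstract.
    """
    # Normalize each feature (column) to mean 0 and unit variance.
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    W = (X - X.mean(axis=0)) / sd
    # Keep the n_keep features with the largest KS scores.
    keep = np.argsort(ks_scores(W))[-n_keep:]
    # First K - 1 left singular vectors of the post-selection matrix.
    U, _, _ = np.linalg.svd(W[:, keep], full_matrices=False)
    labels = kmeans(U[:, :K - 1], K)
    return labels, keep
```

Calling `if_pca_sketch(X, K, n_keep)` returns the estimated labels together with the indices of the retained features. Because cluster labels are only defined up to a permutation, any comparison against known classes should account for relabeling.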

Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
