JSM 2013 Home

Abstract Details

Activity Number: 45
Type: Contributed
Date/Time: Sunday, August 4, 2013 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #309905
Title: Optimal Feature Selection by Higher Criticism in High-Dimensional Spectral Clustering
Author(s): Wanjie Wang*+ and Jiashun Jin
Companies: and Carnegie Mellon University
Keywords: Feature selection; Higher Criticism Threshold; ideal thresholding; phase diagram; spectral clustering; random matrix theory
Abstract:

Consider a two-class clustering problem in the high-dimensional setting: we have n samples from two possible classes, but the class labels are unknown and we wish to estimate them.

We propose the following approach to spectral clustering:

1. We use the Kolmogorov-Smirnov statistic to assess the importance of each feature.

2. Based on the resulting p-values, we perform feature selection, with the threshold determined by Higher Criticism Thresholding (HCT).

3. Using all retained features, we obtain the leading eigenvector of the so-called dual empirical covariance matrix, and predict the class labels by the signs of the coordinates of this eigenvector.
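
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the Higher Criticism objective, the null distribution used for the Kolmogorov-Smirnov test, or the normalization of the data. The sketch assumes each feature is standardized and tested against N(0, 1), uses one commonly cited form of the Higher Criticism objective, searches only the smallest half of the p-values (the hypothetical parameter alpha0), and takes the dual matrix to be the n x n Gram matrix of the retained, standardized features. The function name hct_spectral_cluster is likewise hypothetical.

```python
import numpy as np
from scipy import stats


def hct_spectral_cluster(X, alpha0=0.5):
    """Sketch: KS screening, Higher Criticism threshold, leading eigenvector.

    X is an (n, p) array with p >= 2 and 0 < alpha0 < 1.
    Returns (labels in {-1, +1}, indices of the retained features).
    """
    n, p = X.shape

    # Assumption: standardize each feature so it can be compared with N(0, 1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 1: one Kolmogorov-Smirnov p-value per feature.
    pvals = np.array([stats.kstest(Z[:, j], "norm").pvalue for j in range(p)])

    # Step 2: Higher Criticism threshold over the sorted p-values.
    # Assumed objective: sqrt(p) * (k/p - pi_(k)) / sqrt((k/p) * (1 - k/p)).
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    k_max = max(1, int(alpha0 * p))
    frac = np.arange(1, k_max + 1) / p
    hc = np.sqrt(p) * (frac - sorted_p[:k_max]) / np.sqrt(frac * (1 - frac))
    k_hat = int(np.argmax(hc)) + 1          # number of features to retain
    selected = np.sort(order[:k_hat])

    # Step 3: leading eigenvector of the n x n dual matrix Z_S Z_S^T.
    Zs = Z[:, selected]
    _, eigvecs = np.linalg.eigh(Zs @ Zs.T)  # eigenvalues in ascending order
    leading = eigvecs[:, -1]

    # Predicted labels are the signs of the eigenvector's coordinates.
    labels = np.where(leading >= 0, 1, -1)
    return labels, selected
```

Because an eigenvector is determined only up to sign, the returned labels identify the two clusters but not which class is which; any comparison with known labels should allow for a global label swap.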

The rationale behind the approach is a surprising connection between the so-called Signal-to-Noise Ratio (SNR) associated with the leading eigenvector and the recent Higher Criticism statistic. The approach is tested on two gene microarray data sets.

We develop an asymptotic framework in which the signals are assumed to be both rare and weak. We show that HCT is consistent for the ideal threshold choice (the threshold one would choose if the underlying parameters were known).


Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Copyright © American Statistical Association.