|
Activity Number:
|
107
|
|
Type:
|
Contributed
|
|
Date/Time:
|
Monday, July 30, 2007 : 8:30 AM to 10:20 AM
|
|
Sponsor:
|
IMS
|
| Abstract - #309882 |
|
Title:
|
High-Dimensional Classification Using Features Annealed Independence Rule
|
|
Author(s):
|
Yingying Fan*+ and Jianqing Fan
|
|
Companies:
|
Princeton University and Princeton University
|
|
Address:
|
Department of ORFE, Princeton, NJ, 08544
|
|
Keywords:
|
Classification ; feature extraction ; high dimensionality ; independence rule ; misclassification rates
|
|
Abstract:
|
High-dimensional classification arises frequently in contemporary statistical problems, yet the impact of dimensionality on classification is poorly understood. We first demonstrate that, even for the independence classification rule, classification using all features can be as poor as random guessing because of noise accumulation in estimating the population means in high-dimensional settings. In fact, we prove that almost all linear discriminants can perform as poorly as random guessing. Thus, it is important to select a subset of important features, which results in the Features Annealed Independence Rules. We establish the conditions under which all important features can be selected by the two-sample $t$-test. We propose choosing the optimal number of features based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results.
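The two steps described in the abstract, ranking features by the two-sample $t$-statistic and then applying the independence rule to the retained features, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`two_sample_t`, `fit_annealed_rule`, `predict`) are hypothetical, the pooled per-feature variance and equal class priors are assumptions, and the number of retained features `m` is passed in by the caller rather than derived from the error bound mentioned above.

```python
import numpy as np


def two_sample_t(X0, X1):
    """Per-feature two-sample t-statistics (unequal-variance form)."""
    n0, n1 = X0.shape[0], X1.shape[0]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    se = np.sqrt(X0.var(axis=0, ddof=1) / n0 + X1.var(axis=0, ddof=1) / n1)
    return mean_diff / se


def fit_annealed_rule(X0, X1, m):
    """Keep the m features with the largest |t| and estimate the
    class means and pooled variances on those features only.

    Assumption: m is supplied by the caller; the abstract's
    bound-based choice of m is not reproduced here.
    """
    n0, n1 = X0.shape[0], X1.shape[0]
    t = two_sample_t(X0, X1)
    keep = np.argsort(-np.abs(t))[:m]
    mu0 = X0[:, keep].mean(axis=0)
    mu1 = X1[:, keep].mean(axis=0)
    pooled_var = (
        (n0 - 1) * X0[:, keep].var(axis=0, ddof=1)
        + (n1 - 1) * X1[:, keep].var(axis=0, ddof=1)
    ) / (n0 + n1 - 2)
    return keep, mu0, mu1, pooled_var


def predict(model, X):
    """Independence (diagonal) rule on the kept features.

    Returns 1 when the variance-scaled linear score favors class 1,
    otherwise 0. Equal class priors are assumed.
    """
    keep, mu0, mu1, pooled_var = model
    Z = X[:, keep]
    score = ((Z - (mu0 + mu1) / 2.0) * (mu1 - mu0) / pooled_var).sum(axis=1)
    return (score > 0).astype(int)
```

Setting `m` equal to the total number of features recovers the independence rule on all features, which gives a simple way to compare the two on simulated data.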
|