Online Program

Thursday, February 18
PS1 Poster Session 1 & Opening Mixer sponsored by SAS
Thu, Feb 18, 5:30 PM - 7:00 PM
Ballroom Foyer

A Method for Selecting the Relevant Dimensions for Text Classification in Singular Vector Spaces (303194)

*Dawit Gezahegn Tadesse, University of Cincinnati 

Keywords: Feature Selection, Singular Value Decomposition (SVD), Text Mining, Naive Bayes

In this poster, we present a new feature selection algorithm for text mining in sparse, high-dimensional spaces. Singular Value Decomposition (SVD) is a popular dimension reduction method for high-dimensional text classification. The traditional SVD approach ranks the Singular Dimensions (SDs) from the largest singular value to the smallest and keeps the top-ranked ones. However, when the signal is sparse and the signal-to-noise ratio is low, the first few SDs are not necessarily the best for classification, because a large singular value reflects how much variance an SD captures, not how well it separates the classes. We demonstrate, theoretically and empirically, that our method efficiently selects the SDs most appropriate for classification and significantly reduces the misclassification error. We also apply our method to a real-world text mining application.
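The contrast between ranking SDs by singular value and ranking them by relevance to the class labels can be illustrated with a small sketch. The abstract does not state the poster's actual selection criterion, so this example is not the author's method: it assumes a Welch two-sample t-statistic on each SD's document coordinates as a stand-in relevance score, a Gaussian naive Bayes classifier, and a synthetic matrix in which ten loud but class-irrelevant "terms" dominate the top singular values while the class signal sits in only five terms. All function names (`t_scores`, `gaussian_nb_error`, `compare_selection`, `make_data`) are hypothetical.

```python
import numpy as np


def t_scores(Z, y):
    """Absolute Welch two-sample t-statistic of each column of Z (binary labels y).

    Stand-in relevance score (an assumption, not the poster's criterion).
    """
    a, b = Z[y == 0], Z[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(diff) / (se + 1e-12)


def gaussian_nb_error(Z_tr, y_tr, Z_te, y_te):
    """Misclassification rate of a Gaussian naive Bayes classifier."""
    classes = np.unique(y_tr)
    log_post = []
    for c in classes:
        Zc = Z_tr[y_tr == c]
        mu, var = Zc.mean(axis=0), Zc.var(axis=0) + 1e-9
        log_prior = np.log(len(Zc) / len(Z_tr))
        # Sum of independent per-dimension Gaussian log-likelihoods (the "naive" assumption).
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (Z_te - mu) ** 2 / var, axis=1)
        log_post.append(log_prior + log_lik)
    pred = classes[np.argmax(np.column_stack(log_post), axis=1)]
    return float(np.mean(pred != y_te))


def compare_selection(X_tr, y_tr, X_te, y_te, k):
    """Test errors when keeping k SDs chosen by singular value vs. by relevance."""
    _, _, Vt = np.linalg.svd(X_tr, full_matrices=False)  # singular values sorted descending
    Z_tr, Z_te = X_tr @ Vt.T, X_te @ Vt.T  # coordinates of each document on every SD
    by_size = np.arange(k)  # traditional rule: largest singular values first
    by_relevance = np.argsort(-t_scores(Z_tr, y_tr))[:k]  # assumed relevance rule
    err_size = gaussian_nb_error(Z_tr[:, by_size], y_tr, Z_te[:, by_size], y_te)
    err_rel = gaussian_nb_error(Z_tr[:, by_relevance], y_tr, Z_te[:, by_relevance], y_te)
    return err_size, err_rel


def make_data(rng, n, p=200):
    """Hypothetical stand-in for a document-term matrix: loud noise plus a sparse class signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[:, :10] *= 5.0  # ten high-variance but class-irrelevant "terms"
    X[:, 50:55] += (2 * y - 1)[:, None] * 1.0  # class signal carried by only five terms
    return X, y


rng = np.random.default_rng(0)
X_tr, y_tr = make_data(rng, 400)
X_te, y_te = make_data(rng, 400)
err_size, err_relevance = compare_selection(X_tr, y_tr, X_te, y_te, k=5)
print(f"top-5 by singular value: {err_size:.3f}  top-5 by relevance: {err_relevance:.3f}")
```

On data like this, the five largest SDs span only the loud, irrelevant terms, so the classifier built on them performs near chance, while the relevance-ranked SDs include the direction carrying the sparse signal. Any supervised score could replace the t-statistic here; it was chosen only because it is simple and works per dimension.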