Activity Number: 436
Type: Contributed
Date/Time: Wednesday, August 5, 2009 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Mining

Abstract - #304747

Title: Variable Selection for Clustering
Author(s): Hyang Min Lee*+ and Jia Li
Companies: Penn State University and Penn State University
Address: 425 Waupelani Dr., State College, PA, 16801,
Keywords: Variable selection ; Modal EM ; Ridgeline EM ; Mixture modeling ; Mclust ; Separability
Abstract:
A new variable selection algorithm is developed to achieve good separation between clusters. The separability measure exploits the prominent geometric features of the density function, so that the exact shape and orientation of the density matter. Its computational foundation consists of the newly developed Modal EM and Ridgeline EM algorithms. We propose a way to combine the pairwise separability between clusters into an aggregated distinctiveness (AD), and forward selection is applied to maximize AD. The multivariate density estimate is obtained with Mclust. The mixture components fitted by Mclust are examined for potential merging of multiple components into a single unimodal cluster. The variable selection procedure finds lower-dimensional subspaces that retain the major clustering structure, which is useful for both visualization and discovery of important variables.
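The forward selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how AD is computed, so `toy_score` is a hypothetical stand-in (the mean pairwise Euclidean distance between made-up cluster means), and the `min_gain` stopping rule, the `forward_select` name, and the `TOY_MEANS` data are assumptions introduced for illustration only.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    n_vars: int,
    score: Callable[[Sequence[int]], float],
    max_vars: Optional[int] = None,
    min_gain: float = 0.0,
) -> Tuple[List[int], float]:
    """Greedy forward selection: repeatedly add the variable that raises `score` most."""
    selected: List[int] = []
    best = -math.inf
    limit = n_vars if max_vars is None else min(max_vars, n_vars)
    while len(selected) < limit:
        cand, cand_score = None, -math.inf
        for j in range(n_vars):
            if j in selected:
                continue
            s = score(selected + [j])
            if s > cand_score:  # strict '>' keeps the lowest index on ties
                cand, cand_score = j, s
        # Assumed stopping rule: stop when the best candidate gains too little.
        if cand is None or cand_score - best <= min_gain:
            break
        selected.append(cand)
        best = cand_score
    return selected, best


# Hypothetical cluster means (3 clusters, 4 variables). Variables 0 and 1
# separate the clusters, variable 2 barely does, variable 3 not at all.
TOY_MEANS = [
    [0.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.1, 0.0],
    [0.0, 3.0, 0.1, 0.0],
]


def toy_score(subset: Sequence[int]) -> float:
    """Stand-in for AD: mean pairwise distance between cluster means on `subset`."""
    dists = [
        math.sqrt(sum((a[j] - b[j]) ** 2 for j in subset))
        for a, b in combinations(TOY_MEANS, 2)
    ]
    return sum(dists) / len(dists)


selected, value = forward_select(4, toy_score, min_gain=0.01)
print(selected, value)
```

On this toy data the search keeps variables 0 and 1 and stops, because adding variable 2 or 3 improves the stand-in score by less than `min_gain`. In the setting of the abstract, `score` would instead evaluate AD from a mixture density fitted on the candidate subset.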