Abstract:
In current model-based clustering methods, statistical criteria guide the manual selection of relationships between clustering variables, but not the selection of the variables that matter for clustering. Existing clustering variable selection procedures, such as those of Raftery and Dean (2006) and Maugis et al. (2009), are limited to normally distributed data. We propose a new framework for model-based clustering of data with both continuous and discrete variables, extending the cluster variance structure framework set forth by Fraley and Raftery (1999). Our clustering model (mixClust) describes how each variable contributes to cluster determination and allows for relations within and between the continuous and discrete variables. We also modify and extend existing likelihood-based variable selection procedures to accommodate data with variables of mixed distributional forms (ESR); we require that the data include at least one continuous variable. In simulation studies, our method shows desirable properties for data with variables of mixed distributional forms and improved performance over existing methods when applied to normally distributed data. Applied to prostate cancer data, mixClust and ESR identify subgroups with different responses to treatment.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.