All Times EDT

Abstract Details

Activity Number: 288 - SLDS CSpeed 5
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318783
Title: How Many Clusters Are Best? Investigating Model Selection in Robust Clustering
Author(s): Louis Tran* and Cristina Tortora
Companies: San Jose State University and San Jose State University
Keywords: Model-based clustering; outliers; number of clusters; Student t distribution; contaminated normal distribution; multiple scaled distributions
Abstract:

In model-based clustering, density functions are used to model the sub-populations in the data. When the data contain outliers, robust distributions such as the Student-t (T) or contaminated normal (CN) distribution can be used, along with their multiple scaled (MS) extensions, MST and MSCN, which allow directional tail behavior. Model-based clustering methods take the number of clusters as an input parameter, and many indices exist for choosing it. In this paper, we use simulated and real data sets to compare indices for selecting the number of clusters when fitting mixtures of T, CN, MST, and MSCN distributions. The effectiveness of each index is measured by how often it selects the correct number of sub-populations in the data.
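
As a rough illustration of index-based selection of the number of clusters, the sketch below fits mixtures with k = 1, ..., 4 components and keeps the k with the lowest Bayesian information criterion (BIC). Several points are assumptions made for illustration and are not taken from the abstract: BIC stands in for the indices being compared (the abstract does not name them), one-dimensional Gaussian components stand in for the T, CN, MST, and MSCN components, the data are simulated from two well-separated groups, and the names fit_gmm_1d and select_k are hypothetical.

```python
import math
import random


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(x)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(x) / n
    var = [sum((v - mean) ** 2 for v in x) / n] * k
    w = [1.0 / k] * k

    def e_step():
        # Responsibilities and log-likelihood, computed with log-sum-exp.
        resp, loglik = [], 0.0
        for v in x:
            logs = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - 0.5 * (v - mu[j]) ** 2 / var[j] for j in range(k)]
            m = max(logs)
            lse = m + math.log(sum(math.exp(l - m) for l in logs))
            loglik += lse
            resp.append([math.exp(l - lse) for l in logs])
        return resp, loglik

    for _ in range(n_iter):
        resp, _ = e_step()
        # M-step: weighted updates, with small floors to avoid degenerate values.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mu[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (v - mu[j]) ** 2
                             for r, v in zip(resp, x)) / nj, 1e-6)
            w[j] = max(nj / n, 1e-12)
    return e_step()[1]


def select_k(x, k_max):
    """Return (best k, {k: BIC}) using BIC = -2 logL + p log n; lower is better."""
    n = len(x)
    bic = {}
    for k in range(1, k_max + 1):
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic[k] = -2.0 * fit_gmm_1d(x, k) + n_params * math.log(n)
    return min(bic, key=bic.get), bic


# Simulated data (an assumption for illustration): two groups of 100 points.
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(100)]
        + [random.gauss(10.0, 1.0) for _ in range(100)])

best_k, bic = select_k(data, 4)
print("BIC by k:", {k: round(v, 1) for k, v in bic.items()})
print("selected k:", best_k)
```

A comparison of the kind described above would repeat such a selection over many simulated data sets and count how often each index recovers the true number of groups. With outliers in the data, Gaussian components may add spurious clusters to absorb them, which is the motivation for heavier-tailed components.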


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program