Abstract Details

Activity Number: 304 - Clustering and Regression Analyses
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: International Statistical Institute
Abstract #328769
Title: Model-Based Cluster Analysis and Outlier Detection
Author(s): Cristina Tortora* and Antonio Punzo
Companies: San Jose State University and University of Catania
Keywords: Mixture models; contaminated normal distributions; EM algorithm; multiple scaled distributions; cluster analysis; outlier detection
Abstract:

Finite mixture models assume that a population is a convex combination of densities; therefore, they are well suited for clustering applications. The choice of the component density has been widely discussed in the recent literature. The p-variate contaminated normal distribution (CND) was proposed to model datasets characterized by the presence of outliers. The CND is a two-component Gaussian mixture: one component, with a large prior probability, represents the good observations, and the other, with a small prior probability, the same mean, and an inflated covariance matrix, represents the outliers. Mixtures of CNDs can both detect outliers and perform cluster analysis; they improve clustering performance compared with normal mixtures and offer an alternative to t mixtures. However, the CND uses scalar parameters for the proportion of outliers and for the degree of contamination (the covariance inflation), i.e., these are the same for all variables. This is a limitation because outliers may behave differently in each dimension. To overcome this issue, we propose a multiple scaled contaminated normal distribution with a p-dimensional proportion of outliers and degree of contamination.
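
As an illustration of the standard CND described above (not of the proposed multiple scaled version), the following is a minimal sketch in Python. It assumes the usual parameterization f(x) = alpha * N(x; mu, Sigma) + (1 - alpha) * N(x; mu, eta * Sigma), with alpha the proportion of good observations and eta > 1 the inflation factor. The symbols alpha and eta and all function names are assumptions chosen for this sketch; they are not taken from the abstract.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def normal_pdf(x, mu, sigma):
    """Density of a p-variate normal with mean mu and covariance sigma at x."""
    p = len(x)
    low = cholesky(sigma)
    # Solve low * z = (x - mu) by forward substitution;
    # the sum of squares of z is the squared Mahalanobis distance.
    z = []
    for i in range(p):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / low[i][i])
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + maha))


def cnd_pdf(x, mu, sigma, alpha, eta):
    """Contaminated normal density (assumed form): good part plus inflated part."""
    inflated = [[eta * v for v in row] for row in sigma]
    good = alpha * normal_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * normal_pdf(x, mu, inflated)
    return good + bad


def outlier_probability(x, mu, sigma, alpha, eta):
    """Posterior probability that x comes from the inflated (outlier) component."""
    inflated = [[eta * v for v in row] for row in sigma]
    bad = (1.0 - alpha) * normal_pdf(x, mu, inflated)
    return bad / (alpha * normal_pdf(x, mu, sigma) + bad)


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.3], [0.3, 1.0]]
    for point in ([0.1, -0.2], [4.0, -4.0]):
        print(point, outlier_probability(point, mu, sigma, alpha=0.9, eta=10.0))
```

Because alpha and eta are scalars here, the same proportion of outliers and the same inflation apply to every variable, which is the restriction the abstract aims to relax. A point can be flagged as an outlier when its posterior probability of belonging to the inflated component exceeds 0.5.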


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program