
All Times EDT

Abstract Details

Activity Number: 288 - SLDS CSpeed 5
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318650
Title: Clustering and Directional Outlier Detection with Missing Information
Author(s): Hung Tong* and Cristina Tortora
Companies: San Jose State University and San Jose State University
Keywords: cluster analysis; mixture model; missing data; outlier detection; EM algorithm
Abstract:

Cluster analysis is a technique that aims to produce smaller groups of similar observations in a data set. In model-based clustering, the population is assumed to be a convex combination of sub-populations, each of which is modeled by a probability distribution. When data sets contain outliers, a contaminated normal (CN) distribution can be used to model each sub-population. The CN is a two-component normal mixture: one component, with a large prior probability, represents good observations; the other, with a small prior probability, the same mean, and an inflated covariance matrix, represents outliers. The CN distribution can produce robust parameter estimates and detect mild outliers automatically. An extension of the CN, the multiple scaled contaminated normal (MSCN) distribution, has the advantage of directionally robust parameter estimation and outlier detection; that is, these procedures work separately for each principal component. However, this model cannot currently be fitted to incomplete data sets. Hence, we develop a framework for fitting a mixture of MSCN distributions to data sets that contain some values missing at random, using the expectation-conditional maximization algorithm.
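
To make the CN description above concrete, the following is a minimal Python sketch of the density of a single CN sub-population and of the posterior probability that a point is a good observation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the symbols alpha (prior probability of a good observation) and eta (covariance inflation factor, eta > 1), the parameter values, and the 0.5 cut-off are all chosen here for the example. It covers complete data only and does not include the MSCN extension or the missing-data framework.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance of each row from mu.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    log_det = np.linalg.slogdet(sigma)[1]
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + log_det + maha))

def cn_pdf(x, mu, sigma, alpha, eta):
    """Contaminated normal density: two normals with the same mean,
    the second with covariance inflated by eta (assumed eta > 1)."""
    good = alpha * mvn_pdf(x, mu, sigma)
    bad = (1 - alpha) * mvn_pdf(x, mu, eta * sigma)
    return good + bad

def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that each row of x is a good observation."""
    good = alpha * mvn_pdf(x, mu, sigma)
    return good / cn_pdf(x, mu, sigma, alpha, eta)

# Illustrative (assumed) parameters for one sub-population.
mu = np.zeros(2)
sigma = np.eye(2)
points = np.array([[0.0, 0.0], [6.0, 0.0]])

p = prob_good(points, mu, sigma, alpha=0.9, eta=10.0)
# Flag a point as an outlier when it is more likely to come from
# the inflated component (0.5 is an assumed cut-off).
is_outlier = p < 0.5
print(p, is_outlier)
```

In this sketch the point at the mean is classified as good and the point six standard deviations away is flagged as an outlier, which is the automatic detection the abstract refers to.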


Authors who are presenting talks have a * after their name.
