Abstract Details

Activity Number: 655
Type: Contributed
Date/Time: Thursday, August 4, 2016 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319737
Title: Model-Based Clustering with Measurement Errors
Author(s): Wanli Zhang* and Yanming Di
Companies: Oregon State University and Oregon State University
Keywords: Expectation-maximization; Bayesian information criterion; Finite mixture model; Decision boundary; Clustering

Model-based clustering with finite mixture models has become a widely used clustering method, implemented in the R package MCLUST. Usually, the observations to be clustered are assumed to have been measured accurately, but there are situations where this assumption does not hold. This article proposes a new model-based clustering algorithm, called MCLUST-ME, that properly accounts for measurement errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, for two-group clustering, the data are in general no longer linearly or quadratically separable. Instead, each unique value of the measurement error covariance corresponds to its own decision boundary. Through simulation, we confirm this point and also find that, on average, our method performs at least as well as MCLUST in terms of accuracy in the presence of measurement errors, and that the two methods do not always choose the same optimal model. A real data set from RNA-Seq analysis is used to further illustrate the difference in clustering results between the two methods.
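
The dependence of the decision boundary on the measurement error covariance can be illustrated with a minimal sketch. It is not the MCLUST-ME implementation. It assumes Gaussian components and Gaussian measurement errors (the abstract does not state the error distribution), so that an observation with error covariance S has marginal covariance Sigma_k + S under component k. The function names and all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def membership_probs(x, err_cov, weights, means, covs):
    """Posterior component probabilities for one observation.

    Assumption: the observation is a true value drawn from component k
    plus independent Gaussian error with covariance err_cov, so its
    marginal covariance under component k is covs[k] + err_cov.
    """
    log_terms = np.array([
        np.log(w) + gaussian_logpdf(x, m, c + err_cov)
        for w, m, c in zip(weights, means, covs)
    ])
    log_terms -= log_terms.max()  # subtract the max for numerical stability
    probs = np.exp(log_terms)
    return probs / probs.sum()


# Hypothetical two-component mixture in two dimensions.
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.eye(2), 4.0 * np.eye(2)]

# The same observed point, paired with a small and a large error covariance.
x = np.array([2.0, 0.0])
p_small = membership_probs(x, 0.01 * np.eye(2), weights, means, covs)
p_large = membership_probs(x, 25.0 * np.eye(2), weights, means, covs)

print("small error:", p_small, "-> component", int(np.argmax(p_small)))
print("large error:", p_large, "-> component", int(np.argmax(p_large)))
```

With these illustrative values, the same point is assigned to different components depending on its error covariance, which is the sense in which each error covariance has its own decision boundary. When the error covariance is zero, the calculation reduces to the usual Gaussian mixture membership probabilities.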

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program