Abstract:
|
Cluster analysis is the process of sorting similar objects into groups. When a finite mixture model is used for cluster analysis, the process is called model-based clustering. This presentation will discuss the development of a mixture of contaminated shifted asymmetric Laplace factor analyzers (MCSALFA). This model will be well suited to the analysis of high-dimensional data, in particular data where the number of variables exceeds the number of observations. In addition to grouping similar observations into clusters, the MCSALFA will classify each observation as either `good' or `bad', unifying the fields of model-based clustering and outlier detection. From a methodological standpoint, the MCSALFA will combine the factor analysis model with the contaminated mixture model, and it will require the development of a robust parameter estimation scheme based on a variant of the expectation-maximization (EM) algorithm. The implementation of this algorithm in R will be discussed, and the classification performance of the MCSALFA will be demonstrated using a real data set.
|