
All Times EDT

Abstract Details

Activity Number: 33 - Junior Research in Methods for Integrating Heterogeneous Data: From Clustering to Factor Analysis
Type: Topic Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 11:50 AM
Sponsor: International Society for Bayesian Analysis (ISBA)
Abstract #313503
Title: Bayesian Factor Analysis for High-Dimensional Clustering
Author(s): Noirrit Chandra* and David Dunson and Antonio Canale
Companies: Duke University and Duke University and Università degli Studi di Padova
Keywords: Clustering; big data; mixture model; Dirichlet process; factor model; Bayesian asymptotics
Abstract:

It is often of interest to cluster subjects based on very high-dimensional data. Although Bayesian discrete mixture models are often successful at model-based clustering, we demonstrate pitfalls in high-dimensional settings. The first key problem is that posterior sampling algorithms based on Markov chain Monte Carlo tend to produce a very large number of clusters, which decreases only slowly as sampling proceeds, indicating serious mixing problems. The second key problem is that the true posterior itself behaves aberrantly, potentially in the opposite direction. In particular, we show that, for diverging dimension and fixed sample size, the true posterior either assigns each observation to a different cluster or assigns all observations to the same cluster, depending on the kernels and prior specification. We propose a general strategy for solving these problems by basing clustering on a discrete mixture model for a low-dimensional latent variable. We refer to this class of methods as LAtent Mixtures for Bayesian clustering. We provide theoretical support and, in simulation studies, illustrate substantial gains relative to clustering at the level of the observed data.
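As a rough, non-authoritative illustration of the strategy of clustering on a low-dimensional latent variable rather than on the observed data, the Python sketch below simulates a hypothetical dataset in which 1,000-dimensional observations are generated from 2-dimensional latent factors that follow a three-component mixture. It is not the authors' method: truncated SVD (PCA) stands in for Bayesian factor analysis, and k-means with a fixed number of clusters stands in for a Bayesian discrete mixture (such as a Dirichlet process mixture) on the latent variable. The function names (`simulate`, `latent_scores`, `kmeans`, `purity`), the dimensions, and the cluster centres are all assumptions chosen for illustration, and the sketch does not reproduce the posterior pathologies described above.

```python
import numpy as np


def simulate(n=60, p=1000, seed=0):
    """Hypothetical generator: 2-dim latent factors eta_i from a 3-component
    Gaussian mixture, observed y_i = Lambda @ eta_i + noise in R^p."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-6.0, 0.0], [6.0, 0.0], [0.0, 6.0]])  # assumed latent cluster centres
    labels = rng.integers(len(centers), size=n)       # true cluster of each subject
    eta = centers[labels] + rng.normal(size=(n, 2))   # low-dimensional latent variables
    lam = rng.normal(size=(p, 2))                     # factor loadings (p x k)
    y = eta @ lam.T + rng.normal(size=(n, p))         # high-dimensional observations
    return y, labels


def latent_scores(y, k):
    """Estimate k-dim latent scores by truncated SVD of the centred data
    (a PCA stand-in for posterior inference in a factor model)."""
    yc = y - y.mean(axis=0)
    u, s, _ = np.linalg.svd(yc, full_matrices=False)
    return u[:, :k] * s[:k]


def kmeans(x, n_clusters, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation
    (a stand-in for a discrete mixture model on the latent scores)."""
    idx = [0]
    for _ in range(1, n_clusters):
        # distance of every point to its nearest already-chosen centre
        d = np.min(((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d)))
    centers = x[idx].copy()
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        new = np.array([x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


def purity(assign, labels):
    """Fraction of subjects whose cluster's majority true label matches their own."""
    total = 0
    for j in np.unique(assign):
        total += np.bincount(labels[assign == j]).max()
    return total / len(labels)


y, labels = simulate()
eta_hat = latent_scores(y, k=2)  # 60 x 2 scores instead of 60 x 1000 observations
assign = kmeans(eta_hat, n_clusters=3)
print(eta_hat.shape, purity(assign, labels))
```

Here clustering operates on the 2-dimensional scores rather than on the 1,000-dimensional observations. A fully Bayesian version would presumably place the mixture on the latent variables inside the factor model and infer the number of clusters and the partition with uncertainty, rather than fixing them as this sketch does.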


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program