
Abstract Details

Activity Number: 427 - SPEED: Bayesian Methods, Part 2
Type: Contributed
Date/Time: Tuesday, July 30, 2019, 3:05 PM to 3:50 PM
Sponsor: Section on Bayesian Statistical Science
Abstract #307875
Title: A Distributed MCMC Sampler for Latent Dirichlet Allocation
Author(s): Kelson Zawack* and Hongyu Zhao
Companies: Yale University and Yale
Keywords: MCMC; Distributed Computing; Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a hierarchical Bayesian model that infers topics from a collection of documents by assuming that each document is a distribution over topics and each topic is a distribution over words. LDA has been applied in areas as diverse as text mining, image processing, and genomics. Since all of these applications potentially involve large quantities of data, there is a need for inference approaches that scale across multiple processors, both to increase the amount of data that can be analyzed and to decrease processing time. Most work on distributed implementations of LDA relies on large amounts of communication to synchronize state between processors, which introduces high performance costs. We show how this cost can be avoided by reframing LDA as a two-stage model. In the first stage, shard-specific posterior distributions are inferred by each processor in isolation. In the second stage, the full posterior is inferred from the shards with an efficient Metropolis-Hastings scheme that uses the shard distributions as its proposal distribution. We benchmark the algorithm in simulation and in an image processing application.
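The abstract does not give implementation details, but the second-stage idea (an independence Metropolis-Hastings sampler whose proposal is built from the shard-specific posteriors) can be illustrated with a minimal sketch. Everything below is illustrative, not the authors' method: a scalar parameter stands in for the LDA topic variables, each shard posterior is approximated by a Gaussian with made-up means and standard deviations, and the full posterior is taken to be the product of the shard posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shard posteriors: each shard summarizes its local inference
# as a Gaussian over a scalar parameter (values are purely illustrative).
shard_means = np.array([0.9, 1.1, 1.0])
shard_stds = np.array([0.3, 0.25, 0.35])

def shard_proposal():
    """Propose a candidate by sampling a randomly chosen shard's posterior."""
    k = rng.integers(len(shard_means))
    return rng.normal(shard_means[k], shard_stds[k])

def proposal_logpdf(theta):
    """Log density of the proposal: an equal-weight mixture of shard posteriors."""
    log_comps = (-0.5 * ((theta - shard_means) / shard_stds) ** 2
                 - np.log(shard_stds * np.sqrt(2 * np.pi)))
    return np.log(np.mean(np.exp(log_comps)))

def full_log_post(theta):
    """Stand-in full-data log posterior: product of the shard posteriors."""
    return np.sum(-0.5 * ((theta - shard_means) / shard_stds) ** 2)

def mh_combine(n_steps=5000):
    """Independence Metropolis-Hastings with the shard mixture as proposal."""
    theta = shard_proposal()
    samples = np.empty(n_steps)
    for i in range(n_steps):
        cand = shard_proposal()
        # Acceptance ratio corrects for sampling from the proposal rather
        # than the full posterior.
        log_a = (full_log_post(cand) - full_log_post(theta)
                 + proposal_logpdf(theta) - proposal_logpdf(cand))
        if np.log(rng.random()) < log_a:
            theta = cand
        samples[i] = theta
    return samples

samples = mh_combine()
```

Because the proposals are drawn from distributions already close to the target, acceptance rates stay high and no inter-processor communication is needed during the second stage, which is the efficiency argument the abstract makes.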

Authors who are presenting talks have a * after their name.
