Latent Dirichlet Allocation (LDA) is a hierarchical Bayesian model that infers topics from a collection of documents by assuming that each document is a mixture of topics and each topic is a distribution over words. LDA has been applied in areas as diverse as text mining, image processing, and genomics. Since these applications potentially involve large quantities of data, there is a need for inference approaches that scale to multiple processors, both to increase the amount of data that can be analyzed and to decrease processing time. Most work on distributed implementations of LDA has relied on frequent communication to synchronize state between processors, which incurs high performance costs. We show how this cost can be avoided by reframing LDA as a two-stage model. In the first stage, shard-specific posterior distributions are inferred by each processor in isolation. In the second stage, the full posterior is inferred from the shards with an efficient Metropolis-Hastings scheme that uses the shard distributions as its proposal distribution. We benchmark the algorithm in simulation and in an image processing application.
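The second-stage idea can be sketched in miniature. The toy below is an assumption-laden illustration, not the paper's algorithm: it uses 1-D Gaussian stand-ins for the shard-specific posteriors (real LDA posteriors are over discrete topic assignments), takes the full posterior to be proportional to the product of shard posteriors under a flat prior, and runs independence Metropolis-Hastings with a mixture of the shard distributions as the proposal.

```python
import math
import random

random.seed(0)

# Hypothetical shard-specific posteriors from stage one: (mean, std) per shard.
shards = [(0.0, 1.0), (1.0, 1.0)]

def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def target(x):
    # Assumed full posterior: product of shard posteriors (flat prior).
    p = 1.0
    for mu, sigma in shards:
        p *= norm_pdf(x, mu, sigma)
    return p

def proposal_sample():
    # Propose from a uniformly chosen shard posterior.
    mu, sigma = random.choice(shards)
    return random.gauss(mu, sigma)

def proposal_pdf(x):
    # Density of the equal-weight mixture of shard posteriors.
    return sum(norm_pdf(x, mu, sigma) for mu, sigma in shards) / len(shards)

def mh(n_samples):
    x = proposal_sample()
    samples = []
    for _ in range(n_samples):
        x_new = proposal_sample()
        # Independence Metropolis-Hastings acceptance ratio.
        a = (target(x_new) * proposal_pdf(x)) / (target(x) * proposal_pdf(x_new))
        if random.random() < a:
            x = x_new
        samples.append(x)
    return samples

samples = mh(20000)
mean = sum(samples) / len(samples)
print(mean)  # the product N(0,1)*N(1,1) has mean 0.5, so this lands nearby
```

Here the proposal is cheap because it reuses the shard posteriors already computed in isolation, which is the reason no inter-processor synchronization is needed during sampling.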