JSM 2017 Online Program

Activity Number:	335 - SPEED: Reliable Statistical Learning and Data Science
Type:	Contributed
Date/Time:	Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #323940
Title:	Quantifying Uncertainty in Latent Dirichlet Allocation
Author(s):	Christine Chai*
Companies:	Duke University
Keywords:	Latent Dirichlet allocation ; Uncertainty ; Topic modeling ; Text mining
Abstract:	In statistics, measuring uncertainty is as important as obtaining the point estimate. For text datasets, latent Dirichlet allocation (LDA) is one of the most commonly used topic modeling algorithms. I found that keeping special phrases intact during text cleaning improves topic distinctiveness at the word level. I also used a synthetic dataset with known proportions to test how LDA performs under different settings. Regardless of the pre-set number of topics, LDA tends to "spread out" the topic assignments, making it difficult to remove excess topics.
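
The synthetic-data idea in the abstract can be illustrated with a minimal sketch. It is not the author's code: the two topics, their word lists, the 70/30 split, and every function name below are hypothetical, and it assumes the same topic proportions in every document. It only builds a corpus whose true topic proportions are known; fitting LDA to that corpus with different pre-set numbers of topics would be a separate step.

```python
import random
from collections import Counter

# Hypothetical two-topic vocabulary; the words are illustrative only.
TOPIC_WORDS = {
    "sports": ["game", "team", "score", "coach", "season"],
    "finance": ["market", "stock", "bank", "price", "trade"],
}


def make_document(proportions, length, rng):
    """Draw `length` words; each word's topic is sampled from `proportions`."""
    topics = list(proportions)
    weights = [proportions[t] for t in topics]
    words, assigned = [], []
    for _ in range(length):
        topic = rng.choices(topics, weights=weights, k=1)[0]
        words.append(rng.choice(TOPIC_WORDS[topic]))
        assigned.append(topic)
    return words, assigned


def make_corpus(proportions, n_docs, doc_length, seed=0):
    """Build `n_docs` documents that share the same known topic proportions."""
    rng = random.Random(seed)
    return [make_document(proportions, doc_length, rng) for _ in range(n_docs)]


def empirical_proportions(corpus):
    """Share of word tokens generated by each topic across the whole corpus."""
    counts = Counter(t for _, assigned in corpus for t in assigned)
    total = sum(counts.values())
    return {t: counts[t] / total for t in counts}


# Assumed ground truth: 70% of tokens from one topic, 30% from the other.
known = {"sports": 0.7, "finance": 0.3}
corpus = make_corpus(known, n_docs=200, doc_length=50)
observed = empirical_proportions(corpus)
```

Because the true proportions are fixed in advance, the topic assignments returned by a fitted model can be compared against `known` to see how far they are spread across extra topics.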

Authors who are presenting talks have a * after their name.