
Abstract Details

Activity Number: 425 - SPEED: Reliable Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 3:05 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #325252
Title: Quantifying Uncertainty in Latent Dirichlet Allocation
Author(s): Christine Chai*
Companies: Duke University
Keywords: Latent Dirichlet allocation; Uncertainty; Topic modeling; Text mining

In statistics, measuring uncertainty is as important as obtaining the point estimate. For text data, latent Dirichlet allocation (LDA) is one of the most commonly used topic modeling algorithms. I found that keeping special phrases during text cleaning makes the topics more distinctive at the word level. I also used a synthetic dataset with known proportions to test how LDA performs under different settings. No matter what the number of topics is preset to, LDA tends to "spread out" the topic assignments, which makes it difficult to remove excess topics.
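One common way to build a synthetic corpus with known proportions is to sample documents from LDA's own generative process and record the true per-document topic proportions alongside the words. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the function names, the toy topic-word distributions, the Dirichlet concentration, and the document sizes are all hypothetical and are not the settings used in this study.

```python
import random


def sample_dirichlet(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) variates
    # normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def make_synthetic_corpus(topic_word, doc_topic_alpha, n_docs, doc_len, seed=0):
    """Generate documents via the LDA generative process (hypothetical helper).

    topic_word: list of K dicts mapping word -> probability (one per topic).
    doc_topic_alpha: list of K Dirichlet concentration parameters.
    Returns (docs, thetas): the token lists and the true topic proportions
    of each document, which a fitted model can later be compared against.
    """
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    docs, thetas = [], []
    for _ in range(n_docs):
        # 1. Draw this document's topic proportions theta ~ Dirichlet(alpha).
        theta = sample_dirichlet(doc_topic_alpha, rng)
        words = []
        for _ in range(doc_len):
            # 2. Draw a topic z ~ Categorical(theta) for each token.
            z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
            # 3. Draw a word w ~ Categorical(topic_word[z]).
            vocab = list(topic_word[z].keys())
            probs = list(topic_word[z].values())
            words.append(rng.choices(vocab, weights=probs, k=1)[0])
        docs.append(words)
        thetas.append(theta)
    return docs, thetas


# Hypothetical two-topic example with disjoint vocabularies.
topics = [{"goal": 0.5, "match": 0.5}, {"vote": 0.5, "election": 0.5}]
docs, thetas = make_synthetic_corpus(topics, [0.5, 0.5], n_docs=3, doc_len=10, seed=1)
```

Because `thetas` holds the ground truth, fitting LDA to `docs` with the number of topics set above the true value and comparing the estimated proportions with `thetas` is one way to probe the "spreading out" behavior described above.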

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association