
Abstract Details

Activity Number: 425 - SPEED: Reliable Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 1, 2017, 3:05 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #325252
Title: Quantifying Uncertainty in Latent Dirichlet Allocation
Author(s): Christine Chai*
Companies: Duke University
Keywords: Latent Dirichlet allocation; Uncertainty; Topic modeling; Text mining
Abstract:

In statistics, measuring uncertainty is as important as obtaining a point estimate. For text datasets, latent Dirichlet allocation (LDA) is one of the most commonly used topic modeling algorithms. I found that keeping special phrases during text cleaning improves topic distinctiveness at the word level. I also used a synthetic dataset with known topic proportions to test how LDA performs under different settings. Regardless of the preset number of topics, LDA tends to "spread out" the topic assignments, which makes it difficult to remove excess topics.
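The synthetic-data check described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy setup (two true topics over a six-word vocabulary, named TOPIC_WORD here) and a plain collapsed Gibbs sampler for LDA written only for illustration; it is not the study's own code, data, or settings. Documents are generated from LDA's generative process, so the true per-document topic proportions are known, and the model is then fit with more topics (K=4) than were used to generate the data, so the estimated topic shares can be compared against the truth.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(a, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def make_synthetic_corpus(topic_word, doc_topic_alpha, n_docs, doc_len, rng):
    """Generate word-id documents via LDA's generative process; return docs and true proportions."""
    vocab_size = len(topic_word[0])
    docs, true_props = [], []
    for _ in range(n_docs):
        theta = sample_dirichlet(doc_topic_alpha, rng)  # known topic proportions
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(len(theta)), weights=theta)[0]             # pick a topic
            w = rng.choices(range(vocab_size), weights=topic_word[z])[0]     # pick a word
            doc.append(w)
        docs.append(doc)
        true_props.append(theta)
    return docs, true_props

def gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_iter, rng):
    """Collapsed Gibbs sampler for LDA; returns estimated per-document topic proportions."""
    n_dk = [[0] * n_topics for _ in docs]                # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                                 # total words per topic
    z = []
    for d, doc in enumerate(docs):                       # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                              # remove the current assignment
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                              # add the resampled assignment
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]

def average_topic_shares(theta):
    """Average the per-document proportions to get each topic's overall share."""
    n_topics = len(theta[0])
    return [sum(row[t] for row in theta) / len(theta) for t in range(n_topics)]

# Hypothetical toy setup: 2 true topics over a 6-word vocabulary (each row sums to 1).
TOPIC_WORD = [
    [0.30, 0.30, 0.30, 0.04, 0.03, 0.03],  # topic A favors words 0-2
    [0.03, 0.03, 0.04, 0.30, 0.30, 0.30],  # topic B favors words 3-5
]

if __name__ == "__main__":
    rng = random.Random(42)
    docs, true_props = make_synthetic_corpus(TOPIC_WORD, [0.5, 0.5], n_docs=50, doc_len=30, rng=rng)
    theta = gibbs_lda(docs, n_topics=4, vocab_size=6, alpha=0.1, beta=0.01, n_iter=100, rng=rng)
    print("true average shares (K=2):", [round(s, 3) for s in average_topic_shares(true_props)])
    print("estimated shares (K=4):   ", [round(s, 3) for s in average_topic_shares(theta)])
```

Because the generating proportions are known, one can inspect whether the two surplus topics receive a non-negligible share of the assignments rather than collapsing to near zero; the sizes, priors (alpha, beta), and iteration count above are arbitrary choices for a quick run, not values from the study.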


Authors who are presenting talks have a * after their name.

Part of the JSM 2017 program.

Copyright © American Statistical Association