Abstract Details
Activity Number: 613
Type: Contributed
Date/Time: Thursday, August 7, 2014 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #311623
Title: Zipf's Law and Latent Dirichlet Allocation
Author(s): Thomas Jones*+
Companies: Institute for Defense Analyses
Keywords: Topic Model; Dirichlet; Multinomial; Monte Carlo; Bayesian; Linguistics
Abstract:
Latent Dirichlet Allocation (LDA) is a popular hierarchical Bayesian model used in text mining. LDA models corpora as mixtures of categorical variables with Dirichlet priors. LDA is a useful model, but its effectiveness is difficult to evaluate: the generative process it assumes is not how people produce real language. Monte Carlo simulation is one approach to generating data where the "right" answers are known a priori. However, sampling from the Dirichlet distributions that are often used as priors in LDA does not generate corpora with the property of natural language known as Zipf's law. We explore the relationship between the Dirichlet distribution and Zipf's law within the framework of LDA. Considering Zipf's law allows researchers to explore the properties of LDA more easily and to make better-informed a priori decisions when modeling real textual data.
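The simulation idea in the abstract can be illustrated with a minimal sketch. It is not the author's code: the function names (sample_dirichlet, simulate_corpus, zipf_slope), the symmetric prior values, and the corpus sizes are all assumptions chosen for illustration. The sketch draws a small corpus from the LDA generative process and fits a straight line to the log-log rank-frequency curve; under Zipf's law, word frequency is roughly proportional to 1/rank, so that slope would be close to -1.

```python
import math
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_corpus(n_docs, doc_len, n_topics, beta, alpha, rng):
    """Generate bag-of-words documents from the LDA generative process.

    beta  : Dirichlet parameter for each topic's distribution over words
    alpha : Dirichlet parameter for each document's distribution over topics
    Words are integer ids 0..len(beta)-1; word order within a document
    is not meaningful.
    """
    vocab = range(len(beta))
    topics = [sample_dirichlet(beta, rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)
        # Assign a topic to every token, then count tokens per topic.
        topic_counts = Counter(rng.choices(range(n_topics), weights=theta, k=doc_len))
        doc = []
        for k, n in topic_counts.items():
            # Draw n words from topic k's word distribution.
            doc.extend(rng.choices(vocab, weights=topics[k], k=n))
        corpus.append(doc)
    return corpus


def zipf_slope(corpus):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(w for doc in corpus for w in doc).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Illustrative settings only: symmetric priors, 500-word vocabulary, 5 topics.
rng = random.Random(2014)
corpus = simulate_corpus(n_docs=100, doc_len=100, n_topics=5,
                         beta=[0.1] * 500, alpha=[0.5] * 5, rng=rng)
print(f"log-log rank-frequency slope: {zipf_slope(corpus):.2f}")
```

Comparing the printed slope, or a plot of the full rank-frequency curve, against the Zipf reference of about -1 gives one rough way to see how closely a simulated corpus resembles natural language under a given choice of prior.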
Authors who are presenting talks have a * after their name.
Back to the full JSM 2014 program
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.