Abstract Details

Activity Number: 613
Type: Contributed
Date/Time: Thursday, August 7, 2014 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #311623
Title: Zipf's Law and Latent Dirichlet Allocation
Author(s): Thomas Jones*+
Companies: Institute for Defense Analyses
Keywords: Topic Model ; Dirichlet ; Multinomial ; Monte Carlo ; Bayesian ; Linguistics

Latent Dirichlet Allocation (LDA) is a popular hierarchical Bayesian model used in text mining. LDA models corpora as mixtures of categorical variables with Dirichlet priors. LDA is a useful model, but its effectiveness is difficult to evaluate, because the process that LDA models is not how people generate real language. Monte Carlo simulation is one approach to generating data where the "right" answers are known a priori. But sampling from the Dirichlet distributions that are often used as priors in LDA does not generate corpora with the property of natural language known as Zipf's law. We explore the relationship between the Dirichlet distribution and Zipf's law within the framework of LDA. Considering Zipf's law allows researchers to explore the properties of LDA more easily and to make better-informed a priori decisions when modeling real textual data.
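
The Monte Carlo approach described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the author's code: the function names, the symmetric priors, and all parameter values (alpha, beta, corpus sizes) are assumptions chosen for the example. It draws a synthetic corpus from the LDA generative process and fits a line to the log-log rank-frequency curve; under Zipf's law that slope would be close to -1.

```python
import numpy as np


def simulate_lda_corpus(n_docs=200, n_topics=10, vocab_size=2000,
                        doc_len=200, alpha=0.5, beta=0.1, seed=0):
    """Draw corpus-wide word counts from the LDA generative process.

    All defaults are hypothetical values picked for illustration.
    """
    rng = np.random.default_rng(seed)
    # Topic-word distributions: one symmetric Dirichlet draw per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # Document-topic distributions: one symmetric Dirichlet draw per document.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)

    counts = np.zeros(vocab_size, dtype=np.int64)
    for d in range(n_docs):
        # Marginalizing over per-token topic assignments, each word in
        # document d is drawn from the mixture theta[d] @ phi.
        word_probs = theta[d] @ phi
        word_probs = word_probs / word_probs.sum()  # guard against rounding
        counts += rng.multinomial(doc_len, word_probs)
    return counts


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = np.sort(counts[counts > 0])[::-1]
    ranks = np.arange(1, freqs.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope


counts = simulate_lda_corpus()
slope = zipf_slope(counts)
print(f"tokens: {counts.sum()}, log-log rank-frequency slope: {slope:.2f}")
```

A single fitted slope is a crude summary; plotting the full rank-frequency curve on log-log axes, and repeating the simulation for several values of beta, gives a better picture of how far the simulated corpus departs from a straight line.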

Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Copyright © American Statistical Association.