Abstract Details
Activity Number: 433
Type: Contributed
Date/Time: Tuesday, August 11, 2015 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #317438
Title: Summarizing Topics: From Word Lists to Phrases
Author(s): Lauren Hannah* and Hanna Wallach
Companies: Columbia University and Microsoft Research
Keywords: topic model; Bayes factor; interpretability
Abstract:
We propose a two-stage approach for generating descriptive phrases from the output of a multinomial topic model. First, we introduce a Bayesian method for statistically selecting phrases from a document corpus, using priors associated with Latent Dirichlet Allocation (LDA). Second, the selected phrases are combined with the topic dictionary to form a list of candidate phrases, which are ranked by topic descriptiveness using a metric based on the weighted Kullback-Leibler divergence between the topic probabilities implied by the phrase and the topic probabilities implied by the underlying model. The results are assessed for interpretability on a set of diverse corpora using Mechanical Turk.
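The ranking stage can be illustrated with a minimal sketch. The abstract does not give the exact metric, so everything below is an assumption made for illustration: the topic probabilities implied by a phrase are taken to be a Bayes-rule posterior under word independence, and the score is the target topic's term of the Kullback-Leibler divergence between that posterior and the model's topic proportions. The function names (`topic_posterior`, `descriptiveness`, `rank_phrases`) and the toy numbers are hypothetical and are not the authors' method.

```python
import math


def topic_posterior(phrase, topic_word_probs, topic_priors):
    """Topic probabilities implied by a phrase (assumed form, not from the abstract).

    Uses Bayes' rule with words treated as independent given the topic:
    p(k | phrase) is proportional to p(k) * product of p(w | k) over words w.
    Every word must have nonzero probability in every topic, as is the case
    for LDA topics smoothed by a Dirichlet prior.
    """
    log_scores = [
        math.log(prior) + sum(math.log(word_probs[w]) for w in phrase)
        for prior, word_probs in zip(topic_priors, topic_word_probs)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def descriptiveness(phrase, topic, topic_word_probs, topic_priors):
    """Target-topic term of KL(posterior || prior), used here as a stand-in score."""
    p = topic_posterior(phrase, topic_word_probs, topic_priors)[topic]
    if p == 0.0:
        return 0.0
    return p * math.log(p / topic_priors[topic])


def rank_phrases(candidates, topic, topic_word_probs, topic_priors):
    """Sort candidate phrases from most to least descriptive of the given topic."""
    return sorted(
        candidates,
        key=lambda ph: descriptiveness(ph, topic, topic_word_probs, topic_priors),
        reverse=True,
    )


# Toy two-topic model with made-up probabilities (illustrative only).
toy_topics = [
    {"neural": 0.5, "network": 0.4, "tax": 0.1},
    {"neural": 0.05, "network": 0.15, "tax": 0.8},
]
toy_priors = [0.5, 0.5]
toy_candidates = [("tax",), ("neural", "network")]
ranked = rank_phrases(toy_candidates, 0, toy_topics, toy_priors)
```

Scoring only the target topic's term rewards phrases that concentrate probability on that topic and penalizes phrases that point away from it; a different weighting scheme would change the ordering, and the abstract does not say which one the authors use.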
Authors who are presenting talks have a * after their name.
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.