JSM 2015 Preliminary Program


Abstract Details

Activity Number: 433
Type: Contributed
Date/Time: Tuesday, August 11, 2015 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #317438
Title: Summarizing Topics: From Word Lists to Phrases
Author(s): Lauren Hannah* and Hanna Wallach
Companies: Columbia University and Microsoft Research
Keywords: topic model ; Bayes factor ; interpretability
Abstract:

We propose a two-stage approach for generating descriptive phrases from the output of a multinomial topic model. First, we select phrases from a document corpus using a Bayesian procedure with priors associated with Latent Dirichlet Allocation (LDA). Second, we combine the selected phrases with the topic dictionary to form a list of candidate phrases, which are ranked by topic descriptiveness using a metric based on the weighted Kullback-Leibler divergence between the topic probabilities implied by the phrase and the topic probabilities implied by the underlying model. The results are assessed for interpretability on a set of diverse corpora using Mechanical Turk.
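
As an illustration of the ranking stage only, the following is a minimal sketch of scoring candidate phrases by a weighted Kullback-Leibler divergence. The abstract does not give the exact metric, so everything here is an assumption: the function names `weighted_kl` and `rank_phrases`, the per-topic weights (uniform by default), the toy probabilities, and the choice to rank larger divergences first.

```python
import math


def weighted_kl(p, q, weights=None, eps=1e-12):
    """Weighted KL divergence: sum_k w_k * p_k * log(p_k / q_k).

    p: topic probabilities implied by a phrase (hypothetical input).
    q: topic probabilities implied by the underlying model.
    weights: per-topic weights; uniform if omitted (an assumption,
             the abstract does not specify the weighting).
    """
    if weights is None:
        weights = [1.0] * len(p)
    total = 0.0
    for w, pk, qk in zip(weights, p, q):
        if pk > 0.0:  # terms with p_k = 0 contribute nothing
            total += w * pk * math.log(pk / max(qk, eps))
    return total


def rank_phrases(phrase_topic_probs, model_topic_probs, weights=None):
    """Sort phrases by divergence from the model, largest first.

    Ranking direction is an assumption made for this sketch.
    """
    scored = [
        (phrase, weighted_kl(p, model_topic_probs, weights))
        for phrase, p in phrase_topic_probs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up probabilities over two topics.
candidates = {
    "neural network": [0.9, 0.1],
    "the results": [0.5, 0.5],
}
ranking = rank_phrases(candidates, [0.5, 0.5])
```

In this toy example, a phrase whose implied topic probabilities match the model's scores zero, while a phrase concentrated on one topic scores higher.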


Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
