Abstract Details

Activity Number: 248
Type: Contributed
Date/Time: Monday, August 1, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321351
Title: A Topic Model for Hierarchical Documents
Author(s): Feifei Wang* and Yang Yang
Companies: Peking University
Keywords: Topic model; Short texts; Hierarchical relationship; Content analysis

Uncovering topics in short text corpora has become increasingly important with the growth of online communication. However, conventional topic mining methods may fail because each short text offers little context. Fortunately, a large proportion of online short texts co-occur with lengthy texts, such as comments attached to news articles. These two kinds of texts are hierarchically organized, and the hidden topical relationships between them can be exploited to enhance topic learning for both sides. We therefore propose a topic model for (h)ierarchical (d)ocuments, referred to as hdLDA, to capture this hierarchical structure. Specifically, in hdLDA each short text has a probability distribution over two topic sets: one consisting of the topics underlying the lengthy texts, and the other formed only by the short texts. We also introduce an online algorithm for hdLDA that enables efficient topic learning. Extensive experiments on real-world datasets demonstrate that our approach discovers more comprehensive topics for both short texts and lengthy documents than baseline and state-of-the-art methods.
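The two-topic-set mechanism described above can be illustrated with a minimal generative sketch. This is not the authors' code or the actual hdLDA inference procedure; all names, the mixing parameter `p_lengthy`, and the toy topics below are hypothetical, and topics are simplified to plain word lists rather than word distributions. The sketch shows only the key idea: each word in a short text draws its topic either from the set underlying the lengthy texts or from a short-text-only set.

```python
import random

def generate_short_text(n_words, lengthy_topics, short_only_topics,
                        p_lengthy=0.5, rng=random):
    """Sample words for one short text under the two-topic-set scheme
    described in the abstract (illustrative sketch, not hdLDA itself)."""
    words = []
    for _ in range(n_words):
        if rng.random() < p_lengthy:
            # Topic shared with the lengthy texts (e.g. the news article)
            topic = rng.choice(lengthy_topics)
        else:
            # Topic formed only by the short texts (e.g. comment slang)
            topic = rng.choice(short_only_topics)
        words.append(rng.choice(topic))  # draw a word from the chosen topic
    return words

# Hypothetical toy topics: each topic is just a word list here.
lengthy_topics = [["economy", "market"], ["election", "policy"]]
short_only_topics = [["lol", "omg"]]
text = generate_short_text(10, lengthy_topics, short_only_topics)
```

Inference in the actual model would reverse this process, estimating the topic assignments and distributions from observed texts; the online algorithm mentioned above would update those estimates as documents stream in.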

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association