Abstract Details

Activity Number: 671 - Network Analysis, Text Mining and Bayesian Functional Clustering: Data Visualization and Other Considerations
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Graphics
Abstract #329731
Title: User-Guided Topic Modeling Through Interactive Visualization
Author(s): Nathan Wycoff* and Scotland Leman and Ian Crandell and Peter Hauck and Michelle Dowling
Companies: Virginia Tech and Virginia Tech and Virginia Tech and Virginia Tech and Virginia Tech
Keywords: Topic Modeling; Human-Computer Interaction; Latent Dirichlet Allocation; Visualization; Natural Language Processing; Text Analytics
Abstract:

The explosive growth and availability of unlabelled, messy text data has brought with it a demand for techniques to aid in understanding it. A popular family of models for this purpose is Topic Models, which postulate that the observed documents can be explained by a relatively small number of topics, that is, probability distributions over words. Topic Models are intended to give an easily digestible summary of a corpus. However, these topics are commonly dominated by high-frequency words with little semantic meaning. Term weighting was introduced for this reason: it gives certain words more influence on topic formation than others, with that influence determined by inverse document frequency.
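
As a rough illustration of inverse-document-frequency weighting, the sketch below computes one weight per term from a toy corpus. It is not the authors' implementation: the function name `idf_weights`, the whitespace tokenization, the unsmoothed formula log(N / df), and the example documents are all assumptions made for this sketch.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return an unsmoothed IDF weight, log(N / df), for each term.

    Hypothetical helper: N is the number of documents and df is the
    number of documents that contain the term at least once.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (assumed whitespace tokenization).
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Toy corpus, invented for illustration only.
corpus = ["the cat sat", "the dog ran", "the cat ran"]
weights = idf_weights(corpus)

# "the" appears in every document, so its weight is zero;
# "dog" appears in only one document, so it gets the largest weight.
ranked = sorted(weights, key=weights.get, reverse=True)
```

Under this weighting, a word that occurs in every document contributes nothing, while rarer words carry more influence, which is the behavior the abstract attributes to term weighting.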

Often, the end user of a topic model is an expert not in statistics, but in the applied field from which the data of interest are collected. We seek to let the user guide topic formation through term weights, which we infer from the user's interaction with a 2D visualization, leading to an iterative refinement of the topic model. The final topic model is a synergy of the user's expertise and the structure present in the text data.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program