Online Program

Saturday, May 19
Applications
Business Analytics
Sat, May 19, 10:30 AM - 12:00 PM
Lake Fairfax B

Forecasting Accuracy of Topic Modeling Techniques with Online Reviews: A Benchmark Study (304621)

*Yuan Cheng, Cornell University 
Shawn Mankad, Cornell University 

Keywords: topic modeling, online reviews, prediction, benchmark

Reviews from online markets are critical for companies to obtain feedback and develop strategies. Given the high volume of unstructured online reviews, topic modeling methods have become established as powerful tools that extract meaningful topics and reduce human labor. The most popular topic modeling method is latent Dirichlet allocation (LDA), since it exhibits favorable properties from an information retrieval perspective. Matrix factorization based methods such as nonnegative matrix factorization (NMF) are also gaining popularity for their computational efficiency. For prediction with topic models, one of the most common approaches is a two-stage procedure, where one first derives text features through topic modeling and subsequently estimates linear models for prediction and inference. Yet there is a lack of guidance on choosing among topic modeling methods for forecasting in different text scenarios. In this paper, we analyze simulated and real datasets to generate insights into the performance of different methods. We argue that when the research design uses estimated topics as independent variables within linear models, as is common in applied economic and decision analyses, the best topic modeling method should balance capturing the underlying textual themes with maximizing the statistical and forecasting power of the regression model. We perform the first benchmark study of topic modeling methods from the perspective of prediction, providing guidance for practitioners in selecting a topic modeling method based on the properties of the textual corpus.
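
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NMF (fitted here with standard multiplicative updates) as the topic model, ordinary least squares as the linear model, and a small synthetic document-term matrix in place of real reviews. The function names `nmf` and `two_stage_fit`, the toy data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (documents x terms) as W @ H.

    Uses multiplicative updates for the squared-error objective.
    W holds per-document topic weights, H holds per-topic term weights.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def two_stage_fit(X, y, k):
    """Stage 1: extract topic features. Stage 2: regress y on them."""
    W, H = nmf(X, k)
    design = np.column_stack([np.ones(len(W)), W])  # intercept + topic weights
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return W, H, beta

# Synthetic stand-in for a review corpus (assumption: 40 documents,
# 12 terms, 2 latent topics, and a response driven by the topic weights).
rng = np.random.default_rng(1)
true_W = rng.random((40, 2))
true_H = rng.random((2, 12))
X = true_W @ true_H
y = 1.0 + true_W @ np.array([2.0, -1.0]) + 0.01 * rng.standard_normal(40)

W, H, beta = two_stage_fit(X, y, k=2)
pred = np.column_stack([np.ones(len(W)), W]) @ beta
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))  # in-sample error only
```

The error computed here is in-sample. A forecasting comparison of the kind the abstract describes would instead hold out documents, project them onto the fitted topics, and score predictions on the held-out responses. Swapping `nmf` for another topic model while keeping the second stage fixed is one way to compare methods on equal footing.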