Conference Program


All Times ET

Wednesday, June 8
Machine Learning
Computational Statistics
Practice and Applications
Modeling + Non-Parametric Methods, Part 2
Wed, Jun 8, 2:45 PM - 3:40 PM
Allegheny I

SMRT: A Structural Model of Latent Ratings and Topics in Text (310194)

*Desheng Ma, Cornell University 
Shawn Mankad, Cornell University 

Keywords: topic modeling, regression analysis, latent variable model, COVID-19, online reviews

Online reviews have long been an important data source for studying customer behavior. Existing research has mainly used topic modeling to understand customer preferences or recommend products, whereas little has been done to infer latent ratings on each hidden topic or to characterize how each topic contributes to the overall review rating. To enhance the current understanding of customer behavior from online reviews, we propose a structural model of latent ratings and topics (SMRT), a data-driven statistical approach that incorporates ratings, the review text, and review-specific covariates. Specifically, we construct a hierarchical mixed membership model to infer topics and latent topic ratings from reviews by parameterizing the topics, their prevalence, and their contribution to the overall review rating with a generalized linear model on an arbitrary number of document-level covariates (e.g., author demographic information, date of review). Our model also allows for rigorous statistical inference on how the observable covariates for each review determine topic prevalence and topic rating weights, which helps move topic modeling methods toward causal inference. We test the model on Yelp online reviews from 2020, with document-specific covariates indicating whether each review was written before, during, or after COVID-19 lockdowns. The findings provide evidence that the lockdowns and reopening are significantly associated with the discussion and sentiment around the topics of responsiveness and wait time, two key service quality metrics with managerial significance. The results highlight how SMRT can extract meaningful insights and help answer business questions from user-generated data on digital platforms.
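To make the model structure described in the abstract more concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a softmax link from document-level covariates to both topic prevalence and topic rating weights, latent topic ratings drawn uniformly on a 1 to 5 scale, and an overall rating equal to the weighted sum of topic ratings. The abstract specifies none of these choices, and every name in the code (`simulate_reviews`, `gamma`, `lam`, `beta`) is hypothetical.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def simulate_reviews(n_docs=200, n_topics=3, vocab_size=50, n_tokens=40, seed=0):
    """Simulate reviews from an assumed covariate-driven mixed membership model."""
    rng = np.random.default_rng(seed)

    # Document covariates: intercept plus indicators for "during" and "post"
    # lockdown; "before" is the baseline category (an illustrative encoding).
    period = rng.integers(0, 3, size=n_docs)
    X = np.column_stack([np.ones(n_docs), period == 1, period == 2]).astype(float)

    # Hypothetical regression coefficients for prevalence and rating weights.
    gamma = rng.normal(0.0, 1.0, size=(X.shape[1], n_topics))
    lam = rng.normal(0.0, 1.0, size=(X.shape[1], n_topics))

    theta = softmax(X @ gamma)   # topic prevalence per document
    weights = softmax(X @ lam)   # contribution of each topic to the rating

    # Topic-word distributions and latent per-topic ratings (assumed forms).
    beta = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)
    topic_ratings = rng.uniform(1.0, 5.0, size=(n_docs, n_topics))

    # Overall rating as a weighted average of the latent topic ratings.
    overall = (weights * topic_ratings).sum(axis=1)

    # Mixed membership text generation: each token picks a topic, then a word.
    counts = np.zeros((n_docs, vocab_size), dtype=int)
    for d in range(n_docs):
        z = rng.choice(n_topics, size=n_tokens, p=theta[d])
        for k in range(n_topics):
            counts[d] += rng.multinomial(int((z == k).sum()), beta[k])

    return X, theta, weights, topic_ratings, overall, counts


X, theta, weights, topic_ratings, overall, counts = simulate_reviews()
```

Because the weights sum to one for each document, the simulated overall rating always lies within the range of that document's latent topic ratings. Fitting such a model to real reviews would require an inference procedure, which this sketch does not attempt.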