Keywords: text mining, topic modeling, survey, supervised latent Dirichlet allocation
Open-ended questions are becoming more common in surveys because of the diverse responses they can capture. However, survey text is often analyzed manually, which can be expensive and prone to subjectivity. We therefore propose analyzing the text and the numerical data jointly and automatically with supervised latent Dirichlet allocation (sLDA), a topic modeling approach that represents each document as a mixture of topics, and each topic as a probability distribution over words, while relating the topics to a numerical response. Our example is an employee satisfaction survey in which each record contains a numerical rating and a free-text response giving the reason for that rating. The sLDA algorithm identifies the key words associated with each rating as a topic and outputs the corresponding credible intervals. Because the R package lda implements this approach, using sLDA to identify topics for each rating is a starting point for automated survey text analysis that requires little technical knowledge to implement.
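For readers unfamiliar with sLDA, the generative model can be sketched as follows. This is the standard textbook formulation and not necessarily the exact parameterization used in this work. The symbols are notation introduced here for illustration: $K$ topics with word distributions $\beta_{1:K}$, Dirichlet parameter $\alpha$, regression coefficients $\eta$, and response variance $\sigma^2$. For a survey response with $N$ words $w_{1:N}$ and rating $y$:

```latex
\begin{align*}
\theta &\sim \operatorname{Dirichlet}(\alpha)
  && \text{topic proportions of the response} \\
z_n \mid \theta &\sim \operatorname{Multinomial}(\theta), \quad n = 1, \dots, N
  && \text{topic assignment of word } n \\
w_n \mid z_n, \beta_{1:K} &\sim \operatorname{Multinomial}(\beta_{z_n})
  && \text{observed word } n \\
y \mid z_{1:N}, \eta, \sigma^2 &\sim \mathcal{N}\!\left(\eta^{\top} \bar{z},\, \sigma^2\right),
  \quad \bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n
  && \text{numerical rating}
\end{align*}
```

Here each $z_n$ is treated as a $K$-dimensional indicator vector, so $\bar{z}$ holds the empirical topic frequencies of the response. Under this sketch, each coefficient $\eta_k$ links topic $k$ to the rating, which is the kind of quantity for which an interval estimate can be reported.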