All Times EDT

Abstract Details

Activity Number: 203 - Contemporary Machine Learning
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #312204
Title: Transfer Learning for Auto-Coding Free-Text Survey Responses
Author(s): Peter Baumgartner*, Amanda Smith, Murrey Olmsted, Dawn Ohse, and Bucky Fairfax
Companies: RTI International (all authors)
Keywords: natural language processing; machine learning; text analytics; qualitative coding; free text; open-ended question

Coding responses to free-text, open-ended survey questions (i.e., qualitative coding) can be a labor-intensive process. The resources it requires can prevent researchers from extracting value from free-text responses and can influence decisions about whether to include open-ended questions on surveys. Machine learning (ML) has been proposed as a way to reduce the coding burden, but traditional ML methods for text classification typically require more training data than conventional survey sample sizes provide.

With that problem in mind, we evaluated an ML approach for auto-coding free-text survey responses, employing data augmentation and recent advances in transfer learning models for natural language processing. Specifically, we used previously coded responses to an open-ended question on a 2018 employee survey to train a model that predicted 24 unique topical codes for responses to the same question on the 2019 survey. A coding team adjudicated these predictions and corrected them where needed. We achieved promising performance despite an original training dataset of under 3,000 survey responses.
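
The predict-then-adjudicate workflow described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the abstract does not say how model outputs were turned into codes, whether a response could receive several codes, or how performance was measured. The 0.5 threshold, the `Response` class, the function names, the code labels, and the toy responses and scores are all hypothetical, and the sketch assumes a response may carry more than one code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical cut-off: the abstract does not state how scores became codes.
THRESHOLD = 0.5


@dataclass
class Response:
    text: str
    # Model score per topical code (illustrative values, not real output).
    scores: Dict[str, float]
    # Coder overrides: True adds a code the model missed,
    # False removes a code the model proposed in error.
    corrections: Dict[str, bool] = field(default_factory=dict)


def predicted_codes(resp: Response, threshold: float = THRESHOLD) -> Set[str]:
    """Codes the model proposes: every code whose score meets the threshold."""
    return {code for code, score in resp.scores.items() if score >= threshold}


def adjudicated_codes(resp: Response, threshold: float = THRESHOLD) -> Set[str]:
    """Final codes after the coding team's corrections are applied."""
    codes = predicted_codes(resp, threshold)
    for code, keep in resp.corrections.items():
        if keep:
            codes.add(code)
        else:
            codes.discard(code)
    return codes


def micro_precision_recall(
    responses: List[Response], threshold: float = THRESHOLD
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of predictions vs. adjudicated codes."""
    tp = fp = fn = 0
    for resp in responses:
        pred = predicted_codes(resp, threshold)
        final = adjudicated_codes(resp, threshold)
        tp += len(pred & final)   # proposed and kept
        fp += len(pred - final)   # proposed but removed by coders
        fn += len(final - pred)   # added by coders
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data, for illustration only.
responses = [
    Response(
        "More flexible hours would help.",
        {"work_life_balance": 0.91, "compensation": 0.12},
    ),
    Response(
        "Pay is low and my manager never listens.",
        {"compensation": 0.84, "management": 0.41, "work_life_balance": 0.55},
        corrections={"management": True, "work_life_balance": False},
    ),
]

precision, recall = micro_precision_recall(responses)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating the adjudicated codes as the reference makes the coders' corrections double as evaluation data: each removed code counts against precision and each added code against recall.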

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program