Abstract:
|
Coding responses from free-text, open-ended survey questions (i.e., qualitative coding) can be a labor-intensive process. The resource requirements for qualitative coding can prevent researchers from extracting value from free-text responses and can influence decisions about whether to include open-ended questions on surveys. Machine learning (ML) has been proposed as a potential solution to alleviate coding burden, but traditional ML methods for text classification require amounts of labeled training data that conventional survey sample sizes rarely provide.
With that problem in mind, we evaluated an ML approach for auto-coding free-text survey responses, employing data augmentation and recent advances in transfer learning for natural language processing. Specifically, we used previously coded responses from an open-ended question on a 2018 employee survey to train a model that predicted 24 unique topical codes for responses to the same question on the 2019 survey. A coding team adjudicated these predictions and corrected them where needed. We achieved promising performance despite an original training dataset of under 3,000 survey responses.
|
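The abstract describes fine-tuning a transfer learning model to assign 24 topical codes to free-text responses. The sketch below shows one way such a setup could look: multi-label fine-tuning of a pretrained transformer with a small labeled dataset. The model name, hyperparameters, and toy data are illustrative assumptions, not the authors' actual pipeline (which also included data augmentation, not shown here).

```python
# Minimal sketch: fine-tune a pretrained transformer for multi-label coding of
# survey responses. Model choice and hyperparameters are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CODES = 24  # topical codes, as in the abstract; everything else is assumed

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=NUM_CODES,
    problem_type="multi_label_classification",  # sigmoid per code, BCE loss
)

# Toy stand-ins for previously coded 2018 responses (text + multi-hot labels).
texts = ["More flexible scheduling would help.", "I like my team."]
labels = torch.zeros(len(texts), NUM_CODES)
labels[0, 3] = 1.0  # e.g., first response tagged with hypothetical code 3

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference on a new (e.g., 2019) response: threshold per-code probabilities,
# then pass the predicted codes to human coders for adjudication.
model.eval()
with torch.no_grad():
    new_enc = tokenizer(["Pay is too low."], return_tensors="pt")
    probs = torch.sigmoid(model(**new_enc).logits)
predicted_codes = (probs > 0.5).nonzero(as_tuple=True)[1].tolist()
```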