All Times EDT

Abstract Details

Activity Number: 54 - Record Linkage, Data Integration, and Improving Survey Measurement
Type: Contributed
Date/Time: Sunday, August 8, 2021 : 3:30 PM to 5:20 PM
Sponsor: Survey Research Methods Section
Abstract #318299
Title: A Model-Assisted Approach for Finding Coding Errors in Manual Coding of Open-Ended Questions
Author(s): Matthias Schonlau* and Zhoushanyue He
Companies: University of Waterloo and University of Waterloo
Keywords: machine learning; statistical learning; intercoder disagreement; coding error
Abstract:

Text answers to open-ended questions are typically coded manually into one of several codes. Usually, a random subset of text answers is double-coded to assess intercoder reliability, but most of the data remain single-coded. Any disagreement between the two coders indicates an error by at least one of them. When the budget allows additional text answers to be double-coded, we propose using statistical learning models to predict which single-coded answers have a high risk of a coding error. Specifically, we train a model on the double-coded random subset and predict the probability that each single-coded answer's code is correct. The text answers with the highest risk are then double-coded for verification. In experiments with three data sets, we found that this method identifies, on average, 2-3 times as many coding errors in the additional text answers as random guessing. We conclude that this method is preferable when the budget permits additional double-coding. When there are many intercoder disagreements, the benefit can be substantial.
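
The procedure described above (train on the double-coded subset, score each single-coded answer, send the riskiest ones for a second coding) can be illustrated with a minimal Python sketch. The abstract does not name the model, the features, or the training target, so everything concrete here is an assumption made for illustration: a small multinomial naive Bayes classifier on word counts, trained on double-coded answers paired with their agreed code, with risk defined as one minus the predicted probability of the assigned code. The function names and the toy answers are hypothetical and are not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real text features.
    return text.lower().split()


def train_naive_bayes(double_coded):
    """Fit multinomial naive Bayes on (text, agreed_code) pairs."""
    class_counts = Counter(code for _, code in double_coded)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in double_coded:
        tokens = tokenize(text)
        word_counts[code].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, text):
    """Return {code: P(code | text)} with add-one smoothing."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    log_scores = {}
    for code, count in class_counts.items():
        total = sum(word_counts[code].values())
        score = math.log(count / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[code][tok] + 1) / (total + len(vocab)))
        log_scores[code] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def rank_by_risk(model, single_coded, budget):
    """Indices of the `budget` single-coded answers with the highest risk."""
    risks = []
    for i, (text, assigned_code) in enumerate(single_coded):
        p_correct = predict_proba(model, text).get(assigned_code, 0.0)
        risks.append((1.0 - p_correct, i))
    risks.sort(key=lambda r: (-r[0], r[1]))  # highest risk first, ties by index
    return [i for _, i in risks[:budget]]


# Made-up toy data, for illustration only.
double_coded = [
    ("the price is too high", "cost"),
    ("too expensive for me", "cost"),
    ("staff were rude", "service"),
    ("slow and rude service", "service"),
]
single_coded = [
    ("very expensive price", "cost"),
    ("rude staff", "cost"),  # plausibly miscoded
    ("slow service", "service"),
]

model = train_naive_bayes(double_coded)
to_recode = rank_by_risk(model, single_coded, budget=1)
```

In this toy setup, `to_recode` holds the indices of the answers that would be sent for a second coding. A different reading of the abstract would train a binary model on whether the two coders agreed; the sketch does not take a position on which variant the authors used.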


Authors who are presenting talks have a * after their name.
