All Times EDT

Abstract Details

Activity Number: 207 - Survey Mode Research
Type: Contributed
Date/Time: Monday, August 8, 2022 : 2:00 PM to 3:50 PM
Sponsor: Survey Research Methods Section
Abstract #323313
Title: Data Science Informed by Survey Science: Collecting More Accurate Labels
Author(s): Stephanie Eckman* and Jacob Beck and Frauke Kreuter
Companies: RTI International and LMU and University of Maryland
Keywords: web survey; measurement error; data science; data labelling
Abstract:

Machine learning models rely on high-quality input data, such as images labelled as dogs vs. cats or text labelled as positive or negative in sentiment. The instruments used to collect these labels are similar to web surveys, except that the questions are about images or text rather than about the labellers themselves. Our study tests whether the principles of data quality in web surveys also apply to the collection of labels for machine learning models.

We fielded two versions of an instrument for coding the sentiment of tweets. All tweets had been coded previously, so gold-standard labels exist. By comparing the labels collected in the two versions, we provide the first evidence that instrument design matters in the collection of labels for data science. We also investigate annotator effects, drawing a parallel to interviewer effects in the survey literature. Our results will interest data scientists who want to save time and money by collecting high-quality labels.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program