Abstract Details

Activity Number: 660 - Machine Learning: Advances and Applications
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #306534
Title: Using Machine Learning Algorithms to Reduce Data Collection Costs
Author(s): Gavin Corral* and Tyler Wilson
Companies: National Agricultural Statistics Service (NASS) and USDA, NASS
Keywords: Machine Learning; Bias; Sample; Cost; Survey; Propensity

The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) distributes numerous surveys annually and conducts the Census of Agriculture every five years. The surveys are extensive and often very costly. To reduce data collection costs, NASS is using multiple machine learning techniques, including response propensity modeling (RPM), to estimate the record-level probability of response to a survey. These propensity scores allow the records to be ordered from those most likely to respond to those least likely to respond. All records with a propensity score below a predetermined cutoff are flagged as highly unlikely to respond and are candidates for removal from the sample. This study examines the efficacy of removing some or all of these flagged records. In addition, an importance measure that incorporates the relative size of the operation, rarity, and state-level impact will be used to identify potential bias.
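
The ordering and flagging step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` class, the `rank_and_flag` helper, the made-up propensity scores, and the 0.10 cutoff are hypothetical and are not taken from the NASS implementation, which the abstract does not describe. The sketch assumes the propensity scores have already been produced by some fitted response model.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str      # hypothetical sample-unit identifier
    propensity: float   # estimated probability of response, in [0, 1]


def rank_and_flag(records, cutoff):
    """Order records from most to least likely to respond and
    return the ordered list plus the records scoring below the cutoff."""
    ranked = sorted(records, key=lambda r: r.propensity, reverse=True)
    flagged = [r for r in ranked if r.propensity < cutoff]
    return ranked, flagged


# Made-up scores for illustration only; not survey data.
sample = [
    Record("A", 0.82),
    Record("B", 0.07),
    Record("C", 0.45),
    Record("D", 0.03),
]

CUTOFF = 0.10  # hypothetical predetermined cutoff

ranked, flagged = rank_and_flag(sample, CUTOFF)
ranked_ids = [r.record_id for r in ranked]
flagged_ids = [r.record_id for r in flagged]

print("Ordered by propensity:", ranked_ids)
print("Candidates for removal:", flagged_ids)
```

In this sketch the flagged records are only candidates for removal. Whether to drop some or all of them, and how that choice affects bias, is the question the study itself examines.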

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program