All Times EDT

Abstract Details

Activity Number: 53 - Applications of Data Linkage and Machine Learning Techniques
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Survey Research Methods Section
Abstract #313652
Title: Using Data Science to Build Survey Sampling Frames from Scratch
Author(s): Joseph Rodhouse* and Tyler Wilson
Companies: National Agricultural Statistics Service and National Agricultural Statistics Service
Keywords: Data science; Web scraping; Sampling frames; Undercoverage
Abstract:

One shortcoming of survey sampling frames, such as list frames, is that they may not cover the entire target population for a given survey. In short, frames may suffer from what statisticians refer to as undercoverage. As a result, research organizations sometimes use area frames to address potential undercoverage on list frames. One drawback of area frames is that they are costly to build and maintain. To explore whether undercoverage on list frames can be addressed with less resource-intensive methods than area frames, the National Agricultural Statistics Service (NASS) undertook an effort to build a sampling frame from scratch using data gathered through web scraping. This paper details how NASS used several data science methods to transform raw and limited web-scraped data into a robust survey sampling frame that ultimately allowed for the selection of a complex survey sample. Survey results from the selected sample showed that this approach successfully addressed undercoverage in some areas. A path forward using this methodology is discussed in the conclusion of this paper.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program