Abstract Details

Activity Number: 495 - The Potential for Web-Scraping in the Production of Official Statistics: An Opportunity for Statistics to Lead?
Type: Invited
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Government Statistics Section
Abstract #326610
Title: The Potential for Web-Scraping in the Production of Official Statistics: An Opportunity for Statistics to Lead?
Author(s): Linda J Young*
Companies: USDA National Agricultural Statistics Service
Keywords: web scraping; capture-recapture; undercoverage
Abstract:

In 2012, the USDA's National Agricultural Statistics Service (NASS) began using capture-recapture methods to account for undercoverage, nonresponse, and misclassification in its Census of Agriculture. The two capture-recapture samples were the respondents from the NASS list frame and the June Area Survey (JAS) sample from the NASS area frame. A challenge with using the area frame for the second sample is that the types of farms that are often poorly covered by the NASS list frame also tend to be sparse in the JAS sample. NASS has therefore been evaluating web-scraped list frames as a second frame from which a sample could be drawn to assess undercoverage for surveys. For the 2015 Local Foods Marketing Survey, samples were drawn from the NASS list frame and a web-scraped list frame, and capture-recapture methods were used to produce official estimates. Here, the assumptions underlying the capture-recapture methodology when samples are drawn from two list frames are considered. To the extent possible, data from the 2015 Local Foods Marketing Survey are used to assess the validity of the assumptions. The effects of violating the assumptions are explored through simulation.
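
As a rough illustration of the two-sample capture-recapture idea described above, the sketch below computes the textbook Lincoln-Petersen estimator and Chapman's bias-adjusted variant. This is not the estimator NASS uses, which the abstract does not specify; the function names and all counts are hypothetical. It assumes a closed population, independent captures, equal capture probability within each sample, and error-free record matching between the two frames.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Textbook two-sample population estimate.

    n1: units captured by the first frame (e.g. list-frame respondents)
    n2: units captured by the second frame (e.g. a web-scraped list sample)
    m:  units found in both (matched records)
    """
    if m <= 0:
        raise ValueError("at least one matched unit is required")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's variant, which reduces small-sample bias and allows m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, for illustration only.
n1, n2, m = 200, 150, 60
n_hat = lincoln_petersen(n1, n2, m)

# Implied coverage of the first frame: the share of second-frame
# units that the first frame also captured.
coverage_first_frame = m / n2

print(n_hat, chapman(n1, n2, m), coverage_first_frame)
```

If the two frames tend to capture the same kinds of farms, the independence assumption fails and the matched count m is inflated, so the estimate understates the population. That is the kind of violation a simulation study can probe.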


Authors who are presenting talks have a * after their name.