Abstract Details

Activity Number: 495 - The Potential for Web-Scraping in the Production of Official Statistics: An Opportunity for Statistics to Lead?
Type: Invited
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Government Statistics Section
Abstract #326621 Presentation
Title: Modernizing Census Bureau Economic Statistics through Web Scraping
Author(s): Brian Dumbacher* and Carma Ray Hogue
Companies: U.S. Census Bureau and U.S. Census Bureau
Keywords: U.S. Census Bureau; web scraping; official statistics; economic statistics; passive data collection

For economic surveys conducted by the U.S. Census Bureau, useful data, such as respondent data or data of equivalent quality, can sometimes be found online. The Census Bureau is researching how web scraping of public sites can improve collection and processing for existing economic surveys, as well as sampling frames. We will discuss our efforts to build a tool called SABLE (Scraping Assisted by Learning), which uses a combination of web crawling, web scraping, and machine learning to discover, collect, and process data from the web. We will also describe past, current, and future Census Bureau efforts to scrape state government tax revenue data, public pension data, building permit data, and other information to enhance data relevance, reduce respondent and analyst burden, and increase the quality of sampling frames. Finally, we will describe the concerns and challenges associated with these efforts.
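
The discover, collect, and process steps named in the abstract can be illustrated with a minimal sketch. This is not SABLE's implementation, which the abstract does not describe in detail. It assumes only the Python standard library, uses an in-memory sample page with made-up example.gov URLs instead of a live site, and substitutes a hypothetical keyword score (RELEVANT_TERMS, relevance_score) for a trained machine learning model.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects hyperlinks (crawl step) and visible text (scrape step)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Resolve each anchor's href against the page URL so it can be queued.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Keep non-empty text fragments only.
        if data.strip():
            self.text_parts.append(data.strip())


# Hypothetical keyword list standing in for a trained classifier.
RELEVANT_TERMS = ("tax", "revenue", "pension", "permit")


def relevance_score(text):
    """Fraction of the hypothetical terms that appear in the text (0.0 to 1.0)."""
    lowered = text.lower()
    return sum(term in lowered for term in RELEVANT_TERMS) / len(RELEVANT_TERMS)


def process_page(base_url, html):
    """Parse one page and return its outgoing links, text, and relevance score."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    text = " ".join(parser.text_parts)
    return {"links": parser.links, "text": text, "score": relevance_score(text)}


# Made-up sample page; a real crawler would fetch pages over HTTP instead.
SAMPLE_HTML = """
<html><body>
<h1>Monthly Tax Revenue Report</h1>
<a href="/reports/2018-06.pdf">June 2018</a>
<a href="https://example.org/about">About</a>
</body></html>
"""

result = process_page("https://example.gov/finance/", SAMPLE_HTML)
print(result["links"])
print(result["score"])
```

In a fuller pipeline, the returned links would feed a crawl queue and the score would decide which pages or documents are passed on for data extraction; both of those steps are left out of this sketch.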

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program