Abstract:
|
For economic surveys conducted by the U.S. Census Bureau, respondent data or data of equivalent quality can sometimes be found online. The Census Bureau is researching the use of web scraping of public sites to improve existing economic survey collection and processing, as well as sampling frames. We will discuss our efforts to build a tool called SABLE (Scraping Assisted by Learning), which uses a combination of web crawling, web scraping, and machine learning to discover, collect, and process data from the web. We will also describe past, current, and future Census Bureau efforts to scrape state government tax revenue data, public pension data, building permit data, and other information to enhance data relevance, reduce respondent and analyst burden, and increase the quality of sampling frames. Finally, we will describe the concerns and challenges associated with these efforts.
|