Online Program

Thursday, May 17
Computing Science
Combining Federal and Regional Data Sources: Challenges and Solutions
Thu, May 17, 5:15 PM - 6:15 PM
Lake Fairfax B
 

Use of the Quarterly Census of Employment and Wages and Third-Party Sources for EIA Surveys (304717)

*Nanda Srinivasan, Energy Information Administration 

Keywords: survey, web data, record linkage, frame development, crowd-sourced data

This presentation examines the challenges of integrating data from multiple sources. As part of its Statistical Methodology Improvement Plan, the Energy Information Administration (EIA) is researching and using combined federal and third-party sources to support frame development and to enhance data quality for its Petroleum Marketing surveys. The Motor Gasoline Price Survey (EIA-878) is a weekly mandatory survey of retail gasoline stations across the country. To develop a new frame for this survey, EIA reviewed 10 federal and third-party databases to scope the frame population and used the Bureau of Labor Statistics (BLS) Quarterly Census of Employment and Wages (QCEW) frame to enhance the data about firms (e.g., ownership and addresses). EIA has also compared available third-party price data from crowdsourcing (Gasbuddy.com) and industry providers (OPIS) against its own survey data to determine ways to enhance and supplement its weekly reported prices with blended data. Based on the success of this project, EIA has begun a complete survey improvement project for its Petroleum Product Sales Identification Survey (EIA-863), a triennial census of petroleum product sellers covering annual volumes of petroleum products sold. For this project, EIA is looking to combine QCEW data with trade association data and web-source information collected via a web crawler/scraper. Together, these two projects are giving EIA successes on which to build as it expands its approaches to developing and conducting surveys. EIA is uniquely situated because a few of its surveys collect information that is also compiled, albeit for a different purpose, and sold by commercial vendors. These vendors can provide data at near-real-time frequency that, when linked with surveys, has the potential to reduce respondent burden and enhance data products.
However, there are statistical challenges, including record linkage and the evaluation of potential sources of error in commercial data, such as coverage error, specification error, measurement error, and missing data.
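
As an illustration of the record linkage challenge mentioned above, the following is a minimal sketch, not EIA's method. It assumes two small lists of station records (a survey frame and an external source) with made-up field names (`id`, `name`, `address`, `zip`), blocks candidate pairs on ZIP code, and scores them with a simple string similarity from Python's standard library. The abbreviation table, the equal weights, and the 0.85 threshold are all hypothetical choices for the example.

```python
from difflib import SequenceMatcher
import re

# Hypothetical abbreviation table; a real linkage would use a fuller address standardizer.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "highway": "hwy"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and abbreviate common street words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a, b):
    """String similarity in [0, 1], computed on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(frame, source, threshold=0.85):
    """Pair each frame record with its best-scoring source record in the same ZIP code.

    Returns a list of (frame_id, source_id or None, score) tuples; source_id is
    None when no candidate reaches the (assumed) threshold.
    """
    links = []
    for f in frame:
        best_id, best_score = None, 0.0
        for s in source:
            if f["zip"] != s["zip"]:  # blocking: only compare records within a ZIP code
                continue
            # Equal weights on name and address are an assumption for this sketch.
            score = 0.5 * similarity(f["name"], s["name"]) \
                + 0.5 * similarity(f["address"], s["address"])
            if score > best_score:
                best_id, best_score = s["id"], score
        if best_score < threshold:
            best_id = None  # treat as unmatched rather than force a weak link
        links.append((f["id"], best_id, round(best_score, 3)))
    return links


# Made-up records for illustration only.
frame = [
    {"id": "F1", "name": "Main Street Fuel", "address": "100 Main Street", "zip": "20190"},
    {"id": "F2", "name": "Lakeside Gas", "address": "7 Lake Road", "zip": "20191"},
]
source = [
    {"id": "S1", "name": "Main St. Fuel", "address": "100 Main St", "zip": "20190"},
    {"id": "S2", "name": "Quick Stop Petroleum", "address": "55 Oak Avenue", "zip": "20191"},
]

links = link_records(frame, source)
for frame_id, source_id, score in links:
    print(frame_id, source_id, score)
```

Frame records left without a link (here, `F2`) are the ones that would need clerical review. Counting them also gives a rough first view of coverage differences between the two sources.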