Abstract Details

Activity Number: 668 - Best Practices for Programming and Analysis
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #330625
Title: Leveraging "Medium-Sized" Data for Statistical Inference and Model Estimation of Data Gaps in International Energy Statistics Using R
Author(s): Glendon Haynes*
Companies: Energy Information Administration
Keywords: R; Energy; International; Hypothesis Testing; RMSE; MAPE

EIA's international energy statistics (IES) consist of almost 40 years of data for around 220 countries, covering production and consumption activities for a variety of fuels and fuel subcomponents. Analyzing the data becomes a relatively large-scale data problem once hypothesis testing for statistical inference and modeling is included. Although not a "big data" problem, this "medium-sized" data problem still requires nimble data processing to facilitate data comprehension and analysis. EIA is developing a research tool in R to process the data for efficient model analysis of IES. As the first step in the process, R functions standardize the format of data pulled directly from databases or APIs. Additional functions make it easy to group the data by geography or fuel component in order to test model specifications on a large scale. Currently, estimates of lagged data are derived from relatively simple ARIMA and extrapolation models with exogenous variables. The estimates are evaluated using out-of-sample root-mean-squared error (RMSE) or mean absolute percentage error (MAPE). Many of the techniques and tools developed for this project are applicable to the analysis of other large data sets.
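
For reference, the two out-of-sample error measures named in the abstract are conventionally defined as below. The notation is an assumption introduced here for illustration and does not come from the abstract: y_t is the observed value in held-out period t, \hat{y}_t is the model estimate for that period, and n is the number of held-out periods.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}
\qquad
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```

RMSE is expressed in the units of the series, while MAPE is a unit-free percentage, which makes it easier to compare across countries and fuels of very different scale. MAPE is undefined whenever an observed value y_t is zero.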

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program