Name: 2018 Joint Statistical Meetings
Start: 2018-07-28
End: 2018-08-02
Location: Vancouver Convention Centre

Activity Number:	353 - Data Science
Type:	Contributed
Date/Time:	Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Computing
Abstract #327050
Title:	A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Author(s):	Ben Baumer*
Companies:	Smith College
Keywords:	statistical computing; reproducibility; databases; data wrangling; SQL; tidyverse
Abstract:	Many interesting data sets available on the Internet are of a medium size: too big to fit into a personal computer's memory, but not so large that they cannot fit comfortably on its hard disk. In the coming years, data sets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints, they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstacles hamper thorough peer review and thus impede the forward progress of science. We propose a predictable and pipeable hub-and-spoke framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable relational database query language) to make reproducible research on medium data a painless reality.

Authors who are presenting talks have a * after their name.

JSM 2018 Online Program