JSM 2015 Online Program

Abstract Details

Activity Number: 224
Type: Invited
Date/Time: Monday, August 10, 2015 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Sports
Abstract #314537
Title: Data Wrangling for the Lahman
Author(s): Ben Baumer*
Companies: Smith College
Keywords: data wrangling ; data manipulation ; data munging ; baseball ; sql ; dplyr

Data wrangling, or data munging, is the art of manipulating data. While not traditionally part of the undergraduate (or, for that matter, graduate) curriculum, these practical skills are highly valued by employers. Recent work by Wickham and others has abstracted universal data manipulation operations from their programmatic syntax and highlighted the creative aspects of thinking with data, elevating the work of the data wrangler from janitor to architect. While Wickham's dplyr package blurs the boundary between R and SQL, the venerable relational database querying language, we present a teaching module from a data science course for undergraduates, wherein students learn data manipulation operations in both R and SQL and learn to appreciate the capabilities of each. The Lahman baseball database, a rich data set with many tables and endless natural questions of interest, provides a perfect mechanism for comparison, as it is packaged painlessly for both R and SQL.
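
As a rough illustration of the idea that the same manipulation can be written in two syntaxes, the sketch below computes total home runs per player once as a SQL query and once as a chain of grouping, summarizing, and sorting steps. It is not taken from the teaching module: it uses Python's built-in sqlite3 rather than R, the rows are invented toy values rather than Lahman data, and the table name batting and the function names are assumptions made for this example. Only the column names playerID, yearID, and HR are meant to resemble a Lahman-style batting table.

```python
import sqlite3

# Hypothetical toy rows shaped like a batting table: (playerID, yearID, HR).
# These values are invented for illustration; they are NOT Lahman data.
rows = [
    ("playerA", 2001, 10),
    ("playerA", 2002, 15),
    ("playerB", 2001, 30),
    ("playerB", 2002, 5),
    ("playerC", 2002, 7),
]


def total_hr_sql(rows):
    """Declarative version: let SQL do the grouping, summing, and sorting."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE batting (playerID TEXT, yearID INTEGER, HR INTEGER)"
    )
    conn.executemany("INSERT INTO batting VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT playerID, SUM(HR) AS total_hr "
        "FROM batting "
        "GROUP BY playerID "
        "ORDER BY total_hr DESC, playerID"
    ).fetchall()
    conn.close()
    return result


def total_hr_verbs(rows):
    """Step-by-step version: group by player, summarize, then arrange."""
    totals = {}
    for player_id, _year, hr in rows:  # group by playerID and sum HR
        totals[player_id] = totals.get(player_id, 0) + hr
    # arrange by total descending, breaking ties by playerID
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(total_hr_sql(rows))
    print(total_hr_verbs(rows))
```

Both functions return the same list of (playerID, total) pairs, which is the kind of side-by-side comparison the abstract describes for R and SQL.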

Authors who are presenting talks have a * after their name.
