Abstract:
Data wrangling, or data munging, is the art of manipulating data. While not traditionally part of the undergraduate -- or, for that matter, graduate -- curriculum, these practical skills are highly valued by employers. Recent work by Wickham and others has abstracted universal data manipulation operations from their programmatic syntax and highlighted the creative aspects of thinking with data -- elevating the work of the data wrangler from janitor to architect. Wickham's dplyr package blurs the boundary between R and SQL, the venerable relational database query language. We present a teaching module from a data science course for undergraduates in which students learn data manipulation operations in both R and SQL and learn to appreciate the capabilities of each. The Lahman baseball database -- a rich data set with many tables and endless natural questions of interest -- provides a perfect mechanism for comparison, as it is packaged painlessly for both R and SQL.
Copyright © American Statistical Association.