Abstract:
|
Data science is often described as the intersection of statistics and computer science. What, then, should be taught in a "data science" course that is not already covered by existing statistics and computer science courses? I argue that what is missing from existing courses is a discussion of the different ways that data can be represented and organized. Statistics and computer science courses often assume that data is already in tabular form (with one row per observation and one column per variable), but data increasingly comes in textual, spatial, and hierarchical forms. The focus of a "data science" course should therefore be how to extract meaning from these diverse forms of data (e.g., how to convert them to tabular form and how to create appropriate visualizations).
|