Abstract:
|
In the Fall of 2015, we offered a freshman-level course at Berkeley that treats "Data Science" as a fine-grained blend of rich intellectual traditions in computer science and statistics. Computer science is more than just programming; it is the creation of appropriate abstractions to express computational structures and the development of algorithms that operate on those abstractions. Similarly, statistics is more than just collections of estimators and tests; it is the interplay of general notions of sampling, models, distributions, and decision-making. Our course is based on the idea that these styles of thinking support each other. In teaching statistical inference, rather than relying on formulas and asymptotic justifications, we teach the computing concepts needed to transform and visualize data and to implement resampling-based inferential procedures. Students learn to program in Python, picking up the language gradually in the service of increasingly sophisticated data analysis problems. Moreover, students work throughout with real data sets and learn to draw substantive conclusions. This has been a strikingly successful way to introduce students to statistics.
|