With the increased emphasis being placed on the development of Data Science skills within the traditional Statistics curriculum, it is important to incorporate SQL into Probability and Statistics courses. We present examples of using SQL commands to compute estimated probabilities and conditional probabilities with large data sets, both real and simulated.
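Probabilities are estimated by counting the occurrences of specified events in the full data set and dividing by the total number of records. Conditional probabilities are estimated the same way, but the counts are taken only within the subset of the data set where the conditioning event holds.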
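Here is a minimal sketch of both computations. It uses Python's built-in `sqlite3` module instead of the R interface described in this abstract. The table `rolls`, its columns `die1` and `die2`, and the simulated two-dice data are hypothetical choices for illustration. The unconditional estimate counts over every row. The conditional estimate first restricts to a subset with a `WHERE` clause and then counts within it.

```python
import random
import sqlite3

# Hypothetical simulated data: 100,000 rolls of two fair dice stored in an
# in-memory SQLite table named "rolls" (an assumed schema, not from the source).
random.seed(1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rolls (die1 INTEGER, die2 INTEGER)")
conn.executemany(
    "INSERT INTO rolls (die1, die2) VALUES (?, ?)",
    [(random.randint(1, 6), random.randint(1, 6)) for _ in range(100_000)],
)

# Estimated P(sum = 7): count the rows where the event occurs,
# divided by the number of rows in the full data set.
p_sum7 = conn.execute(
    """
    SELECT SUM(CASE WHEN die1 + die2 = 7 THEN 1.0 ELSE 0 END) / COUNT(*)
    FROM rolls
    """
).fetchone()[0]

# Estimated P(sum = 7 | die1 = 3): keep only the subset where the
# conditioning event holds, then count occurrences within that subset.
p_sum7_given_die1_3 = conn.execute(
    """
    SELECT SUM(CASE WHEN die1 + die2 = 7 THEN 1.0 ELSE 0 END) / COUNT(*)
    FROM rolls
    WHERE die1 = 3
    """
).fetchone()[0]

conn.close()
print(f"P(sum = 7)            ~ {p_sum7:.4f}")
print(f"P(sum = 7 | die1 = 3) ~ {p_sum7_given_die1_3:.4f}")
```

Both estimates should be close to the exact value 1/6. The two queries differ only in the `WHERE` clause. The same SQL strings could presumably be sent to SQLite from R, so only the host language changes.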
Examples will be given for three levels of instruction: introductory Statistics classes, undergraduate Statistics majors, and MS Statistics students. SQL will be implemented in R using SQLite.