Online Program

Friday, February 19
CS16 Model Deployment and Diagnostics
Fri, Feb 19, 3:45 PM - 5:15 PM
Emerald

Data Exploration, Model Diagnostics, and Visualization with R (303186)

*Till Bergmann, University of California, Merced 

Keywords: R, model diagnostics, linear regression, assumption checking, communication

In this talk, I will show how R can be used both to explore data visually and to diagnose models. Summary statistics such as the mean and standard deviation are often used to explore data, while visualization is commonly neglected. I will present examples, such as bimodality and zero inflation, in which summary statistics fail to capture the true shape of the underlying distribution. I will then analyze the consequences of such non-normal distributions for linear models. I will demonstrate how individual data points can influence and distort linear models, and present methods in R for detecting such points and checking for violations of model assumptions. The talk will help statisticians better understand their data and communicate results to their clients and customers. It will also help them detect and avoid common statistical errors and choose an appropriate model for their data. The talk is intended as an introduction to data exploration and model diagnostics, and as a showcase of how R can be used to put these theoretical concepts into practice. Apart from base R, packages such as ggplot2 and dplyr will be used. All code and data sets will be made available.
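To make two of the claims above concrete, here is a minimal base R sketch on simulated data. It is not material from the talk. The sample sizes, mixture parameters, and the extra point at x = 40 are all illustrative assumptions. The first part builds a unimodal and a bimodal sample with nearly the same mean and standard deviation, which only a plot tells apart. The second part adds one high-leverage point to a simple linear regression and flags it with Cook's distance. It uses the common 4/n rule of thumb as an assumed cutoff.

```r
# Minimal sketch on simulated (hypothetical) data; not the talk's own code or data.
set.seed(42)

## 1. Same mean and SD, different shapes ---------------------------------
# Unimodal sample: standard normal.
unimodal <- rnorm(1000, mean = 0, sd = 1)

# Bimodal sample: equal mixture of N(-0.95, 0.3^2) and N(0.95, 0.3^2).
# Its theoretical variance is 0.95^2 + 0.3^2 = 0.9925, so its SD is about 1.
bimodal <- c(rnorm(500, mean = -0.95, sd = 0.3),
             rnorm(500, mean =  0.95, sd = 0.3))

# The summary statistics are nearly identical...
summary_stats <- rbind(
  unimodal = c(mean = mean(unimodal), sd = sd(unimodal)),
  bimodal  = c(mean = mean(bimodal),  sd = sd(bimodal))
)
print(round(summary_stats, 2))

# ...but side-by-side histograms reveal the two modes at once.
par(mfrow = c(1, 2))
hist(unimodal, breaks = 40, main = "Unimodal", xlab = "value")
hist(bimodal,  breaks = 40, main = "Bimodal",  xlab = "value")

## 2. One influential point distorting a linear model ---------------------
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 1)

# Add a single high-leverage point far below the trend (x = 40, y = 0).
dat <- data.frame(x = c(x, 40), y = c(y, 0))

fit_clean <- lm(y ~ x, data = dat[1:20, ])
fit_all   <- lm(y ~ x, data = dat)

# Compare coefficients: the one extra point pulls the fitted slope down.
print(rbind(clean = coef(fit_clean), with_point = coef(fit_all)))

# Cook's distance; 4 / n is a common rule-of-thumb cutoff (an assumption here).
cd <- cooks.distance(fit_all)
flagged <- which(cd > 4 / nrow(dat))
print(flagged)

# Base R's standard diagnostic plots for lm objects (residuals vs fitted,
# normal Q-Q, scale-location, residuals vs leverage).
par(mfrow = c(2, 2))
plot(fit_all)
```

The same histograms could be drawn with ggplot2's `geom_histogram()`. Base graphics are used here only to keep the sketch free of package dependencies.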