Keywords: visualization, active learning, education, clinical data
Working with clinical data is fraught with potential pitfalls and challenges, including data quality problems, data cleaning requirements, and concept mapping. To familiarize our students with these issues and with approaches for dealing with them, we have put together a ten-hour, multi-day clinical data wrangling workshop that uses actual research data from the National Sleep Research Resource (NSRR). The workshop combines short lectures on the research domain (sleep research) and on clinical data quality with active learning exercises built around a ‘data scavenger hunt’, in which students use an interactive Shiny visualization app to uncover issues within the data. In the final step of the exercise, students build a logistic regression model by choosing appropriate covariates. In terms of learning outcomes, we have noted that students show increased confidence in working with clinical data, as well as increased collaboration with one another on group projects. The course materials and code are available at http://github.com/laderast/clinical_data_wrangling, and the teaching dataset is available by completing a Data Use Agreement with the NSRR (http://sleepdata.org).