Abstract:
|
To capitalize on the explosion of health data, nursing and public health scientists need skills in big data computing platforms and data mining. Today’s open-science calls for transparency also require reproducible workflows. To address these needs, we have taught a course on “Big Data Analytics for Healthcare” for the past three years that builds these foundational skills. This presentation will provide a checklist and instructions that other instructors can follow to set up similar courses. The course uses the open-source tools R and RStudio on the RStudio Cloud platform, and Git on the GitHub cloud platform for version control and for sharing code and data. We will detail and compare different workflow approaches, along with their pros and cons, and present lessons learned from both instructor and student perspectives. We will also present exemplar student projects that applied statistical modeling and data mining with these skills and workflows. Examples include microbiome data analysis, web-scraping analysis of social-media blogs, text mining of electronic medical records, and applications of classification and regression trees and random forests.
|