Online Program

Thursday, May 17
Machine Learning Applications
Thu, May 17, 6:15 PM - 7:15 PM
Regency Ballroom B
 

Inter- and Intra-Institutional Efforts to Build Capacity for Data Science Education (304737)

Presentation

Kayhan Batmanghelich, University of Pittsburgh 
David Boone, University of Pittsburgh 
Greg Cooper, University of Pittsburgh 
Roger Day, University of Pittsburgh 
Harry Hochheiser, University of Pittsburgh 
*Douglas Landsittel, University of Pittsburgh 
Nathan Urban, University of Pittsburgh 
Erik Wright, University of Pittsburgh 

Keywords: Data Science Education, Big Data, multidisciplinary

Data Science, as defined by the draft NIH Strategic Plan, is “the interdisciplinary field of inquiry in which quantitative and analytical approaches … extract knowledge and insights from increasingly large and/or complex sets of data.” Such definitions implicitly stress both the multidisciplinary nature of data science and the broad set of necessary skills.

This presentation describes efforts at one institution to build capacity in data science, including 1) development of educational modules, 2) overarching efforts to share resources, and 3) coordination of educational efforts across departments.

The educational modules focus on 1) bioinformatics analysis of TCGA data and 2) the Human Microbiome Project; 3) text and natural language processing of social media data; 4) machine learning for image segmentation and registration; 5) discovery of plausible underlying causal relationships; and 6) causal inference through propensity score-based methods. Once developed, these modules will be packaged in a common framework designed to facilitate comparison, indexing, and reuse with necessary metadata. The modules as a whole will emphasize reproducibility, skills for managing big data, and traditional statistical concepts (e.g., bias-variance trade-offs). Materials will be placed in accessible open-source hosting environments.

Similar work across 13 other institutions is being coordinated to leverage efforts and produce complementary training. In addition, collaborations at the university level are working to characterize definitions of data science, educational goals and competencies, and variations in educational programs across the schools of Medicine, Computing and Information, Public Health, and Arts and Sciences.

Thus far, data science educational programs have been insufficient to meet rapidly expanding workforce demands. We hope these efforts will motivate further collaboration that better aligns with the multidisciplinary nature of data science.