Abstract:
|
We present a training program that teaches working professionals in government agencies modern methods of data analysis, including database management, machine learning, record linkage, and text analysis, with data privacy discussed throughout. The program covers all phases of a research project on a social issue, from problem formulation through data collection, manipulation, processing, and analysis. The material is delivered using Python and SQL in Jupyter notebooks running in a cloud computing environment. The program's modular format is tailored to the specific needs of working professionals as well as traditional and nontraditional graduate students. The tools for data curation, management, interrogation, and integration are built into a data facility system that fosters collaboration and that other curriculum adopters can replicate and use. We showcase an implementation of the program run at the National Science Foundation and discuss lessons learned and future steps for furthering data science at government agencies.
|