Abstract Details

Activity Number: 591 - Synthetic Data and Data Disclosure
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 2:00 PM to 3:50 PM
Sponsor: Government Statistics Section
Abstract #329296
Title: Challenges Confronted and Insights Revealed in Synthesizing State-Level Integrated Data
Author(s): Daniel Bonnery* and Michael E Woolley and Laura Stapleton and Tessa Johnson and Angela Henneberger and Bess Rose and Yi Feng and Terry Shaw and Yating Zheng
Companies: University of Maryland (first author); University of Maryland and Maryland Longitudinal Data System Center (each of the remaining eight authors)
Keywords: Synthetic data; Data integration; Longitudinal data; State data systems
Abstract:

The Maryland Longitudinal Data System (MLDS) is a central repository of student and workforce data, including data provided by the Maryland State Department of Education, the Maryland Higher Education Commission, and the Maryland Department of Labor, Licensing and Regulation. The Institute of Education Sciences is funding a project to produce and release synthetic versions of selected longitudinal state-level datasets. Synthetic data are of increasing interest to many state longitudinal integrated data systems, which likewise seek to balance researcher access with data privacy concerns. Practical tools that implement generic synthesis methods exist. Nevertheless, synthesizing large integrated datasets presents specific methodological and practical challenges. Longitudinal integrated data involve a large number of variables, redundant (and sometimes inconsistent) information, specific and often non-random missing-data patterns, and multiple levels or dimensions. We detail the nature and implications of these challenges and describe the solutions we are applying in our ongoing MLDS Synthetic Data Project.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program