
Abstract Details

Activity Number: 153 - Developing Multi-Purpose Imputed or Synthetic Data for Official Statistics
Type: Invited
Date/Time: Monday, July 29, 2019 : 10:30 AM to 12:20 PM
Sponsor: Government Statistics Section
Abstract #300520 Presentation
Title: MLDS Synthetic Data Project: An Evaluation
Author(s): Mark Lachowicz* and Daniel Bonnery and Yi Feng and Angela Henneberger and Tessa Johnson and Bess Rose and Terry Shaw and Laura Stapleton and Michael Woolley and Yating Zheng
Companies: University of Maryland, College Park and University of Maryland and Maryland Longitudinal Data System Center and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, Baltimore and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, College Park
Keywords: synthetic data; state data systems; longitudinal data
Abstract:

For the last five years, the Maryland Longitudinal Data System (MLDS) has served as a central repository of highly confidential student and workforce data. The Institute of Education Sciences funded a project to create, evaluate, and potentially release synthetic versions of the MLDS data. We will present an evaluation of the synthetic data in terms of research validity and disclosure risk. The evaluation of research validity will include assessments of general utility (e.g., comparisons of variable distributions) and specific utility (e.g., comparisons of parameter estimates from analyses of the real and synthetic data). We will also present another vital step in the project: the assessment of the disclosure risk of the synthetic data. That assessment is required to comply with laws governing the confidentiality of state-held data, and it is also a necessary step in seeking permission from the MLDS Governing Board to release the synthetic data. We will report our progress in the disclosure risk assessment process. Finally, we will discuss the benefits of the synthetic data for researchers who do not have access to the real data.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program