Activity Number:
|
153
- Developing Multi-Purpose Imputed or Synthetic Data for Official Statistics
|
Type:
|
Invited
|
Date/Time:
|
Monday, July 29, 2019 : 10:30 AM to 12:20 PM
|
Sponsor:
|
Government Statistics Section
|
Abstract #300520
|
Presentation
|
Title:
|
MLDS Synthetic Data Project: An Evaluation
|
Author(s):
|
Mark Lachowicz* and Daniel Bonnery and Yi Feng and Angela Henneberger and Tessa Johnson and Bess Rose and Terry Shaw and Laura Stapleton and Michael Woolley and Yating Zheng
|
Companies:
|
University of Maryland, College Park and University of Maryland and Maryland Longitudinal Data System Center and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, Baltimore and University of Maryland, College Park and University of Maryland, Baltimore and University of Maryland, College Park
|
Keywords:
|
synthetic data;
state data systems;
longitudinal data
|
Abstract:
|
For the last five years, the Maryland Longitudinal Data System (MLDS) has served as a central repository of highly confidential student and workforce data. The Institute of Education Sciences funded a project to create, evaluate, and potentially release synthetic versions of the MLDS data. We will present an evaluation of the synthetic data in terms of research validity and disclosure risk. The evaluation of research validity will assess both general utility (e.g., comparisons of variable distributions) and specific utility (e.g., comparisons of parameter estimates from analyses on real and synthetic data). We will also present another vital step in our synthetic data project: the assessment of disclosure risk. That assessment is required to comply with laws governing the confidentiality of state-held data, and it is a necessary step in seeking permission from the MLDS Governing Board to release the synthetic data. We will report our progress in the disclosure risk assessment process. Finally, we will discuss the benefits of the synthetic data for researchers who do not have access to the real data.
|
Authors who are presenting talks have a * after their name.
Back to the full JSM 2019 program
|