Abstract:
In the current era of information technology, big data offers tremendous benefits for education, economics, medical research, national security, and other areas through data-driven decision-making, insight discovery, and process optimization. However, a major challenge in analyzing big data is the serious risk of exposing personal information about the individuals who contribute to it when the data are shared among collaborators or released publicly. An intruder could identify a participant by exploiting the many connections between that individual and other contributors in a large dataset. Differentially private data synthesis (DIPS) is one method that satisfies differential privacy (DP), a condition on data-release algorithms that provides a strong mathematical guarantee of individual privacy protection. DIPS generates synthetic individual-level data while guaranteeing privacy at a level prespecified under DP. We explore various partitioning methods for DIPS on datasets with a large number of observations, with the goal of improving statistical utility, and compare them to provide guidance on their practical feasibility.
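To make the combination of DIPS and partitioning concrete, here is a minimal Python sketch under stated assumptions. It is not the partitioning method studied in this work. It assumes categorical records over a small, known domain and uses a simple Laplace-noised histogram as the synthesizer. It also treats the number of records as public (neighbouring datasets differ by replacing one record), so each histogram has L1 sensitivity 2. All function names are hypothetical. Records are assigned to disjoint parts independently of their values. By parallel composition, the combined synthetic data therefore still satisfy ε-DP, even though each part spends the full budget ε.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthesize(records, domain, epsilon, rng):
    """Hypothetical synthesizer: sample len(records) rows from a noisy histogram.

    Assumes bounded DP (replace-one neighbours, record count public), so the
    histogram has L1 sensitivity 2 and Laplace scale 2 / epsilon.
    """
    counts = Counter(records)
    scale = 2.0 / epsilon
    # Add noise to every cell of the known domain, then clip negatives to zero.
    noisy = [max(0.0, counts[cell] + laplace(scale, rng)) for cell in domain]
    if sum(noisy) == 0:  # every cell clipped: fall back to a uniform distribution
        noisy = [1.0] * len(domain)
    return rng.choices(domain, weights=noisy, k=len(records))


def partitioned_synthesize(records, domain, epsilon, n_parts, rng):
    """Hypothetical partitioned DIPS: split, synthesize each part, concatenate.

    Each record lands in exactly one part, so by parallel composition the
    combined release satisfies epsilon-DP overall.
    """
    idx = list(range(len(records)))
    rng.shuffle(idx)  # data-independent assignment of records to parts
    synthetic = []
    for p in range(n_parts):
        part = [records[i] for i in idx[p::n_parts]]
        synthetic.extend(dp_histogram_synthesize(part, domain, epsilon, rng))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    domain = list(product(["F", "M"], ["<40", "40+"], ["yes", "no"]))
    data = [rng.choice(domain) for _ in range(10_000)]
    syn = partitioned_synthesize(data, domain, epsilon=1.0, n_parts=4, rng=rng)
    print(len(syn))  # 10000
```

The sketch illustrates only the privacy accounting. How partitioning affects statistical utility depends on the synthesizer. With a plain histogram, each part's counts shrink while the noise scale stays the same, so this toy does not reproduce the utility comparison described above.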