Abstract:
|
We present a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from private data. Prior work in this area is limited in that it is tailored to calculating confidence intervals for specific statistical procedures, such as mean estimation or simple linear regression. Other recent methods can produce confidence intervals for more general sets of procedures, but they yield only approximately unbiased estimates, are designed for one-dimensional outputs, or assume significant user knowledge about the data-generating distribution. In contrast, our method requires negligible user knowledge and is designed such that unbiasedness and validity of the confidence intervals hold with arbitrarily high probability for general procedures and in high dimensions. These theoretical guarantees hold provided that the estimation strategy, when run over subsets of a partition of the original data, produces estimates that follow a multivariate Gaussian distribution. We demonstrate estimator performance over common models on both real and synthetic data.
|