Abstract:
|
Sample obfuscation is the widely studied and challenging problem of providing access to a data sample while guarding aspects of its privacy. It can take different forms, including masking or redaction to protect sample variables, and anonymization or differential privacy to secure individuals' data records. This work extends the notion of sample obfuscation to the obfuscation of populations. Population obfuscation aims to protect information about, and features of, a whole statistical population of data, where the population is represented by an algorithm, formula, model, or sampling plan from which users can synthesize or otherwise access an unlimited number of data records. Canonical sample masking can be extended to allow masking of general functions of sample variables. With this extension, we present a conceptual framework for population masking, with elementary examples of both canonical and general population masking. Three procedures are outlined for masking a population: one based on transfer learning, one on data augmentation, and one on optimal transport. We also introduce the idea of inherent population masking and offer a simple class of time series examples in which it occurs.
|