Abstract:
|
When agencies collect geocoded confidential data that they intend to disseminate, they often alter the geographies before making the data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, in which sensitive values are replaced with draws from models designed to capture distributional features of the collected data. The issues raised by spatially outlying observations, however, have received relatively little attention. Our goal here is to shed light on this problem, propose a solution -- referred to as "differential smoothing" -- and illustrate our approach using sale prices of homes in San Francisco.
|