Abstract:
|
Datasets containing sensitive information, particularly about individuals, are common in many contexts. Such information must be protected, which often means the dataset cannot be released. Restricted access also affects analysts who are interested in global statistics but not necessarily in individual records. In this work we present two protected inferential procedures for private datasets in the linear regression setting. Specifically, we describe differentially private procedures for the sign and the significance of regression coefficients. Procedures satisfying differential privacy allow global statistics to be released while controlling the amount of sensitive information that could be disclosed. The procedures are designed for model-based inference in both finite and infinite population settings. Our proposal combines the subsample-and-aggregate framework, the Laplace mechanism and t-statistics. We assess its performance through analyses of simulated and real-life datasets.
|
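The abstract names the ingredients without showing how they fit together. The following Python sketch illustrates one common way to combine subsample-and-aggregate, t-statistics and the Laplace mechanism. It is not the paper's procedure. Several assumptions are ours: there is a single predictor, a two-sided 5% normal-approximation cutoff `t_crit = 1.96` replaces an exact t quantile, the released statistic is the share of blocks whose slope is significantly positive, and leftover records that do not fill a block are dropped. The function names `slope_t_statistic`, `laplace_noise` and `private_positive_fraction` are hypothetical.

```python
import math
import random


def slope_t_statistic(xs, ys):
    """OLS slope b and its t-statistic for y = a + b*x (needs >= 3 points, non-constant x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: the t-statistic is unbounded
        return b, (math.copysign(math.inf, b) if b != 0 else 0.0)
    return b, b / se


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_positive_fraction(xs, ys, k, epsilon, t_crit=1.96, rng=None):
    """Hypothetical subsample-and-aggregate release: the share of k disjoint blocks
    whose slope t-statistic exceeds t_crit, perturbed with Laplace noise."""
    if rng is None:
        rng = random.Random()
    idx = list(range(len(xs)))
    rng.shuffle(idx)  # data-independent random partition into k blocks
    size = len(xs) // k
    hits = 0
    for i in range(k):
        block = idx[i * size:(i + 1) * size]
        _, t = slope_t_statistic([xs[j] for j in block], [ys[j] for j in block])
        hits += t > t_crit
    frac = hits / k
    # Replacing one record alters at most one block, so frac moves by at most 1/k;
    # Laplace noise with scale (1/k)/epsilon then gives epsilon-differential privacy.
    noisy = frac + laplace_noise(1.0 / (k * epsilon), rng)
    return min(max(noisy, 0.0), 1.0)  # clamping is post-processing, so privacy is kept


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0, 1) for _ in range(2000)]
    ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
    print(private_positive_fraction(xs, ys, k=20, epsilon=1.0, rng=rng))
```

A noisy share near 1 suggests a positive, significant coefficient, and a share near 0 suggests the opposite or no effect. Larger `k` lowers the noise scale `1/(k*epsilon)` but leaves fewer records per block, which weakens each block's t-test. That is the usual trade-off of this design.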