Abstract:
|
In a series of papers, a multiple hypothesis testing procedure was developed that provably controls the false discovery rate by using specially constructed knockoff variables. At a high level, the procedure works by (i) creating knockoff variables that obey specific properties, (ii) constructing test statistics based on the knockoff and original data, and (iii) using these properties to calculate a data-dependent threshold for selecting important features. In this paper, we shift our focus to the construction of the knockoff variables and show connections between these variables and several methods in the data privacy literature. Knockoff construction and privacy-preserving methods have similar goals: to learn about the aggregate data without disclosing too much information about individual observations. We show that some knockoff methods are theoretically similar to proposed privacy-preserving techniques, compare knockoff construction with several privacy methods via simulations, present methods to construct knockoff variables using privacy-preserving synthetic data, and investigate properties of the knockoff variables from a privacy perspective.
|