Abstract:

A recent framework called model-X knockoffs performs variable selection while nonasymptotically controlling the false discovery rate, with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model whose number of parameters can be on the order of $np$, the number of observations times the number of variables. We demonstrate how to do this for three models of interest, with simulations showing that the new approach remains powerful under the weaker assumptions.
