Abstract:
|
Selective inference is an emerging field in big data analytics that aims to perform variable selection and provide statistical inference at the same time. Various frameworks have been developed toward this goal. Among them, the Model-X framework (Candès et al., 2018) provides the most flexible tool for equipping almost any machine learning method with FDR (False Discovery Rate) controlled variable selection. However, the lack of a practical and flexible method for generating knockoffs remains the major obstacle to wide application of the Model-X procedure. This paper fills this gap by proposing a model-free knockoff generator that approximates the correlation structure between features through a latent variable representation. We demonstrate that the proposed method can achieve FDR control and higher power than two existing methods in various simulated settings and in a real data example on finding mutations associated with drug resistance in HIV-1 patients.
|