All Times EDT

Abstract Details

Activity Number: 60 - Nonparametrics in High-Dimensional Data
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Nonparametric Statistics
Abstract #313724
Title: False Discovery Rate Control via Data Splitting
Author(s): Chenguang Dai* and Buyu Lin and Xin Xing and Jun S. Liu
Companies: Harvard University and University of Science and Technology of China and Harvard University and Harvard University
Keywords: Data splitting; False discovery rate; Feature selection; High-dimensional data
Abstract:

This paper introduces a way to asymptotically control the false discovery rate (FDR) using data splitting. For each feature, the method estimates two independent significance coefficients via data splitting and constructs a contrast statistic. FDR control is achieved by exploiting a property of this statistic: for any null feature, its sampling distribution is symmetric about 0. We further propose a strategy to aggregate multiple data splits to stabilize the selection result and boost the power. Interestingly, this multiple data-splitting approach appears capable of overcoming the power loss caused by data splitting while keeping the FDR under control. The proposed framework is applicable to canonical statistical models, including linear models, generalized linear models, and Gaussian graphical models. Simulation results, as well as a real data application, show that the proposed approaches, especially the multiple data-splitting strategy, control the FDR well and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
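The single-split idea described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the contrast statistic or the threshold rule, so everything below is an assumption made for illustration: the coefficients are ordinary least squares estimates on each half (which only makes sense when each half has more samples than features), the contrast is sign(b1*b2) * (|b1| + |b2|), and the cutoff is the smallest t at which the count of statistics at or below -t, divided by the count at or above t, falls to the target level q. The function names are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def contrast_statistics(X, y, rng):
    """Split the rows in two, fit OLS on each half, and contrast the estimates.

    Assumed form (not stated in the abstract): the statistic is large and
    positive when both halves agree on a sizeable coefficient, and is
    roughly symmetric about 0 when the true coefficient is zero.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]
    b1 = np.linalg.lstsq(X[half1], y[half1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[half2], y[half2], rcond=None)[0]
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))


def symmetric_threshold(contrast, q):
    """Smallest cutoff t whose estimated false discovery proportion is <= q.

    The left tail (values <= -t) stands in for the unknown number of null
    features in the right tail (values >= t), using the symmetry of nulls.
    """
    candidates = np.sort(np.abs(contrast[contrast != 0]))
    for t in candidates:
        false_est = np.sum(contrast <= -t)
        selected = np.sum(contrast >= t)
        if selected > 0 and false_est / selected <= q:
            return t
    return np.inf  # no cutoff meets the target level; select nothing


def select_features(contrast, q):
    """Indices of features whose contrast statistic reaches the cutoff."""
    t = symmetric_threshold(contrast, q)
    return np.flatnonzero(contrast >= t)


def demo(seed=0):
    """Simulated linear model: the first 5 of 20 features carry signal."""
    rng = np.random.default_rng(seed)
    n, p, k = 400, 20, 5
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k] = 2.0
    y = X @ beta + rng.standard_normal(n)
    return select_features(contrast_statistics(X, y, rng), q=0.1)
```

Calling `demo()` returns the indices selected at a nominal level of 0.1 on simulated data. In a genuinely high-dimensional setting, the per-half OLS fits would need to be replaced by an estimator suited to that setting, and this sketch covers one split only.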


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program