Abstract:
|
In recent years, false discovery rate (FDR) control has attracted growing attention as a way to improve the reproducibility of variable selection. We focus on the variable selection problem for $l_1$-regularized logistic regression with $p$ variables and $n$ samples. We assume that $n$ and $p$ grow proportionally, with $n/p \to \delta \in (0, \infty)$, which includes both the $n > p$ and $n \leq p$ cases. Since the $l_1$-regularizer by nature performs variable selection, we characterize its asymptotic FDR-power tradeoff and classification accuracy using a system of equations with six parameters. Using this tradeoff, we further propose a sample size calibration procedure that achieves a target power under a pre-specified FDR. We also carry out a similar asymptotic analysis for the model-X knockoff, which provides FDR-controlled selection. We illustrate the FDR-power analysis and the corresponding sample size calibration using simulated and real data.
|