Abstract:

The two-group model for testing a large number of statistical hypotheses assumes that the test statistics come from a mixture of a high-probability null distribution and a low-probability alternative. The most common testing method in this model is to threshold the local false discovery rates (locFDRs). This guarantees optimal control of the marginal false discovery rate (mFDR), in the sense that it provides maximal power (expected number of true discoveries) subject to mFDR control. We address the challenge of controlling the false discovery rate (FDR) rather than the mFDR in the two-group model. Since FDR control is less conservative, this results in more rejections. We derive the optimal procedure for this task. It leads to a more complex policy than the locFDR policy, since every rejection decision depends on the entire set of statistics, but it can still be described in terms of a single threshold. We show how to evaluate this threshold in time that is only quadratic in the typical number of discoveries. We demonstrate in numerical experiments that, even with thousands of hypotheses, the optimal FDR-controlling procedure can have a large power advantage.
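
For orientation, the locFDR thresholding baseline mentioned above can be sketched as follows. This is a minimal illustration, not the optimal FDR procedure derived in the paper. It assumes a Gaussian two-group model with known null proportion `pi0` and known alternative mean `alt_mean`; those parameter values and the function names `loc_fdr` and `reject_by_locfdr` are hypothetical. The threshold is chosen by a common plug-in rule: reject the hypotheses with the smallest locFDRs for as long as their running average stays at or below `alpha`.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def loc_fdr(z, pi0=0.9, alt_mean=2.5):
    """Posterior probability that z is null, under the assumed mixture
    pi0 * N(0, 1) + (1 - pi0) * N(alt_mean, 1)."""
    f0 = pi0 * normal_pdf(z)
    f1 = (1.0 - pi0) * normal_pdf(z, mean=alt_mean)
    return f0 / (f0 + f1)


def reject_by_locfdr(zs, alpha=0.1, pi0=0.9, alt_mean=2.5):
    """Return the sorted indices rejected by the plug-in locFDR rule."""
    scores = [loc_fdr(z, pi0, alt_mean) for z in zs]
    # Most promising hypotheses (smallest locFDR) come first.
    order = sorted(range(len(zs)), key=lambda i: scores[i])
    running_sum, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running_sum += scores[i]
        # The mean locFDR of the rejected set is the posterior expected
        # proportion of false discoveries under the assumed model.
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    print(reject_by_locfdr([0.0, 5.0, 0.1, 6.0], alpha=0.1))
```

Each decision here depends on a statistic only through its own locFDR and a common cutoff, whereas the optimal FDR policy described in the abstract lets every rejection decision depend on the entire set of statistics.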
