Abstract:
|
With the explosion of the data science field over the past five years, multiple hypothesis testing has become the status quo. However, if you test enough hypotheses, you will find a few statistically significant results even when there are no real differences. Thus, controlling the false discovery rate (FDR) is imperative. In this talk, we first explore some pitfalls that arise when the critical assumption of independence between hypotheses (test statistics) is violated. We then investigate the effects of the dependence that naturally arises in modern large-scale testing. In particular, we examine positive regression dependency on a subset of true null hypotheses (PRDS). It turns out that estimating the number of true null hypotheses is especially difficult for non-normal data. Furthermore, weak dependence can be justified if it has no impact on the null p-value distribution (uniform) and no impact on the variance of the false discovery proportion. Under this framework, researchers can ensure the validity of the independence assumption when testing a large number of hypotheses simultaneously.
|