Abstract:
The false discovery rate quantifies the proportion of false discoveries among the hypothesis tests that are called significant, and it is typically estimated from one-dimensional summaries such as p-values or test statistics. In some settings, additional information is available that can be used to obtain more accurate estimates. We develop a new framework for formulating and estimating false discovery rates and q-values when such an additional piece of information, which we call an "informative variable", is available. The false discovery rate and its components are then treated as functions of this informative variable. We consider two applications in genomics: (i) detecting expression quantitative trait loci (eQTLs) in yeast, where every pair of a genetic marker and a gene expression trait is tested for association; here the informative variable is the distance between the marker and the gene, which affects the prior probability of an association; and (ii) identifying differentially expressed genes in an RNA-Seq experiment carried out in mice, where the informative variable is the per-gene read depth, which has been shown to be a strong determinant of statistical power.
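To make the idea of treating components of the false discovery rate as functions of an informative variable more concrete, here is a minimal Python sketch under simplifying assumptions. It is not the estimator developed in this work. It swaps in a crude stratified stand-in: the informative variable z is cut into quantile bins, the null proportion π0 is estimated separately within each bin using Storey's estimator at a single tuning value λ = 0.5, and q-values are then computed within each bin. The helper names (`storey_pi0`, `storey_qvalues`, `stratified_qvalues`) and the parameters `n_bins` and `lam` are hypothetical choices made for illustration only.

```python
import numpy as np

# Illustrative stand-in only: a binned approximation of pi0(z),
# NOT the functional FDR estimator described in the abstract.

def storey_pi0(p, lam=0.5):
    """Storey's null-proportion estimate at one (assumed) tuning value lam."""
    p = np.asarray(p, dtype=float)
    if p.size == 0:
        return 1.0
    # Null p-values are Uniform(0, 1), so about pi0 * (1 - lam) of them exceed lam.
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

def storey_qvalues(p, pi0):
    """q-values for one set of p-values, given a null-proportion estimate pi0."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Estimated FDR when thresholding at the i-th smallest p-value: pi0 * m * p_(i) / i
    fdr = pi0 * m * ranked / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of FDR_(j)
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def stratified_qvalues(p, z, n_bins=4, lam=0.5):
    """Estimate pi0 within quantile bins of the informative variable z (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    z = np.asarray(z, dtype=float)
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_bins + 1))
    # Using only the interior edges gives bin labels 0 .. n_bins - 1.
    bins = np.digitize(z, edges[1:-1], right=True)
    q = np.empty_like(p)
    pi0_by_bin = {}
    for b in np.unique(bins):
        mask = bins == b
        pi0_b = storey_pi0(p[mask], lam)
        pi0_by_bin[int(b)] = pi0_b
        q[mask] = storey_qvalues(p[mask], pi0_b)
    return q, pi0_by_bin

# Toy usage: small z values are (by construction) enriched for true signals.
p = [0.001, 0.002, 0.6, 0.003, 0.7, 0.8, 0.9, 0.55]
z = np.arange(8)
q, pi0_by_bin = stratified_qvalues(p, z, n_bins=2)
print(pi0_by_bin)
print(np.round(q, 4))
```

In this toy example the bin with small z receives a lower π0 estimate than the other bin, so its small p-values translate into smaller q-values than they would under a single pooled estimate. The hard bin boundaries are the main weakness of this stand-in; a smooth estimate of π0 as a function of z, in the spirit of the framework described above, avoids choosing `n_bins` arbitrarily.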