Abstract:
|
Genomic data are subject to various confounding effects due to demographic, environmental, biological, and technical factors. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature as the outcome and then applies a multiplicity correction. It is well known that confounder adjustment reduces statistical power substantially, because under strong confounding it is difficult to discern the true origin of an effect. To overcome this problem, this paper proposes a model-free two-dimensional false discovery rate control procedure (MF-2dFDR) that increases detection power. MF-2dFDR uses marginal independence test statistics as auxiliary information to filter out less promising features, and then performs FDR control on the remaining features using conditional independence test statistics. The key innovation is that MF-2dFDR provides valid inference from samples in settings where the conditional distribution of the genomic variables, given the covariate of interest and the confounders, is arbitrary and completely unknown.
|