Abstract:
|
In large-scale genomic studies, such as genome-wide association studies and high-throughput gene expression analyses, we often need to estimate small p-values accurately, both to adjust for multiple hypothesis testing and to rank genomic features by statistical significance. Previously, we proposed MCMC-CE, an approach that combines the principle of the cross-entropy (CE) method with Markov chain Monte Carlo (MCMC) sampling techniques to estimate small p-values accurately and efficiently for a broad range of complicated test statistics. Here we extend the MCMC-CE approach to the estimation of the distributions and small tail probabilities of quadratic forms in multivariate normal variables, which have wide applications in statistics, including statistical genomics. By combining it with leading-eigenvalue extraction and the Satterthwaite approximation, MCMC-CE can accurately estimate the small tail probabilities of large-scale quadratic forms with rank over 1000. Compared with existing methods, such as Davies', Imhof's, and Farebrother's methods and saddle-point approximations, MCMC-CE can achieve much higher accuracy, even for extremely small p-values.
|