Abstract:
|
The substantial computation required by permutation-based maxT and minP procedures makes them difficult, and often impossible, to apply in practice to moderate or large datasets. This is problematic, given the nature of modern analytics and data mining. We propose a parallelized algorithm that reduces computation time by orders of magnitude. We illustrate it with a genome-wide case-control association study of 600,000 markers in over 2000 subjects, for which permutation-based minP would require several years of computing time using the genetic analysis software package PLINK. Our approach yielded results in less than two hours. In addition, we propose a bootstrapping modification to maxT/minP that improves their statistical performance when analyzing binomial data with very small outcome probabilities.
|