Abstract:
|
The Sequence Kernel Association Test (SKAT) is widely used to test for associations between a phenotype and a set of variants. Computing p-values for SKAT requires the eigenvalues of the genotype covariance matrix, or of a similar matrix of equal size: an n x n matrix, where n is the number of subjects or the number of variants, whichever is smaller. Extracting the full set of eigenvalues has computational cost proportional to n^3, which currently limits the use of SKAT. To overcome this, we propose fastSKAT, a new computationally efficient but accurate approximation, in which only the k largest eigenvalues are extracted and the remainder term is approximated using a Satterthwaite approach. For sample sizes seen in current sequencing studies, these innovations make SKAT tests feasible with at least an order of magnitude more variants than current approaches. We illustrate fastSKAT on several large datasets, describing its computational stability, its accuracy in terms of Type I error rates, and its computational speed. We show that fastSKAT quickly and accurately implements SKAT analyses for large numbers of markers, and illustrate how, used with sequence data, it will help address questions that were previously intractable.
|
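As a rough illustration of the idea summarized in the abstract, the sketch below keeps the k largest eigenvalues of a symmetric positive semi-definite matrix and replaces the remaining ones with a single scaled chi-squared term whose first two moments match (a Satterthwaite-style approximation). It is a minimal sketch under stated assumptions, not the fastSKAT implementation: the function name `topk_plus_satterthwaite` is hypothetical, and the full decomposition from `numpy.linalg.eigvalsh` is used only as a stand-in for a truncated solver that would extract just the leading k eigenvalues.

```python
import numpy as np


def topk_plus_satterthwaite(H, k):
    """Return (top_k_eigenvalues, scale, df) for a symmetric PSD matrix H.

    The eigenvalues beyond the k largest are summarized by scale * chi2(df),
    chosen so that the mean and variance match those of the exact remainder.
    """
    H = np.asarray(H, dtype=float)

    # Stand-in only: a full decomposition, sorted in descending order.
    # A truncated solver would avoid the O(n^3) cost in practice.
    eigs = np.linalg.eigvalsh(H)[::-1]
    top = eigs[:k]

    # Moments of the remaining eigenvalues come from traces alone:
    # tr(H) is the sum of all eigenvalues, and for symmetric H the sum of
    # squared entries equals tr(H^2), the sum of squared eigenvalues.
    rem_sum = np.trace(H) - top.sum()
    rem_sq = np.sum(H * H) - np.sum(top ** 2)

    # No meaningful remainder left (k covers the whole spectrum).
    if rem_sum <= 1e-12 or rem_sq <= 1e-12:
        return top, 0.0, 0.0

    scale = rem_sq / rem_sum       # matches the variance-to-mean ratio
    df = rem_sum ** 2 / rem_sq     # effective degrees of freedom
    return top, scale, df


if __name__ == "__main__":
    H = np.diag([5.0, 3.0, 1.0, 1.0, 1.0, 1.0])
    top, scale, df = topk_plus_satterthwaite(H, k=2)
    print(top, scale, df)
```

The point of the design is that the remainder needs only two trace quantities, so no eigenvalues beyond the first k have to be computed. The null distribution is then approximated by a mixture of k weighted chi-squared variables plus one scaled chi-squared term.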