Keywords: gene-set association, high dimension, multiple kernels, nonlinear effect, quantitative trait
Single-variant-based genome-wide association studies have successfully detected many genetic variants that are associated with a number of complex traits. However, their power is limited due to weak marginal signals and ignoring potential complex interactions among genetic variants. The set-based strategy was proposed to provide a remedy where multiple genetic variants in a given set (e.g., gene or pathway) are jointly evaluated, so that the systematic effect of the set is considered. Among many, the kernel-based testing (KBT) framework is one of the most popular and powerful methods in set-based association studies. Given a set of candidate kernels, the method has been proposed to choose the one with the smallest p-value. Such a method, however, can yield inflated Type 1 error, especially when the number of variants in a set is large. Alternatively one can get p values by permutations which, however, could be very time-consuming. In this study, we proposed an efficient testing procedure that cannot only control Type 1 error rate but also have power close to the one obtained under the optimal kernel in the candidate kernel set, for quantitative trait association studies. Our method, a maximum kernel-based U-statistic method, is built upon the KBT framework and is based on asymptotic results under a high-dimensional setting. Hence it can efficiently deal with the case where the number of variants in a set is much larger than the sample size. Both simulation and real data analysis demonstrate the advantages of the method compared with its counterparts.