Abstract:
|
The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (GxE) effects. Compared with marginal genetic association studies, GxE analysis requires larger sample sizes, which has resulted in fewer discoveries. Population-based biobanks with thousands of phenotypes and hundreds of thousands of samples are a valuable resource for GxE analysis. However, because of the high computational cost and the presence of case-control imbalance, existing methods cannot analyze them effectively. Here we propose a scalable and accurate method applicable to phenome-wide GxE studies (PheWIS). The method fits a genotype-independent logistic model only once for the whole-genome analysis and uses saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that the method is 33-79 times faster than the standard Wald test and is well calibrated at the genome-wide significance level even when case-control ratios are extremely unbalanced. An analysis of UK Biobank data with 344,341 white British samples also shows that the method can efficiently analyze large-sample data while accounting for unbalanced case-control ratios.
|
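
To illustrate why a saddlepoint approximation helps when cases are rare, the following minimal sketch applies the Barndorff-Nielsen form of the approximation to a generic score-type statistic S = sum_i w_i (Y_i - mu_i), where Y_i is Bernoulli(mu_i) and mu_i is the fitted probability from a null logistic model. It is not the method proposed here: the function name `spa_upper_tail`, the weights `w_i`, and the toy data are assumptions made for illustration, and the actual GxE test statistic may be constructed differently.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit
from scipy.stats import binom, norm


def spa_upper_tail(weights, mu, s):
    """Saddlepoint approximation to P(S >= s) for S = sum_i w_i * (Y_i - mu_i).

    Hypothetical helper for illustration only; Y_i ~ Bernoulli(mu_i) independently.
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(mu, dtype=float)
    eta = logit(mu)

    def tilted(t):
        # Success probabilities after exponential tilting by t.
        return expit(eta + w * t)

    def K(t):
        # Cumulant generating function of the centered statistic S.
        return np.sum(np.log1p(mu * np.expm1(w * t)) - t * w * mu)

    def K1(t):
        # First derivative of K.
        return np.sum(w * (tilted(t) - mu))

    def K2(t):
        # Second derivative of K.
        p = tilted(t)
        return np.sum(w**2 * p * (1.0 - p))

    sd = np.sqrt(K2(0.0))
    if abs(s) < 1e-6 * sd:
        # The saddlepoint formula is unstable at the mean; use the normal tail.
        return norm.sf(s / sd)

    # Bracket the saddlepoint t_hat, which solves K1(t_hat) = s.
    direction = np.sign(s)
    bound = direction * 1.0
    for _ in range(60):
        if (K1(bound) - s) * direction > 0:
            break
        bound *= 2.0
    else:
        raise ValueError("s lies outside the support of the statistic")
    t_hat = brentq(lambda t: K1(t) - s, min(0.0, bound), max(0.0, bound))

    r = np.sign(t_hat) * np.sqrt(2.0 * (t_hat * s - K(t_hat)))
    v = t_hat * np.sqrt(K2(t_hat))
    return norm.sf(r + np.log(v / r) / r)


# Toy check with a 1% case rate and unit weights, where the exact answer is a
# binomial tail (the half-unit shift is a continuity correction).
n, p = 1000, 0.01
s_obs = 24.5 - n * p
approx = spa_upper_tail(np.ones(n), np.full(n, p), s_obs)
exact = binom.sf(24, n, p)
normal = norm.sf(s_obs / np.sqrt(n * p * (1 - p)))
print(approx, exact, normal)
```

In this toy setting the null distribution is strongly right-skewed, so the normal approximation understates the upper tail, while the saddlepoint value stays close to the exact binomial probability.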