Abstract:
|
In biobank data, most binary phenotypes have unbalanced case-control ratios, which can inflate type I error rates in association tests. Recently, a single-variant test based on the saddlepoint approximation (SPA) has been developed to test such associations accurately and scalably. For region-based tests, a few methods can adjust for unbalanced case-control ratios; however, these methods are either less accurate or not scalable to large data sets. To address these issues, we develop a robust region-based method in which the single-variant score statistics are calibrated using SPA and Efficient Resampling (ER). In simulation studies and an analysis of UK Biobank whole-exome sequence data, the proposed method provides well-calibrated p-values. Its computation time is similar to that of unadjusted approaches, and it scales to large samples. We further extend the robust method and propose a scalable generalized mixed model region-based test (SAIGE-GENE) that adjusts for sample relatedness. In analyses of 69,716 Norwegian samples from the HUNT study and 408,910 White British samples from the UK Biobank, SAIGE-GENE efficiently analyzes large-sample data with well-controlled type I error rates.
|