Abstract:
As biobanks increasingly link electronic health records and national registries to germline genetic data, time-to-event phenotypes have attracted growing attention in genetic studies of human diseases. However, existing methods and tools do not scale to large biobanks with hundreds of thousands of samples and endpoints, and they are inaccurate when testing low-frequency and rare variants. Here we propose SPACox (a SaddlePoint Approximation implementation based on the Cox proportional hazards (PH) regression model), a scalable and accurate method for genome-wide analysis of time-to-event data. SPACox fits a null Cox PH regression model (without genetic effects) only once for the entire genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76 to 252 times faster than existing alternatives such as gwasurvivr and controls type I error rates. In an analysis of UK Biobank inpatient data from 282,641 white British samples of European ancestry, SPACox efficiently handled the large sample size and identified 624 loci associated with time-to-event phenotypes of 12 common diseases.
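The fit-once, calibrate-per-variant design described above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the SPACox implementation. It assumes no covariates (so the null Cox model reduces to the Nelson-Aalen cumulative hazard and its martingale residuals), forms the score statistic S = sum of G_i R_i, treats each genotype as an independent Binomial(2, p) draw with the residuals held fixed, and keeps a normal approximation when |z| < 2 (a cutoff chosen here for illustration). The function names `martingale_residuals` and `spa_pvalue` are our own; SPACox may build its cumulant generating function differently, for example from the empirical distribution of the residuals.

```python
# Hypothetical sketch of an SPA-calibrated score test; not the SPACox package API.
import bisect
import math


def martingale_residuals(times, events):
    """R_i = delta_i - Lambda0(t_i) from a covariate-free null Cox model,
    whose baseline cumulative hazard is the Nelson-Aalen estimator."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    cum, total = [], 0.0
    for tj in event_times:
        d_j = sum(1 for t, d in zip(times, events) if d and t == tj)  # events at tj
        n_j = sum(1 for t in times if t >= tj)                          # at risk at tj
        total += d_j / n_j
        cum.append(total)
    res = []
    for t, d in zip(times, events):
        k = bisect.bisect_right(event_times, t)  # number of event times <= t
        res.append(d - (cum[k - 1] if k else 0.0))
    return res


def _q(x, p):
    """p*e^x / (1 - p + p*e^x), evaluated without overflow."""
    if x >= 0:
        return p / (p + (1 - p) * math.exp(-x))
    e = math.exp(x)
    return p * e / (1 - p + p * e)


def _log_mgf(x, p):
    """log(1 - p + p*e^x), evaluated without overflow."""
    if x >= 0:
        return x + math.log(p + (1 - p) * math.exp(-x))
    return math.log(1 - p + p * math.exp(x))


def _K(t, res, p):   # CGF of S = sum G_i R_i, assuming G_i ~ Binomial(2, p)
    return sum(2 * _log_mgf(t * r, p) for r in res)


def _K1(t, res, p):  # K'(t), increasing in t
    return sum(2 * r * _q(t * r, p) for r in res)


def _K2(t, res, p):  # K''(t) >= 0
    return sum(2 * r * r * _q(t * r, p) * (1 - _q(t * r, p)) for r in res)


def _tail(s, res, p, upper):
    """Barndorff-Nielsen saddlepoint approximation of P(S >= s) or P(S <= s)."""
    lo, hi = -1.0, 1.0
    while _K1(lo, res, p) > s and lo > -1e12:  # widen bracket around the root
        lo *= 2
    while _K1(hi, res, p) < s and hi < 1e12:
        hi *= 2
    for _ in range(100):                        # bisection for K'(zeta) = s
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if _K1(mid, res, p) < s else (lo, mid)
    zeta = 0.5 * (lo + hi)
    w = math.copysign(math.sqrt(max(2 * (zeta * s - _K(zeta, res, p)), 0.0)), zeta)
    v = zeta * math.sqrt(_K2(zeta, res, p))
    if w == 0 or v * w <= 0:                   # degenerate: s at or beyond support edge
        return 0.0
    u = w + math.log(v / w) / w                # P(S <= s) ~= Phi(u)
    return 0.5 * math.erfc((u if upper else -u) / math.sqrt(2))


def spa_pvalue(genotypes, residuals, cutoff=2.0):
    """Two-sided p-value for S = sum G_i R_i, SPA-calibrated when |z| >= cutoff."""
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)               # allele frequency estimate
    if p <= 0 or p >= 1:
        return 1.0                             # monomorphic variant: nothing to test
    s = sum(g * r for g, r in zip(genotypes, residuals))
    mean = 2 * p * sum(residuals)              # K'(0); ~0 since residuals sum to 0
    var = 2 * p * (1 - p) * sum(r * r for r in residuals)  # K''(0)
    if var <= 0:
        return 1.0
    z = (s - mean) / math.sqrt(var)
    if abs(z) < cutoff:                        # near the mean, normal approx suffices
        return math.erfc(abs(z) / math.sqrt(2))
    s_hi, s_lo = max(s, 2 * mean - s), min(s, 2 * mean - s)
    return min(1.0, _tail(s_hi, residuals, p, True) + _tail(s_lo, residuals, p, False))
```

In a genome-wide scan under these assumptions, `martingale_residuals` would be computed once and only `spa_pvalue` re-evaluated per variant, which mirrors the fit-once, test-many structure credited above for the method's speed.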