Abstract:
|
Difficulties in managing and aggregating individual-level records, together with privacy concerns, make methods based on summary statistics (MBSS) particularly appealing. Applications of MBSS in genetic epidemiology include methods that combine signals across SNPs into gene-, haplotype-block- or pathway-level association tests. Although combining SNP-level statistics is often as powerful as analyzing the pooled data, one needs to account for linkage disequilibrium (LD) among SNPs and for the correlation between SNPs and covariates. Without access to the original data, one must use methods that take LD information from reference panels to account for the correlation among statistics. However, even when the original data and the reference panel stem from the same population, sampling error produces differences in estimated LD, which leads to bias. Here, we derive a robust approximation to the distribution of sums of test statistics that can be obtained from averages of pairwise LD. This approximation enables fast and rigorous gene-based computations, with improved power and type I error control, without relying on LD information from a reference population.
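To show how a sum of correlated SNP statistics can be calibrated from a single LD summary rather than a full reference LD matrix, here is a minimal sketch. It is not the derivation in this work. It uses a generic Satterthwaite-style moment match instead. It assumes that under the null the z-statistics satisfy z ~ N(0, R), where R is the pairwise LD correlation matrix. Under that assumption, Q = Σ z_i² has mean m and variance 2·tr(R²) = 2·[m + m(m−1)·r̄²], where r̄² is the average squared off-diagonal correlation. Matching these two moments to c·χ²_ν gives c = 1 + (m−1)·r̄² and ν = m/c. The function names `mean_offdiag_r2` and `sum_chisq_pvalue` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def mean_offdiag_r2(R):
    """Average squared off-diagonal correlation (pairwise LD r^2) of an m x m matrix."""
    R = np.asarray(R, dtype=float)
    m = R.shape[0]
    off = R[~np.eye(m, dtype=bool)]  # all i != j entries
    return float(np.mean(off ** 2)) if m > 1 else 0.0


def sum_chisq_pvalue(z, mean_r2):
    """Hypothetical gene-level test: Q = sum(z^2) approximated by c * chi2(df).

    Assumes z ~ N(0, R) under the null, so E[Q] = m and
    Var[Q] = 2 * (m + m * (m - 1) * mean_r2). Matching the first two moments
    gives scale c = 1 + (m - 1) * mean_r2 and degrees of freedom df = m / c.
    """
    z = np.asarray(z, dtype=float)
    m = z.size
    q = float(np.sum(z ** 2))
    c = 1.0 + (m - 1) * mean_r2
    df = m / c
    return q, float(chi2.sf(q / c, df))


# Usage: three SNPs with moderate LD.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
q, p = sum_chisq_pvalue([1.2, 2.1, 0.8], mean_offdiag_r2(R))
```

There are two limiting cases. With no LD (r̄² = 0), the test reduces to the usual χ²_m test. With perfect LD (r̄² = 1), it collapses to a single 1-df test, because the m identical statistics carry only one signal.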
|