Abstract:
|
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). However, these methods require individual-level genotypes and phenotypes, which are often not readily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results and to estimates of correlations among covariates (SNPs), both of which can be obtained from public databases. We combine the RSS likelihood with suitable priors and sample from the resulting posteriors by Markov chain Monte Carlo (MCMC). In a wide range of simulations based on real genotypes, RSS performs similarly to analyses that use the individual-level data. We apply RSS to a GWAS of human height comprising 253,288 individuals typed at 1.06 million SNPs. The resulting heritability estimate (52%) is consistent with, but more precise than, previous estimates from subsets of these data. We also identify many previously unreported loci that show evidence for association with height. Software is available at https://github.com/stephenslab/rss.
|
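To make the RSS likelihood concrete, here is a minimal Python sketch that evaluates it, assuming it takes the multivariate normal form beta_hat | beta ~ N(S R S^-1 beta, S R S). Here beta_hat holds the single-SNP (univariate) effect estimates, S = diag(se_hat) holds their standard errors, R is the estimated SNP correlation (LD) matrix, and beta holds the multiple regression coefficients. The function name rss_loglik is hypothetical; this is not the code in the rss repository, and it omits the priors and the MCMC sampler.

```python
import numpy as np


def rss_loglik(beta_hat, se_hat, R, beta):
    """Log-density of beta_hat under N(S R S^-1 beta, S R S), S = diag(se_hat).

    Hypothetical helper for illustration; assumes R is positive definite.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se_hat = np.asarray(se_hat, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.asarray(beta, dtype=float)

    S = np.diag(se_hat)
    S_inv = np.diag(1.0 / se_hat)

    mean = S @ R @ S_inv @ beta  # expected univariate estimates given beta
    cov = S @ R @ S              # their covariance, induced by LD among SNPs

    # Cholesky factorization cov = L L^T (fails if cov is not positive definite)
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, beta_hat - mean)  # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    p = beta_hat.size
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + z @ z)


# Usage: two SNPs in strong LD (correlation 0.8), only the first with an effect
R = np.array([[1.0, 0.8], [0.8, 1.0]])
print(rss_loglik([0.3, 0.25], [0.05, 0.05], R, [0.3, 0.0]))
```

When R is the identity, the likelihood factorizes into independent normals beta_hat_j ~ N(beta_j, se_j^2), one per SNP; the off-diagonal entries of R are what couple neighbouring SNPs together. The Cholesky step assumes R is positive definite, so an LD matrix estimated from a small reference panel may need regularization before use.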