Abstract:
|
Online aggregation of genetic sequencing data, and the publicly available summary data it produces, are invaluable in research and the clinic. These databases have numerous applications, including prioritizing causal variants and leveraging common controls. However, summarizing individual-level genotype data can mask population structure, increasing the potential for confounding and reducing statistical power. This limits the utility of these databases, especially for understudied and ancestrally diverse populations.
We present Summix, a method to deconvolute ancestry and provide ancestry-adjusted allele frequencies from summary data. Using a continental reference panel, we show our method is accurate and precise to within 0.1% for all simulation scenarios. We apply our method to the Genome Aggregation Database (gnomAD) to estimate ancestry and adjust allele frequencies within known heterogeneous ancestry groups, such as African/African-American (~84% AFR, ~14% EUR) and American/Latinx (~4% AFR, ~5% EAS, ~43% EUR, ~46% IAM).
Summix runs in seconds and has the potential to increase the utility and equity of summary genetic data.
|
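The abstract does not state how the ancestry proportions are computed, so the following is only a minimal sketch of one way such a deconvolution could work, not Summix's implementation. It assumes the observed allele frequency at each variant is a mixture of reference-group frequencies, and it estimates the mixing proportions by least squares constrained to the probability simplex, using projected gradient descent. The function names (`project_to_simplex`, `estimate_proportions`, `adjusted_frequency`), the optimizer, and the simulated data are all hypothetical choices made for illustration.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum(p) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, steps=2000):
    """Least-squares mixing proportions (assumed objective, not from the paper).

    observed:  list of N observed allele frequencies.
    reference: list of N rows, each holding K reference-group frequencies.
    """
    n_groups = len(reference[0])
    # Step size 1/L, with L bounded by the squared Frobenius norm of `reference`.
    step = 1.0 / sum(x * x for row in reference for x in row)
    p = [1.0 / n_groups] * n_groups  # start from equal proportions
    for _ in range(steps):
        residual = [sum(r * q for r, q in zip(row, p)) - o
                    for row, o in zip(reference, observed)]
        grad = [sum(row[k] * res for row, res in zip(reference, residual))
                for k in range(n_groups)]
        p = project_to_simplex([q - step * g for q, g in zip(p, grad)])
    return p


def adjusted_frequency(observed_af, reference_row, proportions, target):
    """Solve the mixture equation for the target group's frequency at one variant.

    Uses the reference frequencies of the non-target groups only; the result
    is clipped to [0, 1]. This is one possible adjustment, stated as an assumption.
    """
    others = sum(p * af
                 for k, (p, af) in enumerate(zip(proportions, reference_row))
                 if k != target)
    return min(max((observed_af - others) / proportions[target], 0.0), 1.0)


# Simulated, noise-free example with three hypothetical reference groups.
rng = random.Random(0)
true_p = [0.84, 0.14, 0.02]
reference = [[rng.random() for _ in true_p] for _ in range(100)]
observed = [sum(p * af for p, af in zip(true_p, row)) for row in reference]

estimated = estimate_proportions(observed, reference)
print([round(p, 3) for p in estimated])
```

In this noise-free setting the estimate should recover the simulated proportions closely. Real summary data would add sampling noise and reference-panel mismatch, which this sketch does not model.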