Name: 2019 Joint Statistical Meetings
Start: 2019-07-27T07:00:00+00:00
End: 2019-08-01
Location: Colorado Convention Center

Abstract Details

Activity Number:	126 - SPEED: New Methods in Statistical Genomics and Genetics Part 1
Type:	Contributed
Date/Time:	Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistics in Genomics and Genetics
Abstract #306664	Presentation
Title:	Efficient Estimation of Ancestry Proportions Using Genotype Frequencies
Author(s):	Jordan Hall* and Megan Sorenson and Ryan Scherenberg and Alexandria Ronco and Yinfei Wu and James Vance and Jinyan Lyu and Christopher Gignoux and Audrey E Hendricks
Companies:	University of Colorado Denver and University of Colorado Denver and and University of Colorado Denver and University of Colorado Denver and University of Colorado Denver and University of Colorado Denver and University of Colorado Denver and University of Colorado Denver
Keywords:	SLSQP; Optimization; Genetics; gnomAD; Ancestry; Quadratic programming
Abstract:	Public genetic data enables efficient and more equitable access, transforming genetic and medical research. Due to privacy concerns, data is often provided as group-level genotype frequencies rather than as individual-level records. Grouping can mask important information, such as fine-scale ancestry, and imprecise ancestry information may lead to misdiagnoses and incorrect genetic associations. We present a method to estimate hidden ancestry proportions in genotype frequency data. As the number of ancestries, and therefore the dimension of the problem, grows, estimating these proportions quickly and precisely becomes difficult. We employ Sequential Least Squares Programming (SLSQP), an iterative minimization algorithm for constrained nonlinear problems. Grid search took more than 1 hour to produce estimates for 6 ancestries at 1% precision; SLSQP gives results in seconds at <0.1% precision. We apply our method to open databases, including the African sample (N = 12,487) of the Genome Aggregation Database (gnomAD) v2.1, where we find only ~85% African ancestry, with the remaining ancestry mostly from Europe. Our method and accompanying R and Python packages provide precise ancestry information for growing open genetic resources.
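
The abstract does not spell out the objective function or the interface of the accompanying packages, so the following is only a minimal sketch of how such an estimate could be set up with SciPy's SLSQP solver. It assumes a least-squares objective between the observed group frequencies and a mixture of reference-panel frequencies, with the proportions constrained to be non-negative and to sum to 1. The function name `estimate_ancestry_proportions`, its arguments, and the simulated data are hypothetical and are not taken from the authors' software.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_ancestry_proportions(ref_freqs, observed_freqs):
    """Estimate mixture proportions with SLSQP (illustrative sketch only).

    ref_freqs:      (n_snps, n_ancestries) reference frequencies (assumed input).
    observed_freqs: (n_snps,) frequencies observed in the pooled group.
    """
    ref_freqs = np.asarray(ref_freqs, dtype=float)
    observed_freqs = np.asarray(observed_freqs, dtype=float)
    n_ancestries = ref_freqs.shape[1]

    # Assumed objective: squared distance between the observed frequencies
    # and the mixture of reference frequencies implied by the proportions.
    def objective(pi):
        resid = ref_freqs @ pi - observed_freqs
        return resid @ resid

    def gradient(pi):
        return 2.0 * ref_freqs.T @ (ref_freqs @ pi - observed_freqs)

    # Start from equal proportions; constrain each to [0, 1] and the sum to 1.
    x0 = np.full(n_ancestries, 1.0 / n_ancestries)
    result = minimize(
        objective,
        x0,
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_ancestries,
        constraints=[{"type": "eq", "fun": lambda pi: pi.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 200},
    )
    return result.x


if __name__ == "__main__":
    # Synthetic data for illustration only; these are not gnomAD values.
    rng = np.random.default_rng(0)
    ref = rng.uniform(0.05, 0.95, size=(1000, 3))
    simulated_pi = np.array([0.85, 0.12, 0.03])
    observed = ref @ simulated_pi
    print(estimate_ancestry_proportions(ref, observed))
```

Unlike a grid search, whose cost grows exponentially with the number of ancestries at a fixed step size, this formulation takes a small number of gradient-based iterations, which is consistent with the speed difference the abstract reports.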

Authors who are presenting talks have a * after their name.

American Statistical Association