Saturday, June 1
Practice and Applications
When Biomedical Data Gets Big: Challenges and Solutions in Biomedical Data Science
Sat, Jun 1, 2:45 PM - 3:50 PM
Grand Ballroom E

Analysis of Whole Genome Sequence Analysis in >100k Individuals: Experience in the TOPMed Program (305125)

*Ken Rice, University of Washington

Keywords: Genetics, genomics, high-throughput analysis, mixed models, stochastic SVD, higher-order approximations

Funded by the National Heart, Lung, and Blood Institute, the Trans-Omics for Precision Medicine (TOPMed) program is currently among the largest sources of whole genome sequence data; well over 100,000 individuals have now been sequenced. TOPMed uses this genetic information to assess the association of variants with multiple disease traits measured on the same individuals, and so to uncover previously unknown disease pathways. This is conceptually simple but in practice presents many challenges. How can mixed models be fitted quickly at this scale? How can we obtain sufficiently accurate approximate statistical tests at the extremely low error rates required? How can high-dimensional genomic and other "omic" measures be combined to uncover new signals efficiently? We describe these problems and the solutions being applied in TOPMed.