Keywords: Genetics, genomics, high-throughput analysis, mixed models, stochastic SVD, higher-order approximations
Funded by the National Heart, Lung, and Blood Institute, the Trans-Omics for Precision Medicine (TOPMed) program is currently among the largest sources of whole genome sequence data; well over 100,000 individuals have now been sequenced. TOPMed uses this genetic information to assess the association of variants with multiple disease traits, measured on the same individuals, and so to uncover previously unknown disease pathways. This is conceptually simple, but in practice it presents many challenges. How can mixed models be fitted quickly at this scale? How can we obtain sufficiently accurate approximate statistical tests at the extremely low error rates required? How can high-dimensional genomic and other "omic" measures be combined to uncover new signals efficiently? We describe these problems and the solutions being applied in TOPMed.