Abstract Details

Activity Number: 12 - Novel Statistical Methods for Analyzing Electronic Health Records and Biobank Data
Type: Invited
Date/Time: Sunday, July 29, 2018 : 2:00 PM to 3:50 PM
Sponsor: WNAR
Abstract #326871 Presentation
Title: Scalable Methods for Association Analysis in Biobank Scale Data Sets
Author(s): Dajiang Liu*
Companies: Penn State College of Medicine
Keywords: sufficient statistics; electronic health record; biobank; genetic association

Due to the decreasing cost of high-throughput sequencing and genotyping, large-scale biobank datasets with hundreds of thousands of sequenced samples have become available. When these are linked to electronic medical records, there can also be thousands of traits. Together, biobank-scale datasets may contain up to 10^16 data entries, about 10,000 times bigger than a GWAS dataset with 10,000 samples and 1 million genotyped variants. It may take 2 CPU years to complete the standard association analysis for all traits in a biobank-scale dataset. Biobank-scale datasets quickly outstrip the capacity of existing software packages. There is a compelling need to develop more efficient tools that scale well with ultra-large datasets from modern genetic studies. To address this need, we develop a novel statistical method that makes use of sufficient statistics to maximize dimension reduction and eliminate redundant computation while retaining all necessary information for association analysis. The method can be hundreds of times faster than the fastest available tools, such as PLINK2. We expect that the new tool will play an important role in next-generation sequencing and EHR-based genetic studies.
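The abstract does not spell out the method, so the following is only a minimal sketch of the general idea of sufficient statistics, assuming a simple per-trait linear regression of each trait on one variant's genotype. It is not the authors' implementation. The function names (`sufficient_stats`, `slopes_from_stats`, `slope_direct`) and the toy data are hypothetical. The sketch accumulates sums and cross-products in a single pass over samples, then derives the regression slope for every trait from those few numbers without revisiting the individual-level data.

```python
from typing import List, Tuple

Stats = Tuple[int, float, float, List[float], List[float]]


def sufficient_stats(g: List[float], Y: List[List[float]]) -> Stats:
    """One pass over samples: accumulate sums and cross-products.

    g is the genotype vector for one variant (one value per sample).
    Y has one row per sample and one column per trait.
    """
    n = len(g)
    n_traits = len(Y[0])
    sum_g = 0.0
    sum_gg = 0.0
    sum_y = [0.0] * n_traits
    sum_gy = [0.0] * n_traits
    for gi, yi in zip(g, Y):
        sum_g += gi
        sum_gg += gi * gi
        for k, v in enumerate(yi):
            sum_y[k] += v
            sum_gy[k] += gi * v
    return n, sum_g, sum_gg, sum_y, sum_gy


def slopes_from_stats(stats: Stats) -> List[float]:
    """Least-squares slope for every trait, using only the summaries."""
    n, sum_g, sum_gg, sum_y, sum_gy = stats
    s_gg = sum_gg - sum_g * sum_g / n  # centered sum of squares of g
    return [(sgy - sum_g * sy / n) / s_gg for sgy, sy in zip(sum_gy, sum_y)]


def slope_direct(g: List[float], y: List[float]) -> float:
    """Reference: slope computed from individual-level data for one trait."""
    n = len(g)
    mean_g = sum(g) / n
    mean_y = sum(y) / n
    num = sum((a - mean_g) * (b - mean_y) for a, b in zip(g, y))
    den = sum((a - mean_g) ** 2 for a in g)
    return num / den


# Toy data (hypothetical): 6 samples, 2 traits.
genotypes = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
traits = [
    [1.0, 1.0],
    [3.0, 0.5],
    [5.0, 0.0],
    [3.0, 0.5],
    [1.0, 1.0],
    [5.0, 0.0],
]

stats = sufficient_stats(genotypes, traits)
slopes = slopes_from_stats(stats)
```

In this sketch the genotype summaries (`sum_g`, `sum_gg`) are computed once and shared by all traits, which is one way redundant computation can be avoided when the number of traits is large.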

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program