Abstract Details

Activity Number: 288 - Genomical Is the New Astronomical: Big Data Algorithms and Applications in Genomics
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #328549
Title: Big Data Distributed System for Phenome and Genome Management and Analysis in a Large Health System
Author(s): Wendy Wong* and Xinyue Liu and Prachi Kothiyal and Wei Zhu and Fang Zhou and Shan Gao and Sakthi Madhappan and Lin Smith and Henry Hunter and Aaron Black and John F Deeken and John E Niederhuber
Companies: Inova Translational Medicine Institute (all authors)
Keywords: genomics; big data; Spark; Hadoop; database; analysis
Abstract:

The continuous influx of high-throughput sequencing (HTS) data quickly overwhelms bioinformatics analysis paradigms built on traditional compute clusters and relational databases. Innovative "big data" solutions built on the open-source Apache Hadoop and Spark cluster technologies have been employed to address this challenge, and ADAM and Hail are two cutting-edge projects in big data genomics. To leverage these powerful new tools for practical applications that support Inova Health System's translational genomic research, we are building an integrated system composed of a Hadoop data warehouse (DW) with Cloudera Impala as the backend, an ETL (extraction, transformation, loading) workflow using ADAM and Spark, an analysis platform middle tier powered by Spark and Hail, and a web front end for ad hoc queries and interactive data analysis. Example use cases are presented to demonstrate how our integrated big data genomic system handles petabyte-scale data.
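The ETL tier turns raw variant files into tables that a SQL engine such as Impala, or Spark, can query ad hoc. As a rough, non-authoritative illustration of that idea only (this is not the authors' pipeline and does not use the ADAM or Hail APIs), the pure-Python sketch below assumes variant calls arrive as VCF text, flattens them into one row per (variant, sample) genotype, and then runs an aggregation analogous to a SQL GROUP BY that computes the ALT allele frequency per site. The names `GenotypeRow`, `parse_vcf`, and `alt_allele_frequency` are hypothetical, and the simplifications (biallelic sites, diploid calls, GT as the first FORMAT field) are assumptions made for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class GenotypeRow:
    """Hypothetical flattened record: one sample's genotype at one variant site."""
    chrom: str
    pos: int
    ref: str
    alt: str
    sample: str
    alt_count: int  # number of ALT alleles carried (0, 1, or 2 for a diploid call)


def parse_vcf(lines: Iterable[str]) -> Iterator[GenotypeRow]:
    """Transform step (sketch): flatten VCF text into one row per (variant, sample).

    Assumptions: biallelic sites, GT is the first FORMAT field, and only
    fully called diploid genotypes are kept (missing or haploid calls are skipped).
    """
    samples: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed VCF columns
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # GT is assumed to be the first FORMAT key
            alleles = gt.replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # skip missing or non-diploid calls
            alt_count = sum(1 for a in alleles if a != "0")
            yield GenotypeRow(chrom, int(pos), ref, alt, sample, alt_count)


def alt_allele_frequency(
    rows: Iterable[GenotypeRow],
) -> Dict[Tuple[str, int, str, str], float]:
    """Ad hoc query (sketch), analogous to:
    SELECT chrom, pos, ref, alt, SUM(alt_count) / (2 * COUNT(*))
    FROM genotypes GROUP BY chrom, pos, ref, alt
    """
    alt_totals: Dict[Tuple[str, int, str, str], int] = defaultdict(int)
    called: Dict[Tuple[str, int, str, str], int] = defaultdict(int)
    for r in rows:
        key = (r.chrom, r.pos, r.ref, r.alt)
        alt_totals[key] += r.alt_count
        called[key] += 1  # each kept call contributes 2 alleles (diploid)
    return {k: alt_totals[k] / (2 * called[k]) for k in called}


# Toy input built with explicit tab joins so the column separators are unambiguous.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          "S1", "S2", "S3"]
vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(header),
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT",
               "0/1", "1/1", "0/0"]),
    "\t".join(["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT:DP",
               "0|0:12", "./.:0", "0/1:30"]),
]

rows = list(parse_vcf(vcf))
freqs = alt_allele_frequency(rows)
print(freqs)  # {('chr1', 100, 'A', 'G'): 0.5, ('chr1', 200, 'C', 'T'): 0.25}
```

Storing genotypes in this long, one-row-per-call layout is what lets a generic SQL engine answer questions like "allele frequency per site" without genomics-specific code; at production scale the same shape would be written to columnar files and partitioned, which this in-memory sketch does not attempt.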


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program