
Abstract Details

Activity Number: 33
Type: Contributed
Date/Time: Sunday, July 31, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #319612
Title: Big Data Regression and Prediction for High-Throughput Genomic Data
Author(s): Weiqiang Zhou*, Ben Sherwood, Zhicheng Ji, Fang Du, Jiawei Bai, and Hongkai Ji
Companies: Johns Hopkins Bloomberg School of Public Health (all authors)
Keywords: gene regulation ; gene expression ; DNase I hypersensitivity ; big data prediction
Abstract:

The explosive growth of high-throughput genomic data brings both challenges and opportunities to statistics. Data at this scale make it possible to predict one high-throughput genomic data type from another. This can be formulated as a challenging big data regression problem: fitting millions of high-dimensional regression models simultaneously. Here, we introduce BIRD, a big data regression model designed to handle this high dimensionality and heavy computation. BIRD uses the correlation structure within and between data types to make fast and accurate predictions. We applied BIRD to predict DNase I hypersensitivity (DH) from gene expression and found that gene expression predicts DH to a large extent. We show that (1) the predicted DH predicts transcription factor binding sites (TFBSs), (2) BIRD can be applied to gene expression samples in the Gene Expression Omnibus (GEO) to predict the regulome for various biological contexts, and (3) the predicted DH can be used as pseudo-replicates to improve the analysis of high-throughput regulome profiling data.
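
To make the "many regressions at once" formulation concrete, here is a minimal Python sketch. It is not BIRD's implementation, since the abstract does not describe the algorithm. It only illustrates one generic way to use correlation among predictors: group correlated genes with a plain k-means, average expression within each group, and fit every response (here standing in for DH loci) in a single ridge-regularized matrix solve. All function names, the cluster count, the ridge penalty, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def cluster_predictors(X, n_clusters, n_iter=20, seed=0):
    """Assumed helper: k-means on standardized predictor profiles.
    X is samples x predictors; returns one cluster label per predictor."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    profiles = Z.T                                   # predictors x samples
    start = rng.choice(len(profiles), n_clusters, replace=False)
    centers = profiles[start]                        # fancy indexing copies
    for _ in range(n_iter):
        d = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = profiles[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels

def reduce_predictors(X, labels, n_clusters):
    """Replace each cluster of predictors by its mean (zeros if empty)."""
    cols = [X[:, labels == k].mean(axis=1) if np.any(labels == k)
            else np.zeros(X.shape[0]) for k in range(n_clusters)]
    return np.column_stack(cols)

def fit_all_responses(Z, Y, ridge=1.0):
    """Fit one ridge regression per column of Y with a single solve.
    Returns a (1 + n_clusters) x n_responses coefficient matrix."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])       # add intercept column
    penalty = ridge * np.eye(Z1.shape[1])
    penalty[0, 0] = 0.0                              # do not shrink intercept
    return np.linalg.solve(Z1.T @ Z1 + penalty, Z1.T @ Y)

def predict(Z, B):
    return np.column_stack([np.ones(len(Z)), Z]) @ B

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0)) + 1e-12
    return (A * B).sum(axis=0) / denom

# Synthetic stand-in data (not real genomic data): 5 latent factors drive
# 200 "genes" (predictors) and 500 "loci" (responses) across 60 samples.
rng = np.random.default_rng(1)
n, p, q, m, n_clusters = 60, 200, 500, 5, 10
F = rng.normal(size=(n, m))
module = np.arange(p) % m                            # each gene follows one factor
X = F[:, module] + 0.3 * rng.normal(size=(n, p))
Y = F @ rng.normal(size=(m, q)) + 0.5 * rng.normal(size=(n, q))

train, test = slice(0, 40), slice(40, 60)
labels = cluster_predictors(X[train], n_clusters)
B = fit_all_responses(reduce_predictors(X[train], labels, n_clusters), Y[train])
pred = predict(reduce_predictors(X[test], labels, n_clusters), B)
mean_corr = columnwise_corr(pred, Y[test]).mean()
print(f"mean held-out correlation across responses: {mean_corr:.2f}")
```

Because all responses share the same reduced design matrix, the cost of the solve is dominated by one small linear system, regardless of how many response columns are fitted. That shared structure is what makes fitting very many regressions feasible in this sketch.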


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

 
 
Copyright © American Statistical Association