Abstract Details

Activity Number: 397 - Statistical Learning for Epigenomics Data
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 2:00 PM to 3:50 PM
Sponsor: SSC
Abstract #329123
Title: Inference of Transcription Factor Binding Sites in New Cell Types from Open Chromatin and Gene Expression Data
Author(s): Michael M. Hoffman* and Mehran Karimzadeh
Companies: Princess Margaret Cancer Centre/University of Toronto and University of Toronto
Keywords: epigenomics; machine learning; transcription factor binding sites; ChIP-seq; neural networks; gene expression

Transcription factors (TFs) bind DNA and control gene expression. Identifying TF binding sites is the first step in finding mutations that disrupt gene regulation and promote disease. ChIP-seq is the most common method for identifying these sites, but its use on patient samples is limited by the amount of available material. Existing computational methods primarily predict binding in genomic regions that match known TF sequence preferences. However, most binding sites do not resemble known TF sequence motifs, and many TFs are not sequence-specific.

We developed Virtual ChIP-seq, which predicts binding of individual TFs in new cell types using a neural network that integrates ChIP-seq results from other cell types with chromatin accessibility data from the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and TF binding at specific genomic regions. We train Virtual ChIP-seq on a concatenated matrix of genomic regions and predictive features from the training cell types and evaluate its performance on each validation cell type. Virtual ChIP-seq outperforms position weight matrix methods, predicting binding with a Matthews correlation coefficient (MCC) > 0.3 for 31 TFs.
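
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a Matthews correlation coefficient can be computed from binary binding labels. It is not the authors' evaluation code: the function name `matthews_corrcoef` and the toy labels are illustrative assumptions, and returning 0.0 for an undefined score is a common convention rather than something the abstract specifies.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for two equal-length sequences of 0/1 labels.

    Hypothetical helper for illustration; 1 means "bound", 0 means "not bound".
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # bound region predicted as bound
        elif pred:
            fp += 1  # unbound region predicted as bound
        elif truth:
            fn += 1  # bound region predicted as unbound
        else:
            tn += 1  # unbound region predicted as unbound

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # MCC is undefined when any marginal count is zero;
        # 0.0 is used here by convention (an assumption of this sketch).
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: four genomic regions, one bound site missed by the predictor.
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]
score = matthews_corrcoef(labels, predictions)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction. Because it uses all four cells of the confusion matrix, it stays informative when bound regions are far rarer than unbound ones, as is typical for TF binding data.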

Authors who are presenting talks have a * after their name.
