All Times EDT

Abstract Details

Activity Number: 5 - Novel Methods for Microbiome Data Analysis
Type: Invited
Date/Time: Sunday, August 8, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #314468
Title: Deep Learning to Predict the Biosynthetic Gene Clusters in Bacterial Genomes
Author(s): Hongzhe Li*
Companies: University of Pennsylvania
Keywords: Microbiome; Metagenomics; Recurrent neural networks; Data augmentation; Protein family; Embedding
Abstract:

Biosynthetic gene clusters (BGCs) in bacterial genomes code for important small molecules and secondary metabolites. Based on validated BGCs, the similarity of protein family domains (Pfam), and Pfam functions, we develop a deep learning method, BIGclass, for detecting BGCs and their classes. We show that BIGclass leads to reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGCs compared to existing methods. We apply BIGclass to 5,666 RefSeq bacterial genomes and predict a total of 170,685 BGCs from these genomes. Each genome has, on average, 30.1 predicted BGCs, with counts ranging from 0 to 243. We summarize all the predicted BGCs, their functional classes, and the distributions of the BGCs across bacterial phyla. Applications of the BGCs in disease studies will be presented.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program