Abstract Details

Activity Number: 580
Type: Invited
Date/Time: Wednesday, August 3, 2016 : 2:00 PM to 3:50 PM
Sponsor: Biometrics Section
Abstract #318222
Title: COCACOLA: Binning Metagenomic Contigs Using Sequence COmposition, Read CoverAge, CO-Alignment, and Paired-End Read LinkAge
Author(s): Fengzhu Sun* and Yang Lu and Ting Chen and Jed Fuhrman
Companies: University of Southern California and University of Southern California and University of Southern California and University of Southern California
Keywords: next generation sequencing ; contig binning ; k-tuple ; non-negative matrix factorization ; regularization

The advent of next-generation sequencing (NGS) technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, known as contigs, rather than entire genomes, it is crucial to group the contigs into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. This OTU clustering is also referred to as binning. We present COCACOLA, a general framework that automatically bins contigs into OTUs based on sequence composition and read coverage across multiple samples, using non-negative matrix factorization with regularization. It can also incorporate additional information for contig binning, such as co-alignment to reference genomes and linkage of contigs provided by paired-end reads. The effectiveness of COCACOLA is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at
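
To make the idea of binning by composition and coverage with regularized non-negative matrix factorization more concrete, here is a minimal sketch in Python. It is not the COCACOLA implementation. The function names (`kmer_profile`, `build_features`, `nmf_bin`), the choice of k, the pseudocounts, and the L1 penalty on the contig-to-bin weights are all assumptions made for illustration; the abstract does not specify the form of the regularization, and the co-alignment and paired-end linkage terms are left out.

```python
import itertools
import numpy as np


def kmer_profile(seq, k=2):
    """Normalized k-tuple frequencies of one contig over the ACGT alphabet."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.ones(len(kmers))  # pseudocount of 1 avoids all-zero profiles
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-tuples with other symbols are skipped
        if j is not None:
            counts[j] += 1
    return counts / counts.sum()


def build_features(contigs, coverage, k=2):
    """Stack composition and per-sample coverage into one non-negative matrix."""
    comp = np.array([kmer_profile(s, k) for s in contigs])
    cov = np.asarray(coverage, dtype=float) + 1e-3  # small pseudocount
    cov = cov / cov.sum(axis=1, keepdims=True)      # coverage profile per contig
    return np.hstack([comp, cov])


def nmf_bin(X, n_bins, alpha=0.1, n_iter=200, seed=0):
    """Factor X ~ W @ H with multiplicative updates and an L1 penalty on W.

    W (contigs x bins) holds soft bin memberships; H (bins x features) holds
    one composition/coverage centroid per bin. The L1 penalty is an
    illustrative stand-in for the regularization mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, n_bins)) + 0.1
    H = rng.random((n_bins, d)) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + alpha + eps)
    labels = W.argmax(axis=1)  # hard assignment: bin with the largest weight
    return labels, W, H


if __name__ == "__main__":
    # Toy input: three made-up contigs and their coverage in two samples.
    contigs = ["ACGTACGTACGTAC", "GGGGCCCCGGGGCC", "ACGTTGCAACGTAC"]
    coverage = [[10, 1], [1, 12], [9, 2]]
    X = build_features(contigs, coverage, k=2)
    labels, W, H = nmf_bin(X, n_bins=2)
    print(labels)
```

The number of bins is passed in by hand here, and each contig is assigned to the bin with the largest weight in its row of `W`.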

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association