The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Abstract Details
Activity Number: 224
Type: Topic Contributed
Date/Time: Monday, July 30, 2012 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract - #306612
Title: Efficient Separation of Metagenomic Short Reads Into Genomes via Multistage Clustering
Author(s): Olga Tanaseichuk*+ and Tao Jiang
Companies: University of California at Riverside and University of California at Riverside
Address: 362 Engineering II Building, Riverside, CA, 92521, United States
Keywords: metagenomics; NGS short reads; genome separation; clustering
Abstract:
Metagenomic sequencing results in high-complexity datasets where, in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed NGS technologies significantly improve sequencing efficiency and reduce cost. On the other hand, they produce shorter reads, which makes the separation of reads from different species harder. In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that most l-mers belong to a single genome when l is sufficiently large. The first phase of the algorithm produces clusters of l-mers, each of which belongs to one genome. During the second phase, clusters are merged based on l-mer repeat information. These final clusters are used to assign reads. Our tests on a large number of simulated metagenomic datasets concerning species at various phylogenetic distances demonstrate that genomes can be separated if the number of common repeats is smaller than the number of genome-specific repeats. For such genomes, our method can separate NGS reads with high precision and sensitivity.
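The observation about l-mers can be illustrated with a small Python sketch. This is not the authors' algorithm: it assumes the genomes are already known (the method in the abstract does not), uses made-up toy sequences, and the function names lmers, genome_specific_lmers, unique_lmer_fraction, and assign_read are hypothetical. It only shows that the share of l-mers found in a single genome grows with l, and how a read could be assigned by majority vote over its l-mers.

```python
from collections import Counter, defaultdict


def lmers(seq, l):
    """Return every substring of length l in seq, in order."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def genome_specific_lmers(genomes, l):
    """Map each l-mer that occurs in exactly one genome to that genome's name.

    Assumption: known genome labels stand in for the l-mer clusters that the
    abstract's first phase would have to discover without labels.
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for m in lmers(seq, l):
            owners[m].add(name)
    return {m: next(iter(names)) for m, names in owners.items() if len(names) == 1}


def unique_lmer_fraction(genomes, l):
    """Fraction of distinct l-mers that occur in exactly one genome."""
    distinct = {m for seq in genomes.values() for m in lmers(seq, l)}
    if not distinct:
        return 0.0
    return len(genome_specific_lmers(genomes, l)) / len(distinct)


def assign_read(read, lmer_to_cluster, l):
    """Assign a read to the cluster that holds most of its l-mers.

    Returns None when no l-mer of the read appears in any cluster.
    """
    votes = Counter(
        lmer_to_cluster[m] for m in lmers(read, l) if m in lmer_to_cluster
    )
    return votes.most_common(1)[0][0] if votes else None


# Toy sequences, invented purely for illustration.
genomes = {"A": "ACGTACGGTCA", "B": "TTGACGTCCAG"}

short_fraction = unique_lmer_fraction(genomes, 2)  # many 2-mers are shared
long_fraction = unique_lmer_fraction(genomes, 6)   # no 6-mer is shared here

clusters = genome_specific_lmers(genomes, 4)
read_a = assign_read("GTACGG", clusters, 4)
read_b = assign_read("GACGTCC", clusters, 4)
read_unknown = assign_read("AAAA", clusters, 4)
```

In this toy data the 4-mer ACGT occurs in both sequences, so it is left out of the cluster map and does not vote. A real pipeline would also have to cope with sequencing errors, reverse complements, and paired-end information, none of which this sketch handles.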
The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.