Abstract Details

Activity Number: 460 - Clustering Methods for Big Data Problems
Type: Topic Contributed
Date/Time: Wednesday, August 2, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #324772
Title: Clustering Errored Sequence Reads to Estimate Unique Amplicons and Abundance
Author(s): Karin Dorman* and Xiyu Peng
Companies: Iowa State University and Iowa State University
Keywords: next generation sequencing ; bioinformatics ; clustering
Abstract:

Sequencing homologous genomic regions from samples of highly variable organisms, or from mixtures of organisms, is becoming routine biological practice. The methodology is used to monitor the health or status of living systems ranging from infected individuals to fermentation vats. Although the throughput of modern sequencing technology has made such data possible, the technology also introduces many errors, making it difficult to distinguish true biological variation from technical artifacts. In many applications, identifying minor biological variation is critical: minor variants often signal important biological shifts, such as mounting resistance, imminent community collapse, or disease onset, which are the very conditions that warrant monitoring in the first place. We describe methods to cluster next generation sequencing (NGS) reads while accounting for the error properties of the NGS machine. The approach relies on clustering methods for big data and leads to better separation of true variation from errors than existing methods. We demonstrate the approach on mock bacterial communities and samples taken from HIV-infected patients.
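
The abstract does not specify the clustering algorithm, so the following is only an illustrative sketch of the general idea of clustering reads while accounting for sequencer error, not the authors' method. It assumes equal-length reads, independent substitution errors, per-base Phred quality scores, and a greedy abundance-ordered assignment. The function names (`phred_to_error`, `log_lik`, `cluster_reads`) and the threshold `min_log_lik` are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred score to an error probability, capped at 0.75
    (the value at which the called base carries no information)."""
    return min(10 ** (-q / 10), 0.75)


def log_lik(read, quals, center):
    """Log-probability of observing `read` if `center` is the true sequence,
    assuming independent substitution errors (an assumption of this sketch)."""
    total = 0.0
    for base, q, true_base in zip(read, quals, center):
        e = phred_to_error(q)
        # A match has probability 1 - e; an error lands on one of 3 other bases.
        total += math.log(1 - e) if base == true_base else math.log(e / 3)
    return total


def cluster_reads(reads, min_log_lik=-8.0):
    """Greedy clustering of (sequence, qualities) pairs.

    Unique sequences are visited from most to least abundant. A sequence is
    merged into the existing center that best explains it as sequencing error
    if the mean log-likelihood is at least `min_log_lik` (a hypothetical
    tuning parameter); otherwise it founds a new cluster.
    Returns a dict mapping each center sequence to its estimated abundance.
    """
    groups = defaultdict(list)
    for seq, quals in reads:
        groups[seq].append(quals)

    # Most abundant first; ties broken alphabetically for determinism.
    ordered = sorted(groups, key=lambda s: (-len(groups[s]), s))

    centers = {}
    for seq in ordered:
        qual_lists = groups[seq]
        best, best_ll = None, -math.inf
        for center in centers:
            ll = sum(log_lik(seq, q, center) for q in qual_lists) / len(qual_lists)
            if ll > best_ll:
                best, best_ll = center, ll
        if best is not None and best_ll >= min_log_lik:
            centers[best] += len(qual_lists)   # explained as error
        else:
            centers[seq] = len(qual_lists)     # treated as a true variant
    return centers


# Toy data: one abundant amplicon, one read differing at a low-quality base,
# and a second variant supported by three high-quality reads.
example_reads = (
    [("ACGT", [40, 40, 40, 40])] * 5
    + [("ACGA", [40, 40, 40, 2])]
    + [("TCGT", [40, 40, 40, 40])] * 3
)
example_clusters = cluster_reads(example_reads)
```

In this toy input, the single read with a mismatch at a low-quality position is absorbed into the abundant cluster, while the variant supported by high-quality mismatches is kept as its own amplicon. A practical method would also need to handle insertions and deletions, unequal read lengths, and a statistically justified threshold.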


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association