JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 646
Type: Topic Contributed
Date/Time: Thursday, August 4, 2011 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #303019
Title: Flexible Clustering Models for Next-Generation Metagenomic Sequence Samples
Author(s): Karin S. Dorman*+ and Ranjan Maitra
Companies: Iowa State University and Iowa State University
Address: Department of Statistics, Ames, IA, 50011-1210, USA
Keywords:
Abstract:

Next-generation metagenomic sequence samples consist of many short sequences sampled from multiple organisms. Analyzing them is a classic clustering or classification problem, where the goal is to assign each sequence to one of a finite number of organisms. Clustering is relevant when the sample may include unexpected or unknown species; classification assumes all possible organisms are known. Typically, sequence data are modeled, explicitly or effectively, as Markovian. As data have become abundant, the models have grown in complexity. One such flexible and popular model is the interpolated Markov model (IMM), borrowed from the field of speech recognition. We formulate IMMs as a mixture over the complexity (generally the order) of the Markovian dependence structure. Using mixtures over dependence structure avoids the heuristics of IMM parameter estimation and naturally allows model selection using any standard criterion, or formal model comparison using the bootstrap. Metagenomic samples are then modeled as two-dimensional mixtures over context length and species. Implementation for large samples and complex models is challenging, but results for simulations and small datasets are promising.
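
To make the idea of a mixture over Markov order concrete, the following is a minimal sketch and not the authors' implementation. It assumes a DNA alphabet, add-one smoothing, order weights that are fixed by hand rather than estimated, and a per-position mixture; the function names (count_contexts, cond_prob, mixture_loglik, species_posterior) are hypothetical. Each species gets its own set of context counts, each read is scored under a weighted mixture of order-0 to order-K Markov models, and the scores are turned into posterior probabilities over species.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: reads contain only these four symbols


def count_contexts(seqs, max_order):
    """Return counts[k][context][symbol] for Markov orders k = 0..max_order."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]
    for s in seqs:
        for t, sym in enumerate(s):
            # Only orders whose full context fits before position t are counted.
            for k in range(min(t, max_order) + 1):
                counts[k][s[t - k:t]][sym] += 1
    return counts


def cond_prob(counts, k, context, sym, pseudo=1.0):
    """Smoothed P(sym | context) under the order-k model (add-one by default)."""
    row = counts[k].get(context, {})
    total = sum(row.values()) + pseudo * len(ALPHABET)
    return (row.get(sym, 0) + pseudo) / total


def mixture_loglik(seq, counts, weights):
    """Log-likelihood of seq under a per-position mixture over Markov orders.

    weights[k] is the (hypothetical, fixed) mixing weight of the order-k model.
    Near the start of the read, where long contexts are unavailable, the
    weights of the usable orders are renormalized.
    """
    max_order = len(weights) - 1
    ll = 0.0
    for t, sym in enumerate(seq):
        usable = min(t, max_order)
        norm = sum(weights[:usable + 1])
        p = sum(
            weights[k] * cond_prob(counts, k, seq[t - k:t], sym)
            for k in range(usable + 1)
        ) / norm
        ll += math.log(p)
    return ll


def species_posterior(read, models, weights, priors):
    """Posterior probability of each species for one read.

    models[j] holds the context counts of species j; priors[j] is its
    mixing proportion. Computed in log space for numerical stability.
    """
    logs = [
        math.log(pi) + mixture_loglik(read, counts, weights)
        for counts, pi in zip(models, priors)
    ]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

For example, one could build two models with count_contexts(training_reads, 2), one per species, and call species_posterior(read, [model_a, model_b], [0.2, 0.3, 0.5], [0.5, 0.5]) to score a new read. In a full treatment the order weights and species proportions would be estimated from the data (for instance by EM) rather than fixed by hand as they are here.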


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

