Abstract #300847


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #300847
Activity Number: 283
Type: Contributed
Date/Time: Wednesday, August 14, 2002 : 8:30 AM to 10:20 AM
Sponsor: Section on Bayesian Stat. Sciences*
Title: Discovery of Conserved Sequence Patterns Using a Stochastic Dictionary Model
Author(s): Mayetri Gupta*+ and Jun Liu
Affiliation(s): Harvard University and Harvard University
Address: Science Center 701b, 1 Oxford St., Cambridge, Massachusetts, 02138, U.S.A.
Keywords: gene regulation; Bayesian inference; data augmentation; MCMC; gene sequence data
Abstract:

Detection of unknown patterns from a randomly generated sequence of observations is a problem arising in fields ranging from signal processing to computational biology. An example that we focus on is the detection of short recurring patterns in DNA sequences, called motifs, that represent potential protein binding sites during gene regulatory processes. What makes this problem difficult is that these patterns can vary stochastically. We describe here a novel Bayesian data augmentation strategy for detecting such patterns based on a stochastic "dictionary" model, under which conserved patterns and nucleotides (stochastic words) are assumed to be generated according to probabilistic rules. Our missing-data approach also addresses related problems, such as finding patterns of unknown width and patterns with varying degrees of insertions and deletions. However, the flexibility of this model comes with a high degree of computational complexity, which is tackled by means of recursion methods. Bayesian techniques are proposed for evaluating the statistical significance of the motifs found, and results are illustrated by means of simulation studies and a real data example.
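
The abstract does not spell out the recursion it refers to. As a rough illustration only, the sketch below shows one way a forward recursion can sum over all ways of splitting a sequence into stochastic words, so that the likelihood is computed without enumerating every segmentation. The names `BACKGROUND`, `word_prob`, and `sequence_likelihood`, the probability values, and the representation of a word as a list of per-position probability tables are assumptions made for this example, not details taken from the authors' work.

```python
# Hypothetical background distribution used for a width-1 "word"
# (illustrative values, not taken from the abstract).
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def word_prob(word, segment):
    """Probability that a stochastic word emits `segment`.

    `word` is a list of per-position probability tables over A, C, G, T,
    and `segment` is a string of the same length.
    """
    p = 1.0
    for column, base in zip(word, segment):
        p *= column[base]
    return p


def sequence_likelihood(seq, words, rho):
    """Sum over all segmentations of `seq` into words from the dictionary.

    phi[i] is the total probability of the first i letters:
        phi[i] = sum_k rho[k] * P(seq[i - w_k:i] | word k) * phi[i - w_k]
    where w_k is the width of word k and rho[k] its usage probability.
    """
    phi = [0.0] * (len(seq) + 1)
    phi[0] = 1.0  # the empty prefix has probability one
    for i in range(1, len(seq) + 1):
        total = 0.0
        for word, r in zip(words, rho):
            w = len(word)
            if w <= i:  # the word must fit inside the prefix
                total += r * word_prob(word, seq[i - w:i]) * phi[i - w]
        phi[i] = total
    return phi[len(seq)]


# A toy dictionary: one background letter and one width-2 motif "AC".
single = [BACKGROUND]
motif = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
]
likelihood = sequence_likelihood("AC", [single, motif], [0.5, 0.5])
```

The cost of this sketch grows with the sequence length times the total width of the words in the dictionary, rather than with the number of possible segmentations. For sequences of realistic length, the products would underflow, so a practical version would work with logarithms or rescale `phi` at each step.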


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.



Revised March 2002