Abstract #301873


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #301873
Activity Number: 234
Type: Invited
Date/Time: Tuesday, August 13, 2002 : 2:00 PM to 3:50 PM
Sponsor: IMS
Title: A Dictionary Model for Motif Analysis
Author(s): Kenneth Lange*+
Affiliation(s): University of California, Los Angeles
Address: Los Angeles, California, 90095-1766
Keywords:
Abstract:

Bussemaker et al. (2000, PNAS) proposed the simple idea of modeling noncoding DNA sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. Starting from the same premise, we consider words that can be spelled in a variety of forms, thereby accounting for varying degrees of conservation of the same motif across genome locations. The overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an EM gradient algorithm. We also describe a Markov chain Monte Carlo method based on the Gibbs sampler. Once these parameters are estimated, it is possible to evaluate the probability with which each motif occurs at a given location in the sequence. These conditional probabilities can be used to monitor properties of genome sequences, such as neighboring occurrences of given motifs. Examples include simulated and real datasets.
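The last step described in the abstract, evaluating the probability that a motif occurs at a given location once the parameters are known, can be illustrated with a forward-backward sum over all ways of cutting the sequence into words. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes each word has a fixed length with independent per-position letter probabilities, and that the word frequencies and spelling parameters are already given. The names `spell_prob`, `occurrence_probs`, `exact`, and the toy dictionary are hypothetical. It does not cover the estimation step (EM gradient or Gibbs sampling), and it omits the scaling or log-space arithmetic that long sequences would need to avoid underflow.

```python
from typing import Dict, List

# Assumption: a word is a list of per-position letter distributions,
# which is one simple way to represent a "random spelling".
Word = List[Dict[str, float]]


def spell_prob(word: Word, segment: str) -> float:
    """Probability that `word` is spelled as `segment`, positions independent."""
    p = 1.0
    for dist, letter in zip(word, segment):
        p *= dist.get(letter, 0.0)
    return p


def occurrence_probs(seq: str, words: Dict[str, Word],
                     freqs: Dict[str, float]) -> Dict[str, List[float]]:
    """Posterior probability that each word starts at each position of `seq`."""
    n = len(seq)

    # fwd[i]: summed weight of all segmentations of seq[:i] into words.
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        for name, word in words.items():
            k = len(word)
            if k <= i:
                fwd[i] += fwd[i - k] * freqs[name] * spell_prob(word, seq[i - k:i])

    # bwd[i]: summed weight of all segmentations of seq[i:] into words.
    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        for name, word in words.items():
            k = len(word)
            if i + k <= n:
                bwd[i] += freqs[name] * spell_prob(word, seq[i:i + k]) * bwd[i + k]

    total = fwd[n]
    if total == 0.0:
        raise ValueError("sequence cannot be segmented with this dictionary")

    # Weight of segmentations that place the word at position i, over the total.
    post = {name: [0.0] * n for name in words}
    for i in range(n):
        for name, word in words.items():
            k = len(word)
            if i + k <= n:
                seg = spell_prob(word, seq[i:i + k])
                post[name][i] = fwd[i] * freqs[name] * seg * bwd[i + k] / total
    return post


def exact(s: str) -> Word:
    """Helper: a word that is always spelled exactly as `s`."""
    return [{c: 1.0} for c in s]


# Toy dictionary (hypothetical values): four single letters plus one motif
# whose last position is G with probability 0.8 and A with probability 0.2.
words = {c: exact(c) for c in "ACGT"}
words["motif"] = exact("AC") + [{"G": 0.8, "A": 0.2}]
freqs = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "motif": 0.2}

post = occurrence_probs("ACG", words, freqs)
print(post["motif"][0], post["A"][0])
```

Because the posterior is a ratio of segmentation weights, any normalizing constant for sequence length cancels. At each position the probabilities are small for words that compete with a well-matched motif, which is what makes these quantities usable for tracking neighboring motif occurrences.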


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.



Revised March 2002