Activity Number: 234
Type: Invited
Date/Time: Tuesday, August 13, 2002: 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract - #301873
Title: A Dictionary Model for Motif Analysis
Author(s): Kenneth Lange*+
Affiliation(s): University of California, Los Angeles
Address: Los Angeles, California, 90095-1766
Keywords:
Abstract:
Bussemaker et al. (2000, PNAS) proposed modeling noncoding DNA sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. Starting from the same premise, we consider words that can be spelled in a variety of forms, which accounts for varying degrees of conservation of the same motif across genome locations. The overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an EM gradient algorithm. We also describe a Markov chain Monte Carlo method based on the Gibbs sampler. Once these parameters are estimated, it is possible to evaluate the probability with which each motif occurs at a given location in the sequence. These conditional probabilities can be used to monitor properties of genome sequences, such as neighboring occurrences of given motifs. Examples include simulated and real datasets.
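The conditional occurrence probabilities mentioned in the abstract can be illustrated with a small forward-backward recursion. The following is only a sketch under stated assumptions, not the authors' implementation: it assumes words are drawn independently and concatenated, that each word position has its own letter distribution (the "random spelling"), and that the parameters are already known rather than estimated by EM or Gibbs sampling. The names `spelling_prob`, `forward`, `backward`, `occurrence_probs`, and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: a word is (usage probability, one letter
# distribution per position). Deterministic words put mass 1 on one letter.
Word = Tuple[float, List[Dict[str, float]]]


def spelling_prob(segment: str, columns: List[Dict[str, float]]) -> float:
    """Probability that a word with these columns is spelled as `segment`."""
    p = 1.0
    for letter, column in zip(segment, columns):
        p *= column.get(letter, 0.0)
    return p


def forward(seq: str, words: Dict[str, Word]) -> List[float]:
    """f[i] = probability of generating seq[:i] as a whole number of words."""
    f = [0.0] * (len(seq) + 1)
    f[0] = 1.0
    for i in range(1, len(seq) + 1):
        for q, columns in words.values():
            k = len(columns)
            if k <= i:
                f[i] += f[i - k] * q * spelling_prob(seq[i - k:i], columns)
    return f


def backward(seq: str, words: Dict[str, Word]) -> List[float]:
    """b[i] = probability of generating seq[i:] as a whole number of words."""
    n = len(seq)
    b = [0.0] * (n + 1)
    b[n] = 1.0
    for i in range(n - 1, -1, -1):
        for q, columns in words.values():
            k = len(columns)
            if i + k <= n:
                b[i] += q * spelling_prob(seq[i:i + k], columns) * b[i + k]
    return b


def occurrence_probs(seq: str, words: Dict[str, Word], name: str) -> List[float]:
    """Conditional probability that word `name` starts at each position."""
    f, b = forward(seq, words), backward(seq, words)
    q, columns = words[name]
    k = len(columns)
    total = f[len(seq)]  # likelihood of the whole sequence
    return [
        f[i] * q * spelling_prob(seq[i:i + k], columns) * b[i + k] / total
        for i in range(len(seq) - k + 1)
    ]


# Toy dictionary (made up for illustration): four single-letter words plus
# one degenerate four-letter motif that is usually spelled TATA.
toy_words: Dict[str, Word] = {
    "A": (0.2, [{"A": 1.0}]),
    "C": (0.2, [{"C": 1.0}]),
    "G": (0.2, [{"G": 1.0}]),
    "T": (0.2, [{"T": 1.0}]),
    "motif": (0.2, [{"T": 0.9, "A": 0.1}, {"A": 0.9, "T": 0.1},
                    {"T": 0.9, "A": 0.1}, {"A": 0.9, "T": 0.1}]),
}
toy_seq = "ACGTATAGC"
motif_probs = occurrence_probs(toy_seq, toy_words, "motif")
```

In this toy setting, `motif_probs[i]` is the conditional probability that the motif starts at position `i` of `toy_seq`. Because every letter belongs to exactly one word in any segmentation, the occurrence probabilities weighted by word length sum to the sequence length, which gives a convenient sanity check.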
- The address information is for the authors that have a + after their name.
- Authors who are presenting talks have a * after their name.
Back to the full JSM 2002 program