JSM 2002

Activity Number:	234
Type:	Invited
Date/Time:	Tuesday, August 13, 2002 : 2:00 PM to 3:50 PM
Sponsor:	IMS
Abstract - #301873
Title:	A Dictionary Model for Motif Analysis
Author(s):	Kenneth Lange*+
Affiliation(s):	University of California, Los Angeles
Address:	Los Angeles, California 90095-1766
Keywords:
Abstract:	Bussemaker et al. (2000, PNAS) proposed the simple idea of modeling noncoding DNA sequence as a concatenation of words and gave an algorithm to reconstruct deterministic words from an observed sequence. Starting from the same premises, we consider words that can be spelled in a variety of forms, thereby accounting for varying degrees of conservation of the same motif across genome locations. The overall frequency of occurrence of each word in the sequence and the parameters describing the random spelling of words are estimated in a maximum-likelihood framework using an EM gradient algorithm. We also describe a Markov chain Monte Carlo method based on the Gibbs sampler. Once these parameters are estimated, it is possible to evaluate the probability that each motif occurs at a given location in the sequence. These conditional probabilities can be used to monitor properties of genome sequences, such as neighboring occurrences of given motifs. The examples include both simulated and real datasets.
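The core computations behind such a dictionary model can be illustrated with a small sketch. It assumes a hypothetical representation in which each word is a list of position-specific letter distributions (its "random spelling"), sums over all segmentations of the sequence with forward and backward recursions, and returns the posterior probability that each word starts at each position. For simplicity it applies a plain EM update to the word frequencies only, holding the spelling parameters fixed; this is not the EM gradient algorithm or the Gibbs sampler described in the abstract. All function names, the example dictionary, and the example sequence are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a word is one dict per position, mapping each
# letter to the probability of spelling that letter at that position.
Spelling = List[Dict[str, float]]


def spell_prob(word: Spelling, seq: str, start: int) -> float:
    """Probability that `word` is spelled as seq[start:start+len(word)]."""
    p = 1.0
    for j, dist in enumerate(word):
        p *= dist.get(seq[start + j], 0.0)
    return p


def forward_backward(seq: str, words: Sequence[Spelling], rho: Sequence[float]):
    """F[i]: total weight of all parses of seq[:i] into whole words.
    B[i]: total weight of all parses of seq[i:] into whole words.
    Unscaled, so only suitable for short sequences (underflow otherwise)."""
    n = len(seq)
    F = [0.0] * (n + 1)
    B = [0.0] * (n + 1)
    F[0] = 1.0
    for i in range(1, n + 1):
        for w, word in enumerate(words):
            L = len(word)
            if L <= i:
                F[i] += F[i - L] * rho[w] * spell_prob(word, seq, i - L)
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        for w, word in enumerate(words):
            L = len(word)
            if i + L <= n:
                B[i] += rho[w] * spell_prob(word, seq, i) * B[i + L]
    return F, B


def posteriors(seq: str, words: Sequence[Spelling], rho: Sequence[float]):
    """post[w][i] = P(an occurrence of word w starts at position i | seq)."""
    F, B = forward_backward(seq, words, rho)
    n = len(seq)
    likelihood = F[n]
    post = []
    for w, word in enumerate(words):
        L = len(word)
        row = [0.0] * n
        for i in range(n - L + 1):
            row[i] = F[i] * rho[w] * spell_prob(word, seq, i) * B[i + L] / likelihood
        post.append(row)
    return post, likelihood


def em_step(seq: str, words: Sequence[Spelling], rho: Sequence[float]):
    """One EM update of the word frequencies (spellings held fixed).
    Returns the new frequencies and the likelihood under the old ones."""
    post, likelihood = posteriors(seq, words, rho)
    counts = [sum(row) for row in post]  # expected number of uses of each word
    total = sum(counts)
    return [c / total for c in counts], likelihood


# Illustrative dictionary: four single-letter words plus a made-up,
# imperfectly conserved "TATA"-like motif.
singles = [[{b: 1.0}] for b in "ACGT"]
tata = [{"T": 0.9, "A": 0.1}, {"A": 0.9, "T": 0.1}, {"T": 0.9, "A": 0.1}, {"A": 1.0}]
words = singles + [tata]
rho = [1.0 / len(words)] * len(words)
seq = "GCTATAGGTATACC"

history = []
for _ in range(20):
    rho, lik = em_step(seq, words, rho)
    history.append(lik)

post, _ = posteriors(seq, words, rho)
print("word frequencies:", [round(r, 3) for r in rho])
print("P(motif starts at position 2):", round(post[4][2], 3))
```

Because every parse covers each letter exactly once, the expected motif and letter counts always account for the full sequence length, and each EM step cannot decrease the likelihood. For genome-scale sequences one would rescale F and B at each position or work in log space.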

The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.
