Activity Number: 219
Type: Contributed
Date/Time: Monday, August 3, 2009 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing

Abstract - #304065

Title: Distribution of Spaced Seed Statistics through Minimal Markov Chain Embedding
Author(s): KeTrena S. Phipps*+ and Donald E.K. Martin
Companies: North Carolina State University and North Carolina State University
Address: Campus Box 8203, Raleigh, NC 27695-8203
Keywords: Auxiliary Markov chain; spaced seed; minimal deterministic finite automaton; success runs; seed coverage

Abstract:
Spaced seeds are used to improve the sensitivity of alignment searches without simultaneously increasing the number of random hits. In this paper, a method is given for computing two distributions in Markovian sequences with a general order of dependence: the distribution of the number of sequence locations covered by seed hits, and the distribution of the number of seed hits. Knowledge of these distributions provides statistical thresholds for distinguishing homologous regions from those occurring by chance. An auxiliary Markov chain is used to obtain recursive equations that simplify the computations. Minimal deterministic finite automata are used to set up the state space of the auxiliary chain, which helps reduce its size. Extending results for seed coverage from independent trials to higher-order Markovian trials allows greater modeling flexibility.
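The idea of embedding the seed-hit count in an auxiliary Markov chain can be illustrated with a small sketch. The code below is not the authors' method: it uses a brute-force state space (every recent match/mismatch history paired with the current hit count) rather than a minimal deterministic finite automaton, and it assumes a binary match/mismatch sequence with first-order Markov dependence. The function name `seed_hit_distribution` and its parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def seed_hit_distribution(seed, n, init, trans):
    """Distribution of the number of seed hits in a binary Markov sequence.

    Assumptions (illustrative, not taken from the paper):
      seed  : string over {'1', '0'}; '1' = position must be a match,
              '0' = don't-care position
      n     : sequence length
      init  : init[b] = P(X_1 = b) for b in {0, 1} (1 = match)
      trans : trans[a][b] = P(X_{t+1} = b | X_t = a), first-order dependence
    Returns a dict mapping hit count -> probability.
    """
    w = len(seed)
    required = [i for i, c in enumerate(seed) if c == "1"]
    keep = max(w - 1, 1)  # history length needed for hit checks and transitions

    # Auxiliary chain state: (recent history, hits so far) -> probability.
    dist = {((), 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (hist, hits), pr in dist.items():
            for b in (0, 1):
                q = trans[hist[-1]][b] if hist else init[b]
                if q == 0.0:
                    continue
                window = hist + (b,)
                tail = window[-w:]
                # A hit ends here if a full window is available and every
                # required seed position is a match.
                hit = len(tail) == w and all(tail[i] == 1 for i in required)
                nxt[(window[-keep:], hits + int(hit))] += pr * q
        dist = nxt

    # Marginalise over histories to get the hit-count distribution.
    result = defaultdict(float)
    for (_, hits), pr in dist.items():
        result[hits] += pr
    return dict(sorted(result.items()))


# Independent fair trials are a special case of the Markov model.
iid = [[0.5, 0.5], [0.5, 0.5]]
print(seed_hit_distribution("11", 3, [0.5, 0.5], iid))
print(seed_hit_distribution("1101", 20, [0.3, 0.7], [[0.5, 0.5], [0.2, 0.8]]))
```

The state space here grows as 2^(w-1) times the number of possible hit counts, which is the cost that a minimal automaton construction, as described in the abstract, is meant to reduce. Tracking covered locations instead of hit counts would need additional state and is not attempted in this sketch.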