Abstract:

When using an auxiliary Markov chain (AMC) to compute sampling distributions, the computational complexity is directly related to the number of Markov chain states. For certain complex pattern statistics, minimal deterministic finite automata (DFA) have been used to facilitate efficient computation by reducing the number of AMC states to a minimal set sufficient for computing the distribution. However, it can be the case that forming a DFA before applying a minimization algorithm is computationally expensive. To deal with these situations, we give a characterization of equivalent states so that extra ones (and their offspring) are deleted during the process of state space formation. The method is illustrated on the problem of computing the distribution of the number of structured motifs, a model that is useful for locating binding sites in DNA sequences. The underlying sequence is modeled as a Markov chain.
