Abstract:
|
Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on individual analysis of each data set, which ignores the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between samples. Recently, Zuo et al. developed the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization based estimation structure hinders its applicability to large numbers of loci and samples. We address this limitation by developing a MAP-based Asymptotic Derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploration of multiple initialization schemes and flexible tuning. This speed comes at the cost of only a small loss in estimation accuracy.
|