Abstract:
|
A series of recently developed chromatin conformation capture-based assays (3C, 4C, 5C and Hi-C) has enabled the study of three-dimensional chromosomal structures and elucidated long-range genomic interactions among loci. However, current Hi-C analysis pipelines discard reads that align to multiple locations (multi-reads) and, hence, underestimate intra-chromosomal and inter-chromosomal interactions involving repetitive regions. We study this problem in depth and train a generative model to probabilistically allocate multi-read contacts. This highly versatile model can be extended to a Bayesian hierarchical model that uses auxiliary protein-DNA interaction data as a prior to improve allocation accuracy. Our results suggest that effective utilization of multi-reads increases sequencing depth by 21% on average and, hence, identifies up to 10% more significant long-range interactions even under a conservative setting. Most of the new contacts are detected only when multi-reads are employed, and they originate from heterochromatin regions. Further analysis of these new interactions highlights the importance of long-range interactions originating from repetitive regions.
|