Abstract:
|
The three-dimensional (3D) structure of the genome plays a crucial role in regulating gene expression. Chromatin conformation capture (Hi-C) technologies have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), the fundamental building blocks of the genome. Identifying such hierarchical structures is a critical step in understanding regulation within the 3D genome. We frame the problem of TAD detection in a spectral clustering framework. Our SpectralTAD R package, in contrast to other tools, selects its parameters automatically, is robust to the sequencing depth, resolution, and sparsity of Hi-C data, and detects hierarchical, biologically relevant TAD structures. Using simulated and real-life Hi-C data, we show that SpectralTAD outperforms rGMAP and TopDom, two state-of-the-art R-based TAD callers. TAD boundaries shared among multiple levels of the hierarchy were more enriched in relevant genomic annotations, e.g., CTCF binding sites, suggesting their higher biological importance. In summary, we present a simple, fast, and user-friendly R package for robust detection of TAD hierarchies supported by statistical and biological evidence.
|