Abstract:
|
Single-cell Hi-C sequencing (scHi-C) makes it possible to study chromatin organization dynamics and cell-to-cell heterogeneity. However, interpreting scHi-C data poses both intrinsic challenges, since Hi-C data are essentially two-dimensional pairwise measurements rather than one-dimensional measurements, and practical challenges, such as sparse contact maps, batch effects, and sequencing noise. In scHiCTools, we implemented a faster version of HiCRep, another Hi-C similarity measure named Selfish, and a new inner product approach that provides a more efficient way of embedding scHi-C data. We demonstrated that the new inner product approach runs faster than the original HiCRep and produces comparably accurate projections. To deal with sparsity, we implemented three smoothing approaches: linear convolution, random walk, and network enhancing. Of the three, linear convolution appeared to be the most effective for smoothing sparse datasets. Our open-source toolbox, scHiCTools (https://github.com/liu-bioinfo-lab/scHiCTools), is the first toolbox of its kind and can be useful for analyzing scHi-C data.
|
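To make two of the ideas above concrete, the following is a minimal NumPy sketch of linear convolution smoothing followed by an inner-product-based distance between cells. It is not the scHiCTools implementation or its API: the function names, the `half_width` and `n_strata` parameters, the zero padding at the matrix border, the per-stratum z-scoring, and the conversion from similarity to distance are all assumptions chosen for illustration.

```python
import numpy as np


def smooth_linear_convolution(contact_map, half_width=1):
    """Mean-filter a square contact map with a (2h+1) x (2h+1) window.

    Assumption: bins outside the matrix are treated as zero contacts.
    """
    h = half_width
    size = 2 * h + 1
    n = contact_map.shape[0]
    padded = np.pad(contact_map.astype(float), h, mode="constant")
    smoothed = np.zeros((n, n))
    # Sum every shifted copy of the map, then divide by the window area.
    for dx in range(size):
        for dy in range(size):
            smoothed += padded[dx:dx + n, dy:dy + n]
    return smoothed / size**2


def strata_features(contact_map, n_strata=3):
    """Concatenate z-scored diagonals (strata) into one unit-norm vector."""
    parts = []
    for k in range(n_strata):
        stratum = np.diagonal(contact_map, offset=k).astype(float)
        std = stratum.std()
        if std > 0:
            parts.append((stratum - stratum.mean()) / std)
        else:
            # A constant stratum carries no signal; encode it as zeros.
            parts.append(np.zeros_like(stratum))
    vector = np.concatenate(parts)
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector


def inner_product_distances(contact_maps, n_strata=3):
    """Pairwise distances between cells from inner products of features."""
    features = np.stack([strata_features(m, n_strata) for m in contact_maps])
    similarity = features @ features.T  # one matrix product for all pairs
    # For unit vectors, squared Euclidean distance equals 2 - 2 * similarity.
    return np.sqrt(np.clip(2.0 - 2.0 * similarity, 0.0, None))


# Hypothetical usage on three simulated sparse, symmetric contact maps.
rng = np.random.default_rng(0)
cells = []
for _ in range(3):
    upper = np.triu(rng.poisson(0.3, size=(20, 20)))
    cells.append(upper + upper.T)
smoothed_cells = [smooth_linear_convolution(c, half_width=1) for c in cells]
distances = inner_product_distances(smoothed_cells, n_strata=5)
```

The resulting distance matrix could then be passed to any standard embedding method, such as multidimensional scaling. Computing all pairwise similarities with a single matrix product is what makes an inner product formulation cheap compared with evaluating a similarity measure separately for each pair of cells.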