Abstract:
|
The three-dimensional arrangement of chromatin has recently gained considerable attention in genome research as well as in computer science and statistics. The 3D structure and looping interactions of chromatin influence an array of important cellular processes, such as cell replication, differentiation, and gene expression. While the 3D conformation cannot be observed completely, Hi-C technology has recently made it possible to measure the looping interactions of DNA indirectly. The goal of these studies is to accurately identify densely interacting regions of the chromosome, known as topologically associating domains (TADs). This structure of the data naturally inspires the application of community detection methods to Hi-C data. A drawback, however, is that most community detection methods take exchangeability of the nodes in the network for granted, whereas in Hi-C data the nodes, i.e. the positions on the chromatin, are not exchangeable. We propose a network model for Hi-C data and derive a computationally efficient linear program for inference under this model. We also prove that, when suitably initialized, this method finds the underlying TAD structure with high probability.
|