Abstract:
|
Copy number variation (CNV) analysis requires accurate and efficient methods to detect and classify CNVs. Many statistical algorithms have been developed under the strong assumptions that observations at genetic loci are independent and that each locus has an equal chance of being a breakpoint. However, these assumptions are violated because genomic positions are correlated, for example through linkage disequilibrium (LD). Moreover, our study shows that the LD structure is related to the location distribution of CNVs, which follows a non-random pattern along the genome. Here, we propose a novel algorithm, LDcnv, which integrates the genomic correlation structure with a local search strategy into the statistical modelling of CNV intensities. We theoretically demonstrate the correlation structure of CNV data from SNP arrays, which further supports the necessity of integrating the genomic correlation structure. To evaluate the performance of LDcnv, we conducted extensive simulations and analyzed large-scale HapMap datasets. LDcnv showed high accuracy and robustness in CNV detection, and higher precision in detecting short CNVs than existing methods.
|