Abstract:
|
With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. WES was initially intended to characterize single nucleotide variants, but the number of sequencing reads mapped to a genomic region has been observed to correlate with DNA copy number, which makes it possible to detect copy number variants (CNVs) from WES data. We propose a method, RefCNV, that uses a reference set to estimate the distribution of coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution; in particular, we observed that processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for calling CNVs were selected to control the false positive rate. We present examples from 13 cancer cell lines with known CNVs in the genes MET (7q31), EGFR (7p12) or ERBB2 (17q12). CNV calls from RefCNV correlated significantly with copy numbers measured by digital droplet PCR in these 13 cell lines.
|
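The general idea of reference-based calling described in the abstract can be illustrated with a minimal sketch. This is not the RefCNV implementation: the function names, the median normalization, the pseudocount, the standard-deviation floor `min_sd`, the normal approximation for the per-exon distribution, and the default `alpha` are all assumptions made for illustration only.

```python
import math
from statistics import NormalDist, mean, median, stdev


def log2_normalized(counts):
    """Per-exon log2 coverage relative to the sample's median exon count."""
    scale = median(counts)
    # Pseudocount of 0.5 (an assumption) avoids log(0) for exons with no reads.
    return [math.log2((c + 0.5) / scale) for c in counts]


def build_reference(normal_samples, min_sd=0.05):
    """Estimate a (mean, sd) pair for each exon from a set of reference samples.

    min_sd is a hypothetical floor that prevents near-zero variance
    estimates from inflating the z-scores.
    """
    logs = [log2_normalized(sample) for sample in normal_samples]
    per_exon = zip(*logs)  # regroup the values exon by exon
    return [(mean(v), max(stdev(v), min_sd)) for v in per_exon]


def call_cnv(test_counts, reference, alpha=0.001):
    """Label each exon 'gain', 'loss', or 'neutral'.

    The cutoff is the two-sided normal quantile for a per-exon false
    positive rate of alpha, assuming normally distributed log2 coverage.
    """
    z_cut = NormalDist().inv_cdf(1 - alpha / 2)
    calls = []
    for x, (mu, sd) in zip(log2_normalized(test_counts), reference):
        z = (x - mu) / sd
        if z > z_cut:
            calls.append("gain")
        elif z < -z_cut:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy data (made up): five reference samples, five exons each.
normals = [
    [100, 200, 300, 400, 500],
    [110, 190, 310, 390, 510],
    [95, 205, 295, 405, 495],
    [105, 195, 305, 395, 505],
    [98, 210, 290, 402, 490],
]
reference = build_reference(normals)
print(call_cnv([100, 200, 300, 400, 2500], reference))
```

In this toy input only the fifth exon, whose count is five times the reference level, is labeled a gain. Median normalization is used here so that a single amplified exon does not shift the scaling of the others; a real pipeline would also need to account for the processing-related variability the abstract mentions.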