Abstract:
|
Detecting somatic mutations at very low allele fractions is a challenging problem in cancer sequencing, mainly because read depth is limited and sequencing noise occurs at a level comparable to the variant signal. Targeted enrichment allows researchers to sequence specific genes of interest with ultra-deep read coverage. Molecular barcoding technology enables analytical approaches that reduce the impact of enrichment and sequencing errors, so that very low allele fraction variants can be reliably detected. We propose a molecular barcode-aware variant caller designed to accurately detect somatic SNPs and short insertions/deletions (indels) at allele fractions below 1%. The core algorithm is based on a Bayesian probabilistic model that estimates the posterior probability of each nucleotide at a specific locus. These probabilities are then used to decide whether to call a candidate mutation. In addition, the variant caller applies several filters to further reduce the false-positive rate. The variant caller was trained on over 7,000 low allele fraction variants from NA12878, mostly at ~1%, mixed in a background of NA23485, and showed very good sensitivity and specificity on this set. It will be validated on an independent variant set.
|
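
To illustrate the kind of calculation the abstract refers to, the following is a minimal sketch of a Bayesian posterior over the four nucleotides at one locus within a single barcode family. It is not the caller's actual model: the function name `nucleotide_posterior`, the uniform prior, the independence of reads, and the assumption that an erroneous read shows each of the three wrong bases with equal probability are all illustrative assumptions.

```python
import math

BASES = "ACGT"

def nucleotide_posterior(reads, prior=None):
    """Posterior probability of each nucleotide at one locus.

    reads: list of (observed_base, error_prob) pairs from one barcode
           family, with 0 < error_prob < 1 (hypothetical input format).
    prior: dict base -> prior probability; uniform if omitted (assumption).
    """
    if prior is None:
        prior = {b: 0.25 for b in BASES}

    # Work in log space so deep families do not underflow.
    log_post = {}
    for b in BASES:
        ll = math.log(prior[b])
        for obs, err in reads:
            # Assumed error model: a read is correct with probability
            # 1 - err, otherwise it shows one of the 3 other bases uniformly.
            ll += math.log(1.0 - err) if obs == b else math.log(err / 3.0)
        log_post[b] = ll

    # Normalize with the max-subtraction trick for numerical stability.
    m = max(log_post.values())
    unnorm = {b: math.exp(v - m) for b, v in log_post.items()}
    total = sum(unnorm.values())
    return {b: u / total for b, u in unnorm.items()}

# Hypothetical barcode family: three reads support T, one read shows C.
family = [("T", 0.01), ("T", 0.01), ("T", 0.02), ("C", 0.01)]
post = nucleotide_posterior(family)
best = max(post, key=post.get)
```

In this toy family the lone C read is outweighed by the three T reads, so the posterior concentrates on T. A caller in the spirit of the abstract would combine such per-molecule evidence across barcodes before deciding whether to call a candidate mutation.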