Abstract:
|
Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make the simplifying assumption of positional independence and fail to leverage the spatial dependence of genotypes at nearby loci caused by linkage disequilibrium. We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in aligned short-read data. This method allows transitions between hidden states (defined as SNP, insertion, deletion, and match) on adjacent genomic bases and determines an optimal hidden state path using the Viterbi algorithm. The inferred hidden state path directly identifies SNPs and INDELs. Simulation experiments show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to human whole-genome sequencing (WGS) data, vi-HMM achieves results comparable to the gold-standard GATK callers and performs better than FreeBayes and Platypus.
|
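To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding over the four hidden states named in the abstract. It is not the vi-HMM implementation: the observation alphabet (`agree`, `mismatch`, `extra`, `gap`, a hypothetical per-position summary of the read pileup) and every probability below are illustrative assumptions chosen only so that the example runs.

```python
import math

STATES = ["match", "snp", "insertion", "deletion"]

# Toy parameters (assumptions for illustration, not values from vi-HMM).
START = {"match": 0.97, "snp": 0.01, "insertion": 0.01, "deletion": 0.01}
TRANS = {
    "match":     {"match": 0.97, "snp": 0.01, "insertion": 0.01, "deletion": 0.01},
    "snp":       {"match": 0.97, "snp": 0.01, "insertion": 0.01, "deletion": 0.01},
    "insertion": {"match": 0.50, "snp": 0.01, "insertion": 0.48, "deletion": 0.01},
    "deletion":  {"match": 0.50, "snp": 0.01, "insertion": 0.01, "deletion": 0.48},
}
EMIT = {
    "match":     {"agree": 0.997, "mismatch": 0.001, "extra": 0.001, "gap": 0.001},
    "snp":       {"agree": 0.01, "mismatch": 0.97, "extra": 0.01, "gap": 0.01},
    "insertion": {"agree": 0.01, "mismatch": 0.01, "extra": 0.97, "gap": 0.01},
    "deletion":  {"agree": 0.01, "mismatch": 0.01, "extra": 0.01, "gap": 0.97},
}


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


def viterbi(observations, states, start, trans, emit):
    """Return the most probable hidden state path for the observations."""
    log_start = {s: math.log(p) for s, p in start.items()}
    log_trans, log_emit = to_log(trans), to_log(emit)

    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


observations = ["agree", "mismatch", "agree", "gap", "gap", "agree"]
print(viterbi(observations, STATES, START, TRANS, EMIT))
```

With these toy parameters the decoded path is match, snp, match, deletion, deletion, match: the two adjacent `gap` positions are decoded together as one deletion because the deletion-to-deletion transition is far more likely than entering a deletion twice. This is the kind of dependence between adjacent positions that a per-site independent caller cannot use.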