Pathoscope 2.0: Statistical and Computational Methods for Accurate Characterization of Microbes in Sequencing Samples
*Solaiappan Manimaran, Boston University
Keywords: Clinical Metagenomics, Pathogen detection, Next-gen sequencing analysis software, Bayesian modeling
Rapid identification and quantification of the pathogens present in a clinical sample is critical for controlling contagious diseases during an outbreak. For example, during the European E. coli outbreak of 2011, there was a three-week delay in the correct identification of the pathogen strain O104:H4, which caused 3,800 infections and 54 deaths. Here, we present Pathoscope 2.0, a complete software package for rapidly identifying and quantifying the microbial strains present in environmental or clinical sequencing samples. Pathoscope uses a Bayesian statistical methodology based on a penalized mixture modeling approach to accurately identify and quantify the pathogens. We also report a confidence region for the identified pathogens so that an accurate diagnosis and the best possible treatment can be provided. We simulated sequencing reads from 25 strains of bacteria commonly found in humans. Our method accurately identified and quantified the pathogen strains both in pure samples containing a single strain and in mixture samples containing multiple strains of bacteria. It performed well even with low read coverage and in samples with multiple closely related strains.
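To illustrate the general idea behind mixture modeling of sequencing reads, the following is a minimal sketch of a plain expectation-maximization (EM) fit for a finite mixture over candidate genomes. It is not the Pathoscope implementation: the function name `estimate_proportions`, the `scores` input layout, and the stopping rule are assumptions made for illustration, and the penalty term and confidence regions mentioned in the abstract are not reproduced here.

```python
def estimate_proportions(scores, n_iter=200, tol=1e-9):
    """Estimate genome proportions with plain EM (illustrative sketch only).

    scores[i][j] is a hypothetical non-negative alignment likelihood of
    read i against genome j, with 0.0 meaning "read i does not align to j".
    """
    n_genomes = len(scores[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform proportions

    for _ in range(n_iter):
        # E-step: share each read among genomes in proportion to pi[j] * score.
        counts = [0.0] * n_genomes
        for row in scores:
            weights = [pi[j] * row[j] for j in range(n_genomes)]
            total = sum(weights)
            if total == 0.0:
                continue  # read aligns nowhere with current support; skip it
            for j in range(n_genomes):
                counts[j] += weights[j] / total

        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read aligns to any genome")

        # M-step: new proportions are the normalized expected read counts.
        new_pi = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    return pi


# Three reads align only to genome 0; one read aligns equally well to both.
example_scores = [
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
]
proportions = estimate_proportions(example_scores)
```

In this toy input, the ambiguous read is progressively reassigned to genome 0 because the uniquely aligned reads support it, which is the intuition behind resolving reads shared among closely related strains.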