Abstract:
One of the key challenges in metagenomic forensics is to establish a microbial fingerprint that can help predict the geographical origin of metagenomic samples. A comprehensive analysis of metagenomic data depends on understanding several interacting aspects, including microbiome sample sources, sequencing technologies, bioinformatics processing, and statistical and machine learning methods. We analyze metagenomic samples from 23 cities around the world and construct classification models that can predict the geographical origin of unknown samples obtained from these locations. We also describe the bioinformatics pre-processing of the raw sequencing data and estimate the microbial abundance profiles of the samples using multiple tools. To predict the geographical origin of samples, we trained and evaluated a variety of statistical classifiers, including an adaptive optimal ensemble classifier that performs well regardless of which bioinformatics pipeline was used for data pre-processing.
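The abstract does not describe the internals of the adaptive optimal ensemble classifier. Purely as an illustration of the general idea, the sketch below shows a simplified bagging-based ensemble in which each bootstrap sample keeps only the candidate classifier that scores best on its out-of-bag (OOB) samples, and predictions are made by majority vote. This is an assumption-laden stand-in, not the authors' method: the published adaptive ensemble approach, as commonly described, aggregates rankings over several performance measures, whereas this sketch uses OOB accuracy alone. The three candidate classifiers, all function names (`fit_adaptive_ensemble`, `fit_nearest_centroid`, etc.), and the toy abundance vectors are hypothetical.

```python
import random
from collections import Counter
from math import dist

# Hypothetical, simplified sketch: OOB accuracy replaces rank aggregation
# over multiple performance measures.

def fit_nearest_centroid(X, y):
    """Assign a sample to the class whose mean abundance profile is closest."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return lambda x: min(centroids, key=lambda label: dist(x, centroids[label]))

def fit_1nn(X, y):
    """Predict the label of the single closest training sample."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: dist(x, p[0]))[1]

def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

# Candidate pool; on ties, max() keeps the first candidate listed here.
CANDIDATES = {"centroid": fit_nearest_centroid, "1nn": fit_1nn, "majority": fit_majority}

def accuracy(model, X, y):
    return sum(model(xi) == yi for xi, yi in zip(X, y)) / len(y)

def fit_adaptive_ensemble(X, y, n_bags=25, seed=0):
    """Bagging where each bootstrap keeps only its best candidate on the OOB samples."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(idx))
        if not oob:
            continue  # no held-out samples to score the candidates on
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        Xo, yo = [X[i] for i in oob], [y[i] for i in oob]
        fitted = {name: fit(Xb, yb) for name, fit in CANDIDATES.items()}
        best = max(fitted, key=lambda name: accuracy(fitted[name], Xo, yo))
        members.append((best, fitted[best]))
    if not members:
        raise ValueError("no bootstrap sample had out-of-bag data")

    def predict(x):
        # Majority vote over the locally selected classifiers.
        votes = Counter(model(x) for _, model in members)
        return votes.most_common(1)[0][0]

    return predict, [name for name, _ in members]

# Made-up relative abundances of three taxa for two hypothetical cities.
X = [[0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
     [0.10, 0.30, 0.60], [0.15, 0.25, 0.60], [0.12, 0.28, 0.60]]
y = ["A", "A", "A", "B", "B", "B"]
predict, chosen = fit_adaptive_ensemble(X, y)
print(predict([0.68, 0.22, 0.10]), Counter(chosen))
```

The design choice being illustrated is that no single classifier is fixed in advance: each bootstrap round selects whichever candidate performs best on data it did not see. This is the property that could make such an ensemble robust to differences between pre-processing pipelines.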