Abstract:
|
The growth of metagenomic data generation has driven the development of novel statistical methods for differential taxonomic abundance analysis, but interpreting the resulting findings remains challenging. We present a new, manually curated database of >1,600 published microbial signatures from >400 differential abundance studies of the human microbiome, spanning geography, health outcomes, and host body sites, along with controlled metadata on study design and statistical methods. This database enables the development and benchmarking of methods akin to Gene Set Enrichment Analysis (GSEA) for human microbiome studies, and it exposes statistical challenges arising from a hierarchical taxonomy, small set sizes relative to most gene sets, body site-specific backgrounds, and high microbial diversity among both healthy and unhealthy subjects. This talk provides an overview of the database and its application to benchmarking methods of Microbe Set Enrichment Analysis (MSEA), and outlines the statistical challenges that remain open.
|