Abstract:
Introduction: Studies of the abundance and diversity of microbial species in the gut microbiome have found novel associations with disease. Some species occur very rarely in cases or controls, and their rarity may reflect a marker of disease, limited ingestion, or signal noise. These zero counts pose a challenge that must be addressed correctly.

Methods: Using metagenomic shotgun sequencing of fecal DNA from 97 patients with non-alcoholic fatty liver disease in the NIH NASH Clinical Research Network, we compare methods that allow for zero-inflation, including mixture models, negative binomial models, and filtering approaches that exclude low-abundance species and their aggregates. We investigate the effect of each method on differential abundance and on diversity indices.

Results: For highly zero-inflated data, mixture models were in general the best starting point for highlighting species and aggregates for further examination. Apparent differences in abundance that rest on low overall counts may indicate signal noise. High abundance in only a few subjects may reflect ingestion alone. Eliminating "doubtful species" can improve diversity estimates.

Summary: An interactive, multistage process is needed to avoid errors caused by zero-inflated counts.
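To make the zero-inflation and filtering ideas concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes a zero-inflated negative binomial (ZINB), one common mixture model, in which a count is a structural zero with probability `pi` and otherwise follows a negative binomial with mean `mu` and dispersion `size`. It also shows a prevalence filter of the kind described under Methods and uses the Shannon index as one example of a diversity index. The function names, the toy species table, and the 0.5 prevalence threshold are hypothetical choices made only for illustration.

```python
import math
from typing import Dict, List


def nb_pmf(k: int, mu: float, size: float) -> float:
    """Negative binomial P(Y = k) with mean mu (> 0) and dispersion size (> 0)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zinb_pmf(k: int, pi: float, mu: float, size: float) -> float:
    """Zero-inflated NB: a mixture of a point mass at zero (weight pi)
    and an NB(mu, size) count distribution (weight 1 - pi)."""
    nb = nb_pmf(k, mu, size)
    return pi + (1 - pi) * nb if k == 0 else (1 - pi) * nb


def filter_low_prevalence(table: Dict[str, List[int]],
                          min_prevalence: float) -> Dict[str, List[int]]:
    """Keep only species detected (count > 0) in at least
    min_prevalence of subjects; the threshold is a hypothetical choice."""
    kept = {}
    for species, counts in table.items():
        prevalence = sum(c > 0 for c in counts) / len(counts)
        if prevalence >= min_prevalence:
            kept[species] = counts
    return kept


def shannon(counts: List[int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical toy table: species -> counts in four subjects.
    table = {
        "sp_A": [120, 95, 140, 110],
        "sp_B": [30, 0, 45, 20],
        "sp_C": [0, 0, 0, 300],  # abundant in one subject only
    }
    filtered = filter_low_prevalence(table, min_prevalence=0.5)
    print("kept species:", sorted(filtered))

    subject = 3  # the subject in which sp_C appears
    before = shannon([counts[subject] for counts in table.values()])
    after = shannon([counts[subject] for counts in filtered.values()])
    print("Shannon before/after filtering:", before, after)

    # Under zero-inflation, a zero is more probable than under a plain NB.
    print("P(0) ZINB vs NB:", zinb_pmf(0, 0.3, 5.0, 2.0), nb_pmf(0, 5.0, 2.0))
```

In the toy table, `sp_C` is detected in only one of four subjects, where it is abundant. This is the kind of pattern the abstract associates with ingestion. The prevalence filter drops it, which changes that subject's diversity estimate. The ZINB comparison shows how a mixture model assigns extra probability to zeros rather than treating every zero as a low draw from a single count distribution.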