Abstract:
|
Associating bacteria with disease has predominantly been performed using targeted sequencing. While targeted sequencing is cheaper, whole metagenome shotgun sequencing (WMS) can generate orthogonal information by providing bacterial gene content. WMS studies can be used to discover infectious bacterial pathogens or pathogenic genes in various diseases, including Type II diabetes. Key differences between targeted and WMS data include how features are defined. However, just as with targeted sequencing, appropriate statistical methods are needed to account for the unique characteristics of metagenomic datasets, such as sparsity. We present an overview of several feature definitions and their impact on sparsity and the count distribution, including how sequencing insensitivity and undersampling result in fewer detected genes, thereby potentially biasing differential abundance estimates. In addition, we analyze the effect sparsity has on normalization scaling methods as well as on common differential abundance statistics. We find that zero-inflated Gaussian mixture models developed for targeted sequencing are also appropriate for WMS data.
|
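The link between undersampling and sparsity described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the authors' analysis: the gene count, the 1/(i+1) abundance profile, the read depths, and the function names (`simulate_reads`, `detected_genes`, `sparsity`) are all assumptions made for illustration.

```python
import random


def simulate_reads(n_genes, n_reads, seed=0):
    """Draw n_reads gene labels from an assumed, skewed abundance profile."""
    rng = random.Random(seed)
    # Hypothetical profile: gene i has weight proportional to 1 / (i + 1),
    # so a few genes are abundant and most are rare.
    weights = [1.0 / (i + 1) for i in range(n_genes)]
    return rng.choices(range(n_genes), weights=weights, k=n_reads)


def detected_genes(reads, depth):
    """Number of distinct genes observed in the first `depth` reads."""
    return len(set(reads[:depth]))


def sparsity(reads, depth, n_genes):
    """Fraction of genes with a zero count at the given sequencing depth."""
    return 1.0 - detected_genes(reads, depth) / n_genes


N_GENES = 2000  # assumed size of the gene catalogue
reads = simulate_reads(N_GENES, 20000)

# Reads are independent draws, so a prefix of the list acts as a
# shallower sequencing run of the same sample.
for depth in (500, 5000, 20000):
    print(depth, detected_genes(reads, depth),
          round(sparsity(reads, depth, N_GENES), 3))
```

Because each shallower run is a subset of the deeper one, the number of detected genes can only grow with depth. Rare genes are the ones that drop to zero counts first, which is the kind of depth-dependent sparsity that normalization and differential abundance methods need to account for.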
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.