Abstract:
|
Thanks to next-generation sequencing technologies, researchers can obtain millions of DNA sequences from a single experiment. A major area of interest is the analysis of the human microbiome, the collective genome of the microorganisms living in the human body. The human body contains about 10^13 human cells and 10^14 bacterial cells, so the microbiome is often perceived as an extended human genome. Different parts of the body have distinct bacterial compositions, which are closely related to the presence of diseases such as inflammatory bowel disease, peripheral vascular disease, asthma, and hypertension. Statistical procedures for comparing and identifying microbial diversity can therefore be essential to the early diagnosis or cure of many diseases. We develop a two-sample test statistic for testing whether the overall microbial compositions of two samples differ. We derive its theoretical properties and its asymptotic F-distribution. Existing methods do not establish exact distributions and rely only on permutation algorithms (PERMANOVA). Through simulations and a real-data example, we demonstrate the superiority of our test in terms of type I error and computational efficiency.
|