Abstract:
|
We propose a two-part Tweedie regression model for testing associations between features (e.g., gene expression, microbial abundances, and other omics data) and clinical and demographic covariates. The model combines a logistic regression component, which models the presence or absence of a feature in each sample, with a Tweedie regression component, which models the non-zero abundance or expression values. The Tweedie sub-model relies on the flexible Tweedie distribution, which can capture a wide range of statistical properties observed in multi-platform omics data, such as heavy tails, sparsity, and over-dispersion. We illustrate the method with comprehensive simulation benchmarks and real-data analyses. The proposed method is available as part of the open-source R package Tweedieverse (https://github.com/himelmallick/Tweedieverse).
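
As a reading aid, the two-part structure described above can be sketched in the following form. This is a minimal sketch and not necessarily the exact parameterization used in Tweedieverse. The notation is assumed for illustration: $Y_{ij}$ is the value of feature $j$ in sample $i$, $x_i$ is the covariate vector, $\alpha_j$ and $\beta_j$ are coefficient vectors, $\phi_j$ is the dispersion, and $p_j$ is the Tweedie power parameter. The log link is also an assumption.

```latex
% Sketch of a two-part model; all symbols and the log link are illustrative assumptions.
\begin{align}
  % Part 1: presence/absence via logistic regression
  \Pr(Y_{ij} > 0 \mid x_i) &= \pi_{ij},
    & \operatorname{logit}(\pi_{ij}) &= x_i^\top \alpha_j, \\
  % Part 2: mean of the non-zero values via a Tweedie regression
  \mathbb{E}[Y_{ij} \mid Y_{ij} > 0,\, x_i] &= \mu_{ij},
    & \log \mu_{ij} &= x_i^\top \beta_j, \\
  % Tweedie power mean-variance relationship
  \operatorname{Var}(Y_{ij} \mid Y_{ij} > 0,\, x_i) &= \phi_j \, \mu_{ij}^{p_j}.
\end{align}
```

Under this sketch the marginal mean is $\mathbb{E}[Y_{ij} \mid x_i] = \pi_{ij}\,\mu_{ij}$, and an association between feature $j$ and a covariate would correspond to a non-zero entry in $\alpha_j$, in $\beta_j$, or in both.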
|