Abstract:
|
Identifying features that are differentially abundant between experimental conditions is a common goal of many metabolomics and proteomics studies. However, analyzing metabolomics and proteomics data from mass spectrometry is challenging because the data may not be normally distributed and may contain a large fraction of zero values. Although several statistical methods have been proposed, they either require a normality assumption or are inefficient. We propose a new semi-parametric differential abundance analysis method for metabolomics and proteomics data from mass spectrometry. The method uses a two-part model: a logistic regression for the proportion of zeros and a semi-parametric log-linear model for the non-zero values. Our method makes no distributional assumptions and allows adjustment for covariates. We propose a kernel-smoothed likelihood method to estimate the regression coefficients in the two-part model and construct a likelihood ratio test for differential abundance analysis. Simulations and real data analyses demonstrate that our method outperforms existing methods.
|
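As an illustration of the two-part structure described in the abstract, the model could be written roughly as follows. This is only a sketch under assumed notation: the symbols Y_i, X_i, gamma_0, gamma, beta, epsilon_i and f do not appear in the abstract and are introduced here for exposition; the paper's own formulation may differ.

```latex
% Assumed notation (not from the abstract): Y_i >= 0 is the abundance of one
% feature in sample i; X_i is a covariate vector that includes the condition indicator.
\begin{align}
  % Part 1: logistic regression for the probability of a zero value
  \operatorname{logit}\,\Pr(Y_i = 0 \mid X_i) &= \gamma_0 + \gamma^{\top} X_i, \\
  % Part 2: log-linear model for the non-zero values
  \log Y_i &= \beta^{\top} X_i + \varepsilon_i \quad \text{for } Y_i > 0,
\end{align}
% epsilon_i has an unspecified density f, which makes the second part
% semi-parametric; in this sketch any intercept is absorbed into f.
```

Under this assumed notation, a test for differential abundance would compare the null hypothesis that the condition coefficients in both gamma and beta are zero against the alternative that at least one of them is non-zero.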