Real-world data are rarely clean. Observed Poisson counts may be contaminated by excess zeros (zero-inflation) or by observations from another distribution whose mean is far larger than that of the distribution of interest. The zero-inflated Poisson model handles excess zeros well, and robust GLMs handle contamination by large values. However, to our knowledge, no existing method handles both.
The expression of many genes appears to be a mixture of background and real expression. Rather than a two-component mixture of a point mass at zero and a Negative Binomial (NB) distribution, it is a three-component mixture: a point mass at zero, an NB at a very low level, and an NB at a higher level. A natural approach is to model the expression as a mixture of a zero-inflated NB, representing background transcription, and an NB with a higher mean. When the NB mean is small, the Poisson distribution provides a good approximation to it.
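As a rough illustration of the three-component structure described above, the following sketch evaluates the probability mass function of a mixture of a point mass at zero, a Poisson background component, and an NB component with a higher mean. The function names, the mean/size parameterization of the NB, and all numeric values are assumptions chosen for illustration only; they are not taken from the estimator proposed here.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at count k for mean lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k, mu, size):
    """Negative Binomial pmf parameterized by mean mu and dispersion size.

    With this parameterization the variance is mu + mu**2 / size.
    """
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def mixture_pmf(k, pi_zero, pi_high, mu_low, mu_high, size):
    """Point mass at zero + Poisson background + higher-mean NB.

    pi_zero and pi_high are mixing weights; the background component
    receives the remaining weight 1 - pi_zero - pi_high.
    """
    pi_low = 1.0 - pi_zero - pi_high
    zero_mass = 1.0 if k == 0 else 0.0
    return (
        pi_zero * zero_mass
        + pi_low * poisson_pmf(k, mu_low)
        + pi_high * nb_pmf(k, mu_high, size)
    )


# Hypothetical parameter values, for illustration only.
PARAMS = dict(pi_zero=0.2, pi_high=0.3, mu_low=0.1, mu_high=50.0, size=5.0)

# Probability of observing a zero count under the mixture.
p_zero = mixture_pmf(0, **PARAMS)

# Largest pointwise gap between NB and Poisson at the small background mean,
# showing why the Poisson is a reasonable stand-in for the low-level NB.
max_gap = max(
    abs(nb_pmf(k, PARAMS["mu_low"], PARAMS["size"]) - poisson_pmf(k, PARAMS["mu_low"]))
    for k in range(20)
)
```

With these assumed values, most of the zero counts come from the point mass and the background component, while the higher-mean NB contributes almost nothing at zero. This is what makes the three components distinguishable in principle.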
Here we propose a new robust estimator of the Poisson mean and the zero-inflation factor to account for this situation, and we compare it with three other widely used estimators of the Poisson mean.