Online Program

Using Zero-Inflated Count Models to Predict Software Defects in NASA Data

Connie Stewart, University of New Brunswick Saint John
*Tasneem Zaihra, Concordia University

Keywords: Poisson Distribution, Negative Binomial Distribution, Zero-Inflated Count Models, Regression, Software Reliability

Real-life count data sets often contain more zeros than standard distributional assumptions would predict. Poisson regression is regularly used to model count responses, but it fits poorly when zeros occur with high frequency. Count data are also often overdispersed, so alternative models such as the Negative Binomial (NB) model need to be considered. Even the NB model is often insufficient when the response variable includes a large number of zeros, and in that case we typically resort to zero-inflated count models. In this poster we examine the predictive quality of the Zero-Inflated Poisson (ZIP) and Zero-Inflated NB (ZINB) models, along with their standard Poisson and NB counterparts, on data collected from NASA’s Metrics Data Program repository. The response variable, the number of defects in software modules, contains a large number of zeros and exhibits overdispersion in the non-zero counts, suggesting that the ZINB model should be the most suitable. We will discuss our findings, including measures that support the use of the ZINB model as well as its potential limitations.
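As a rough illustration of why a zero-inflated model can accommodate excess zeros, the sketch below compares the probability of a zero count under a plain Poisson model and under a ZIP model. It is not the authors' analysis: the parameter values `lam` and `pi` are hypothetical, and the function names are introduced here only for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) under a zero-inflated Poisson (ZIP) model.

    With probability pi the observation is a structural zero;
    otherwise it is drawn from Poisson(lam).
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Hypothetical values chosen for illustration only (not estimated from data).
lam, pi = 2.0, 0.4

p0_poisson = poisson_pmf(0, lam)   # zero probability without inflation
p0_zip = zip_pmf(0, lam, pi)       # zero probability with inflation

# Closed-form ZIP moments: the variance exceeds the mean whenever pi > 0,
# so zero inflation by itself induces overdispersion.
zip_mean = (1.0 - pi) * lam
zip_var = zip_mean * (1.0 + pi * lam)

print(f"P(Y=0) Poisson: {p0_poisson:.4f}")
print(f"P(Y=0) ZIP:     {p0_zip:.4f}")
print(f"ZIP mean: {zip_mean:.4f}, ZIP variance: {zip_var:.4f}")
```

The ZINB model follows the same construction with the Poisson component replaced by a Negative Binomial, which adds a dispersion parameter for the non-zero counts.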