Keywords: semicontinuous data, two-part models, marginalized models, healthcare expenditures, log-skew-normal, generalized gamma
Semicontinuous data, characterized by a point mass at zero and a continuous distribution over positive values, arise frequently in medical research. These data are often analyzed using two-part models that separately model the probability of a positive outcome and the distribution of values among those with a positive outcome. However, because the second part conditions on a nonzero response, such two-part models do not directly provide a marginal interpretation of covariate effects on the overall population. We have previously proposed a marginalized two-part model that yields more interpretable estimates. That model assumed a constant variance for the positive values. We now extend it by allowing the variance to be modeled as a function of covariates, and we incorporate this non-constant variance into two flexible distributions, the log-skew-normal and the generalized gamma, both of which include the commonly used log-normal distribution as a special case. Using simulation studies, we compare the performance of each of these model formulations with respect to bias, coverage, and efficiency. We illustrate the approach by evaluating the effect of a weight loss intervention on healthcare expenditures in the VA health care system.
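As a rough sketch of the distinction drawn above, the display below contrasts a conventional two-part specification with a marginalized one for the log-normal special case. The notation is assumed for illustration and is not taken from the text: $\pi_i$, $\mu_i$, $\sigma_i$, the covariate vectors $x_i$ and $z_i$, and the coefficient vectors $\alpha$, $\delta$, $\beta$, $\gamma$ are all illustrative, and the log-linear form for the variance is one plausible choice rather than necessarily the one adopted here.

```latex
\begin{align*}
% Part 1 (shared by both formulations): probability of a positive outcome
\operatorname{logit}(\pi_i) &= x_i^{\top}\alpha,
  \qquad \pi_i = \Pr(Y_i > 0) \\
% Conventional part 2: covariates act on the mean of log Y among positives only
\log Y_i \mid Y_i > 0 &\sim N(\mu_i, \sigma_i^2),
  \qquad \mu_i = x_i^{\top}\delta \\
% The overall (marginal) mean combines both parts
E[Y_i] &= \pi_i \, E[Y_i \mid Y_i > 0]
        = \pi_i \exp\!\left(\mu_i + \tfrac{1}{2}\sigma_i^2\right) \\
% Marginalized part 2: covariates act directly on the overall mean
\log E[Y_i] &= x_i^{\top}\beta
  \quad\Longrightarrow\quad
  \mu_i = x_i^{\top}\beta - \log \pi_i - \tfrac{1}{2}\sigma_i^2 \\
% Assumed form for non-constant variance (illustrative)
\log \sigma_i^2 &= z_i^{\top}\gamma
\end{align*}
```

Under these assumptions, $\exp(\beta_k)$ is read as a multiplicative effect on the mean outcome in the whole population, zeros included, whereas $\delta_k$ describes only those with a positive outcome.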