Abstract:
|
Missing values are a common problem for categorical data in many real applications. Traditional approaches such as available-case analysis often discard a substantial amount of data and may bias inference. In this paper, we propose the Dirichlet Process Mixture of Collapsed Multinomial (DPMCM) model for incomplete categorical data. DPMCM jointly models the full data by fitting an infinite mixture of multinomial distributions. Because it approximates the underlying joint distribution with a mixture, DPMCM can flexibly accommodate categorical data regardless of the true joint distribution. Within the framework of latent class analysis, DPMCM allows for general missingness mechanisms by creating an extra category that denotes missingness. Through simulation studies and a real application, we demonstrate that DPMCM can yield better statistical inference and imputation than existing approaches.
|
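The abstract combines two ingredients: recoding missingness as an extra category, and modeling the joint distribution with a mixture of multinomials under latent class analysis. A minimal Python sketch can illustrate both. It assumes a truncated stick-breaking approximation to the Dirichlet process, conditional independence of the variables within each latent class, and `None` as the raw missing-value marker. All function names are hypothetical. The sketch only evaluates the mixture likelihood; it does not fit DPMCM (no posterior sampling) and does not attempt the paper's collapsed formulation.

```python
import math
import random

MISSING = None  # assumed marker for a missing entry in the raw data


def add_missing_category(records, n_levels):
    """Recode missing entries as an extra level: variable j with levels
    0..n_levels[j]-1 gains level n_levels[j], meaning "missing"."""
    return [
        [n_levels[j] if x is MISSING else x for j, x in enumerate(row)]
        for row in records
    ]


def random_probs(size, rng):
    """Draw a probability vector from a symmetric Dirichlet(1) via Gamma draws."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(size)]
    total = sum(g)
    return [x / total for x in g]


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights approximating a Dirichlet process."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component takes the leftover mass
    return weights


def row_log_likelihood(row, weights, phi):
    """log p(row) under a latent class mixture of product multinomials.

    phi[k][j][c] is the probability that variable j takes level c in class k;
    variables are assumed independent given the class.
    """
    terms = []
    for k, w in enumerate(weights):
        if w <= 0.0:
            continue  # skip components whose weight underflowed to zero
        lp = math.log(w)
        for j, c in enumerate(row):
            lp += math.log(phi[k][j][c])
        terms.append(lp)
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    rng = random.Random(0)
    n_levels = [2, 3]  # observed levels per variable (hypothetical data)
    rows = add_missing_category([[0, None], [None, 2]], n_levels)
    print(rows)  # [[0, 3], [2, 2]]

    K = 10  # truncation level (assumption)
    weights = stick_breaking_weights(alpha=1.0, truncation=K, rng=rng)
    phi = [[random_probs(L + 1, rng) for L in n_levels] for _ in range(K)]
    for r in rows:
        print(r, row_log_likelihood(r, weights, phi))
```

Because a missing entry is simply another level, a partially observed row is scored exactly like a fully observed one. The class-specific probability of the "missing" level can also differ across latent classes, which is one way such a model can let missingness depend on the underlying values rather than assuming it is ignorable.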