Online Program

Bayesian Nonparametric Rating Scale Model for Health Outcomes Measurement

*Ken Akira Fujimoto, University of Illinois-Chicago  
George Karabatsos, University of Illinois-Chicago  

Keywords: Bayesian Nonparametric IRT, Infinite Mixture Model, Rating Scale Analysis, Dependent Dirichlet Process

Item response theory (IRT) models are often used to analyze data from rating scale questionnaires that measure health outcomes. These models include parameters for patient ability, item difficulty, and the thresholds of the rating categories. Commonly used IRT rating scale models assume that the rating category threshold parameters are the same for all patients. This assumption is easily violated when differential item functioning (DIF), such as item bias, arises from known or latent subgroups of patients in the data sample, and such violations can lead to misleading health outcome measurements from the given IRT model. To address this practical psychometric problem, we introduce a novel Bayesian nonparametric IRT model for rating scale analysis. The model is an infinite mixture of Rasch partial credit models, with the mixture based on a localized dependent Dirichlet process (DDP). The rating thresholds are the random parameters subject to the mixture, and the (stick-breaking) mixture weights depend on covariates. Thus, the model allows the rating category thresholds to vary flexibly across items and patients, and allows the distribution of the category thresholds to vary flexibly as a function of covariates. This added flexibility enables the detection of DIF across known and/or latent subgroups of patients in the sample, and therefore helps control for DIF in health outcomes measurement. We illustrate the new model through the analysis of a simulated data set and of a real rating data set from a questionnaire that assesses fear of falling among geriatric patients. The model is shown to fit the data better than other commonly used IRT rating scale models.
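To make the mixture idea concrete, the sketch below computes rating category probabilities for one patient and one item. Under the Rasch partial credit model, P(X = k) is proportional to exp(sum over h = 1..k of (theta - delta - tau_h)), where the empty sum for k = 0 is 0. The abstract does not give the exact form of the localized DDP. As a loudly labeled assumption, this sketch substitutes a simple logistic stick-breaking construction truncated to K components, uses a single scalar covariate x, and lets each component carry its own threshold vector. All function names and parameters (`pcm_probs`, `stick_breaking_weights`, `mixture_probs`, `alphas`, `betas`) are hypothetical illustrations, not the authors' implementation.

```python
import math


def pcm_probs(theta, delta, taus):
    """Rasch partial credit probabilities for categories 0..len(taus).

    P(X = k) is proportional to exp(sum_{h=1..k} (theta - delta - tau_h)),
    with the empty sum (k = 0) equal to 0.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    m = max(logits)  # subtract the max logit for numerical stability
    expd = [math.exp(z - m) for z in logits]
    total = sum(expd)
    return [e / total for e in expd]


def stick_breaking_weights(x, alphas, betas):
    """Covariate-dependent mixture weights (ASSUMED logistic stick-breaking).

    Hypothetical parameterization: v_j(x) = sigmoid(alpha_j + beta_j * x) for
    the first K - 1 sticks; the K-th component takes the leftover mass, so
    the K weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for a, b in zip(alphas, betas):
        v = 1.0 / (1.0 + math.exp(-(a + b * x)))
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # final component absorbs the remaining mass
    return weights


def mixture_probs(theta, delta, x, threshold_sets, alphas, betas):
    """Category probabilities averaged over components with weights w_j(x)."""
    w = stick_breaking_weights(x, alphas, betas)
    if len(w) != len(threshold_sets):
        raise ValueError("need one threshold vector per mixture component")
    n_cat = len(threshold_sets[0]) + 1
    probs = [0.0] * n_cat
    for wj, taus in zip(w, threshold_sets):
        for k, p in enumerate(pcm_probs(theta, delta, taus)):
            probs[k] += wj * p
    return probs


if __name__ == "__main__":
    # Two hypothetical threshold sets, e.g. for two latent patient subgroups.
    sets = [[-1.0, 0.0, 1.0], [0.5, 1.5, 2.5]]
    for x in (-3.0, 0.0, 3.0):
        print(x, [round(p, 3) for p in mixture_probs(0.0, 0.0, x, sets, [0.0], [2.0])])
```

With `alphas=[0.0]` and `betas=[2.0]`, a patient with x = 0 gives the two threshold sets equal weight 0.5. As x increases, weight shifts toward the first (lower) threshold set, so the same ability and item difficulty yield higher expected ratings. This shift illustrates how covariate-dependent weights can represent DIF across subgroups. In the full Bayesian model, the thresholds and weights would be inferred from the data rather than fixed by hand.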