Abstract:
|
Administrative claims data present a unique opportunity for longitudinal assessment of patients' health. We adapt topic models, widely used for text mining, to the analysis of such data. Specifically, we estimate an unobserved patient-specific trajectory that characterizes the patient's progression through multiple latent biological aberrations, each of which is an unobserved topic that yields a distinct content distribution over the diagnosis codes. We propose a novel extension of the structural topic model (Roberts et al. 2016) that builds in important features of claims data: repeated multivariate diagnosis codes, and time-varying covariates for both topic prevalences and content distributions. Our model specifies the topic prevalences by logistic mixed models and the content distributions by regularized logistic models. We derive a scalable variational EM inference algorithm. We apply the model to data from 15k cancer-associated thrombosis patients extracted from the OptumInsight claims database. By treating monthly diagnosis codes (ICD-9), aggregated over multiple months, as correlated documents from a patient, we quantify the latent disease progression and the effects of baseline and time-varying covariates.
|