Abstract:
|
International Classification of Diseases (ICD) codes are widely used for encoding diagnoses in electronic health records (EHR). An ICD code contains information about the diagnosis, and a collection of ICD codes defines a chronic condition. Over the years, automated methods have been developed that predict a variety of biomedical responses from the EHR by borrowing information among demographically and diagnostically similar patients. Relatively little attention has been paid to developing patient similarity measures that model the structure of ICD codes and the presence of multiple chronic conditions alongside a patient's primary diagnosis. Motivated by this problem, we first develop a type of string kernel function for defining similarity between a pair of subsets of ICD codes, which simultaneously uses the information about the diagnoses and the related chronic conditions. Second, we extend this similarity measure to define a family of covariance functions on diagnoses encoded as subsets of ICD codes. Using a member of this family, we develop Gaussian process (GP) priors for Bayesian nonparametric regression and classification with diagnoses in the form of ICD codes as covariates.
|