Abstract:
|
Medical Concept Embeddings (MCEs) have emerged in the literature as a feature reduction technique that compresses the sparse space of healthcare diagnosis codes into a compact, lower-dimensional representation. Originating in the natural language processing (NLP) domain, the technique relies on a neural network to learn associations within a large corpus, which in this case is the co-occurrence of diagnosis codes in healthcare claims data. In this work, we introduce a novel application of MCEs using ICD-10 diagnosis codes from a large dataset of healthcare claims. We illustrate our framework and methodology, which includes testing word2vec and doc2vec against multiple evaluation criteria. We then demonstrate the intrinsic value of the resulting embeddings using t-SNE plots, Rand indices, and Normalized Mutual Information (NMI), and discuss the implications and use for deep learning models.
|
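The sketch below is a minimal, illustrative example of the general approach the abstract describes: treating the ICD-10 codes that co-occur on a claim as a "sentence" and learning code embeddings with word2vec. It is not the paper's implementation; it assumes the gensim library, a hypothetical claims extract with `claim_id` and `icd10_code` columns, and arbitrary hyperparameters.

```python
# Illustrative sketch (not the paper's code): learning ICD-10 code embeddings
# from co-occurrence on claims, using gensim's word2vec implementation.
import pandas as pd
from gensim.models import Word2Vec

# Hypothetical claims extract with one row per diagnosis code per claim.
claims = pd.read_csv("claims.csv")

# Group diagnosis codes by claim so that codes billed together form one context.
code_sequences = (
    claims.groupby("claim_id")["icd10_code"]
    .apply(list)
    .tolist()
)

# Train skip-gram word2vec; vector_size, window, and min_count are assumed values.
model = Word2Vec(
    sentences=code_sequences,
    vector_size=100,
    window=10,
    min_count=5,
    sg=1,
    workers=4,
)

# Each ICD-10 code now maps to a dense vector usable as a model feature,
# and nearby vectors indicate codes that frequently co-occur.
vector = model.wv["E11.9"]                       # type 2 diabetes, unspecified
neighbors = model.wv.most_similar("E11.9", topn=5)
```

In this sketch the claim is the context unit; an alternative consistent with the abstract's doc2vec variant would be to learn a vector per patient or per claim document rather than per code.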