Abstract:
|
Applying statistical and machine learning methods to electronic health records (EHRs) is making it possible to predict patient diagnoses more accurately. One approach is word embedding, which represents words as real-valued vectors while preserving relationships between words, including semantic and syntactic similarities. Many word embedding tools exist, such as Word2Vec, BERT, fastText, USE, and GloVe, but little work has compared their performance on EHRs. We extend these tools to embed a patient’s entire medical history and use the resulting embeddings to build prediction models for medical events. We assess the predictive accuracy of these models using the Medical Information Mart for Intensive Care (MIMIC) database.
|