Abstract:
|
Although NLP techniques have been applied in EHR studies, extracting information from images remains challenging. Fortunately, novel deep learning algorithms for text detection and optical character recognition (OCR) have recently become available. To our knowledge, no prior research has combined them with a recurrent neural network (RNN) to extract laboratory test results. We developed a data pipeline for sleep study interpretation reports to identify diagnoses of sleep apnea, an increasingly common condition (Gelburd, 2018). We randomly selected 100 patients seen at UTMB pulmonary clinics between 2014 and 2018, collected their scanned reports from the EHR (EpicCare), and applied the LSTM-based Tesseract OCR engine (NeuralNetsInTesseract4.00, 2019) to obtain machine-readable text. We then trained an RNN NLP model to identify the sleep apnea diagnosis and measurement values, including the apnea-hypopnea index and oxygen saturation. Validation against physician chart review showed 100% sensitivity and 80% specificity for the proposed data pipeline. Future studies are needed to generalize this pipeline to other information, such as the PLM arousal index and cardiac arrhythmia.
|
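As an illustration of the validation step described in the abstract, the following minimal sketch shows how sensitivity and specificity can be computed when pipeline output is compared against physician chart review. The function name `sensitivity_specificity` and the label lists are hypothetical; the labels are toy data chosen only to make the arithmetic visible and are not the study's records or counts.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    reference: ground-truth labels (here, physician chart review).
    predicted: labels produced by the pipeline under evaluation.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")

    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical, NOT the study data): True = sleep apnea diagnosis.
chart_review = [True] * 5 + [False] * 5
pipeline_out = [True] * 5 + [True] + [False] * 4  # one false positive

sens, spec = sensitivity_specificity(chart_review, pipeline_out)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case every true diagnosis is detected and one of five negative reports is flagged incorrectly, so sensitivity is 5/5 and specificity is 4/5.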