Name: 2021 Joint Statistical Meetings
Start: 2021-08-08T07:00:00+00:00
End: 2021-08-12

All Times EDT

Abstract Details

Activity Number:	132 - SLDS CSpeed 1
Type:	Contributed
Date/Time:	Monday, August 9, 2021 : 1:30 PM to 3:20 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #318723
Title:	Efficient Semi-Supervised Deep Learning and Machine Learning NLP System to Extract Clinical Measurements from Polysomnogram Laboratory Reports
Author(s):	Ioannis Malagaris* and David En Shuo Hsu and Yong-fang Kuo
Companies:	University of Texas Medical Branch and University of Texas Medical Branch and University of Texas Medical Branch
Keywords:	NLP; Deep Learning; BERT; OCR; Unsupervised Learning; Machine Learning
Abstract:	Deep learning-based NLP techniques are commonly applied to EHR data to extract narrative information. Major drawbacks of these methods are the need for labor-intensive manual annotation and the significant computing resources required to train models. To overcome these issues, we propose a semi-supervised system: unsupervised deep learning for pre-training and supervised machine learning with minimal use of data. We randomly selected 100 patients seen in 2014-2018 from UTMB pulmonary clinics, collected 1010 scanned polysomnogram laboratory reports from the EHR (EpicCare), and applied the LSTM-based Tesseract OCR engine (NeuralNetsInTesseract4.00, 2019) to obtain machine-readable text. Subsequently, we built an unsupervised system using a publicly available pre-trained BERT (Bidirectional Encoder Representations from Transformers) model. Last, we used the output of BERT as embeddings to train a random forest model. A sample of 50 reports was used in training. Evaluation on the 960 held-out reports showed 97.6% precision and 92.7% sensitivity. In conclusion, although our system used a small training sample, its performance was similar to that achieved by time-consuming deep learning classifiers.
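
The evaluation setup in the abstract (50 of 1010 reports used for training, 960 held out, results reported as precision and sensitivity) can be illustrated with a minimal sketch. This is not the authors' code. The function names `split_reports` and `precision_sensitivity`, the integer report IDs, the random choice of the 50 training reports, and the toy labels are all assumptions made for illustration; the abstract does not say how the training sample was drawn or at what level the metrics were computed.

```python
import random


def split_reports(report_ids, n_train, seed=0):
    """Pick n_train report IDs for training; the rest are held out.

    Assumption: the training sample is drawn at random (not stated in the abstract).
    """
    rng = random.Random(seed)
    train = set(rng.sample(report_ids, n_train))
    held_out = [r for r in report_ids if r not in train]
    return sorted(train), held_out


def precision_sensitivity(y_true, y_pred):
    """Return (precision, sensitivity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return precision, sensitivity


# Placeholder IDs standing in for the 1010 scanned reports.
report_ids = list(range(1010))
train_ids, held_out_ids = split_reports(report_ids, n_train=50)
print(len(train_ids), len(held_out_ids))

# Toy labels only; these are NOT the study's data.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
precision, sensitivity = precision_sensitivity(y_true, y_pred)
print(precision, sensitivity)
```

Sensitivity here is the same quantity often called recall: the share of true positives that the system recovers. Precision is the share of predicted positives that are correct.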

Authors who are presenting talks have a * after their name.

American Statistical Association