Abstract:
|
Deep neural network classification models have been increasingly used to analyze large-scale electronic health records data and have shown superior predictive performance. In general, the success of these models relies on access to a large amount of labeled training data. In many healthcare settings, however, only a small amount of accurately labeled data is available, while unlabeled data is abundant. Further, input variables in the medical setting, such as laboratory tests and charted events, are usually sequential or longitudinal in nature, which poses additional challenges. In this project, we propose new semi-supervised sequence learning methods that use deep generative models to leverage both labeled and unlabeled data. We apply these methods to five mortality-related binary classification problems on a benchmark dataset extracted from the public MIMIC-III database, and demonstrate that the proposed semi-supervised learning methods outperform supervised methods that use labeled data only.
|