Abstract:
|
EHR-based phenotyping infers whether a patient has a disease from the information in their electronic health records. Building a classification algorithm usually requires a training set with disease-status labels annotated by human experts. The time required for annotation and for feature curation severely limits high-throughput phenotyping. Previous studies have successfully automated feature curation. In this talk, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. PheNorm transforms predictive features, such as the number of ICD-9 codes or the number of mentions of the target phenotype, so that they resemble a normal mixture distribution. The transformed features are then denoised and combined into a score for accurate classification. We validated the accuracy of PheNorm on four phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score were 0.90, 0.94, 0.95, and 0.94 for these phenotypes, respectively. These values were comparable to those of supervised algorithms trained on 100 to 300 labeled samples, with no statistically significant difference.
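As a rough illustration of the normal-mixture idea only, the sketch below shows how a score can be produced without labels. It is not PheNorm itself: the abstract does not specify PheNorm's transformation or denoising steps. Here we assume a log transform of raw counts, a simple average of standardized features as a stand-in for "denoise and combine", and a two-component normal mixture fitted by EM, with each patient's posterior probability of belonging to the higher component used as the score. The function names, the feature columns, and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def log_transform(counts):
    """Log-transform raw counts (e.g. ICD-9 code counts) to reduce right skew."""
    return [math.log1p(c) for c in counts]


def standardize(values):
    """Center and scale one feature so features share a comparable scale."""
    mu, sd = mean(values), pstdev(values)
    sd = sd if sd > 0 else 1.0
    return [(v - mu) / sd for v in values]


def combine_features(feature_columns):
    """Average standardized, log-transformed features into one value per patient.

    Assumption: a simple stand-in for PheNorm's denoise-and-combine step.
    """
    transformed = [standardize(log_transform(col)) for col in feature_columns]
    return [mean(vals) for vals in zip(*transformed)]


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_normal_mixture(x, iters=200):
    """Fit a two-component 1-D normal mixture by EM; return (w, means, sds).

    w is the mixing weight of component 1 (initialized as the upper component).
    """
    xs = sorted(x)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]  # start at the lower and upper quartiles
    sd = [pstdev(xs) or 1.0] * 2
    w = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for v in x:
            p1 = w * normal_pdf(v, mu[1], sd[1])
            p0 = (1 - w) * normal_pdf(v, mu[0], sd[0])
            tot = p0 + p1
            r.append(p1 / tot if tot > 0 else float(v > (mu[0] + mu[1]) / 2))
        # M-step: update weight, means, and (floored) standard deviations
        s1 = sum(r)
        s0 = n - s1
        if s1 < 1e-9 or s0 < 1e-9:
            break  # one component absorbed every point; stop early
        w = s1 / n
        mu = [sum((1 - ri) * v for ri, v in zip(r, x)) / s0,
              sum(ri * v for ri, v in zip(r, x)) / s1]
        sd = [max(math.sqrt(sum((1 - ri) * (v - mu[0]) ** 2 for ri, v in zip(r, x)) / s0), 1e-3),
              max(math.sqrt(sum(ri * (v - mu[1]) ** 2 for ri, v in zip(r, x)) / s1), 1e-3)]
    return w, mu, sd


def mixture_score(feature_columns):
    """Return each patient's posterior probability of the higher-mean component."""
    x = combine_features(feature_columns)
    w, mu, sd = fit_two_normal_mixture(x)
    hi = 1 if mu[1] >= mu[0] else 0
    lo = 1 - hi
    weights = [1 - w, w]
    scores = []
    for v in x:
        p_hi = weights[hi] * normal_pdf(v, mu[hi], sd[hi])
        p_lo = weights[lo] * normal_pdf(v, mu[lo], sd[lo])
        tot = p_hi + p_lo
        scores.append(p_hi / tot if tot > 0 else float(v > mu[lo]))
    return scores


# Usage on synthetic counts (hypothetical data; labels are used only to check the result)
random.seed(0)
labels = [1] * 100 + [0] * 200
icd_counts = [random.randint(5, 30) if y else random.randint(0, 2) for y in labels]
nlp_mentions = [random.randint(8, 40) if y else random.randint(0, 3) for y in labels]
scores = mixture_score([icd_counts, nlp_mentions])
```

The scoring step never looks at `labels`, which mirrors the abstract's point that no expert-labeled samples are needed to produce the score. Labels would only be needed afterwards to evaluate it, for example by computing an AUC.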
|