Abstract:
|
Electronic health records (EHRs) are a rich source of data for clinical research, provided the information can be extracted accurately and efficiently. Medication information is critical for research on drug exposure-response relationships, which can inform strategies to improve patient treatment. We describe medExtractR, a natural language processing (NLP) system developed in R that extracts medication information such as strength, dose amount, and frequency from clinical notes. We evaluated medExtractR using tacrolimus and lamotrigine, whose prescribing patterns ranged from simple to highly complex, and compared it with three existing NLP systems: MedEx, MedXN, and CLAMP. For each drug, medExtractR achieved high recall/precision/F1 on 60 training notes (tacrolimus: 1.00/.991/.996; lamotrigine: .975/.996/.986) and 50 test notes (tacrolimus: .992/.972/.982; lamotrigine: .967/.983/.975). It outperformed all three existing NLP systems on test set F1 for tacrolimus/lamotrigine (MedEx: .766/.888; MedXN: .943/.865; CLAMP: .743/.803). Our results suggest that medExtractR is a more accurate method for medication dose extraction, which could ultimately lead to higher-quality EHR-based research datasets.
|