Online Program

Thursday, February 15
PS1 Poster Session 1 and Opening Mixer Thu, Feb 15, 5:30 PM - 7:00 PM
Salons F-I

Perl-Compatible Regular Expressions as a Tool to Abstract Semi-Structured Electronic Health Records (303648)


Bing Ho, Northwestern University 
*Samantha Emily Montag, Northwestern University 
Anton Skaro, The University of Western Ontario 
Lihui Zhao, Northwestern University 

Keywords: Data Cleaning, Perl-Compatible Regular Expressions, SAS

Perl-compatible regular expressions are readily available in a number of statistical packages, including SAS and R. These tools are highly flexible but have limited use outside of text processing. Analysts often avoid free-text fields in datasets because they are difficult to convert into discrete variables for analysis. However, electronic medical records contain many free-text fields from which clinicians would like to abstract discrete concepts for research. Gold-standard manual parsing can only be done for small datasets, and automated parsing of free text presents many technical challenges. In many cases, though, what is perceived to be free text is actually semi-structured data with some underlying pattern. These patterns can be used with regular expressions to extract discrete fields quickly and with a high degree of accuracy. Furthermore, our text processing algorithm can flag difficult-to-parse text, greatly reducing the amount of manual review. We will present an example algorithm that uses regular expressions in SAS to create discrete variables from semi-structured kidney biopsy reports. The presentation will be accessible to intermediate to advanced users.
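
As a rough illustration of the extract-and-flag idea described in the abstract, here is a minimal sketch in Python rather than SAS. It is not the authors' algorithm. The report layout ("Label: number"), the field names, and the rule that a field is trusted only when its pattern matches exactly once are all assumptions made for this example.

```python
import re

# Hypothetical patterns for an assumed "Label: number" report layout;
# none of this is taken from the authors' kidney biopsy data.
GLOMERULI = re.compile(r"glomeruli\s*:\s*(\d+)", re.IGNORECASE)
SCLEROSED = re.compile(r"globally\s+sclero(?:sed|tic)\s*:\s*(\d+)", re.IGNORECASE)

PATTERNS = (("glomeruli", GLOMERULI), ("sclerosed", SCLEROSED))


def parse_report(text):
    """Return discrete fields plus a flag that requests manual review."""
    fields = {}
    for name, pattern in PATTERNS:
        matches = pattern.findall(text)
        # Assumption: exactly one match is reliable; zero or several is not.
        fields[name] = int(matches[0]) if len(matches) == 1 else None
    # Any field that could not be extracted sends the report to a reviewer.
    fields["needs_review"] = any(fields[name] is None for name, _ in PATTERNS)
    return fields


reports = [
    "Glomeruli: 12. Globally sclerosed: 3.",
    "Approximately a dozen glomeruli, a few sclerosed.",
]
parsed = [parse_report(report) for report in reports]
```

In this sketch the first report follows the assumed pattern and yields two integer fields, while the second does not match either pattern and is flagged for review. In SAS, the analogous building blocks would be the PRX functions such as PRXPARSE, PRXMATCH, and PRXPOSN.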