Activity Number: 35
Type: Contributed
Date/Time: Sunday, July 31, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #320778
Title: Automated Feature Selection for Prediction with Electronic Medical Records Data
Author(s): Jessica Minnier* and Sheng Yu and Katherine Liao and Tianxi Cai
Companies: Oregon Health & Science University and Tsinghua University and Brigham and Women's Hospital and Harvard
Keywords: electronic medical records; prediction; phenotyping; surrogate outcome; variable selection; medical informatics
Abstract:
The use of electronic medical records (EMR) for research is challenging due to imprecise coding practices and free-form text fields. Natural language processing (NLP) methods can extract features from text, but selecting informative features is not trivial. Furthermore, imprecise billing codes can lead to mismeasurement of disease outcomes. Often experts must manually review a subset of records to obtain a gold-standard phenotype label. Models built on these data have limited prediction accuracy because the number of predictors is large and the labeled sample is small. We present an automated feature selection method that uses model-based clustering and regularized regression to build a prediction model with surrogate outcomes from EMR data, such as diagnosis codes and mentions of disease in text fields. Our method performs variable selection of NLP features and maintains high prediction accuracy even when labeled training data are unavailable. It minimizes the need for gold-standard labels in algorithm training, thereby improving the efficiency of automated prediction and phenotyping.
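
The general idea of combining model-based clustering with regularized regression on a surrogate outcome can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single continuous surrogate (for example, a log-transformed diagnosis code count), a two-component one-dimensional Gaussian mixture as the clustering step, and a linear lasso fitted by coordinate descent as the regularized regression. The data are synthetic, and all names (`fit_two_gaussians`, `lasso`, `surrogate`, `features`) are hypothetical.

```python
import math
import random


def fit_two_gaussians(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    Returns P(high-mean component | x_i) for each observation.
    """
    n = len(x)
    lo, hi = min(x), max(x)
    mu = [lo, hi]
    var = [max((hi - lo) ** 2 / 4.0, 1e-6)] * 2
    weight = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of the second component.
        for i, xi in enumerate(x):
            dens = [
                weight[k] * math.exp(-((xi - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = dens[0] + dens[1]
            resp[i] = dens[1] / total if total > 0 else 0.5
        # M-step: update mixing weights, means, and variances.
        for k in range(2):
            r = resp if k == 1 else [1.0 - v for v in resp]
            rk = sum(r)
            if rk < 1e-12:
                continue
            mu[k] = sum(ri * xi for ri, xi in zip(r, x)) / rk
            var[k] = max(
                sum(ri * (xi - mu[k]) ** 2 for ri, xi in zip(r, x)) / rk, 1e-6
            )
            weight[k] = rk / n
    return resp if mu[1] >= mu[0] else [1.0 - v for v in resp]


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (centered data)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the partial residual.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta


# Synthetic example: true status is used only to simulate data, never to fit.
random.seed(0)
n = 400
is_case = [random.random() < 0.3 for _ in range(n)]
surrogate = [random.gauss(3.0 if c else 0.5, 0.5) for c in is_case]
features = [
    [float(c) + random.gauss(0, 0.5), float(c) + random.gauss(0, 0.5),
     random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)]
    for c in is_case
]

posterior = fit_two_gaussians(surrogate)     # clustering on the surrogate
beta = lasso(features, posterior, lam=0.1)   # regress surrogate label on features
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected feature indices:", selected)
```

In this sketch the mixture posterior plays the role of an unlabeled stand-in for the phenotype, so no manually reviewed records are needed to pick features; the features with nonzero lasso coefficients are the ones retained. The penalty `lam` is fixed by hand here, whereas a real analysis would need a principled way to tune it.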
Authors who are presenting talks have a * after their name.