Abstract:
|
Mortality among patients with severe acute respiratory distress syndrome (ARDS) is high. Identifying risk factors associated with hospital mortality, and the directionality of these associations, may help inform new basic science and clinical studies. However, such studies have typically relied on purely hypothesis-driven variable selection from hypothesis-constrained datasets analyzed with traditional statistical methods. Electronic health records (EHRs) contain large volumes of data that may lead to the discovery of novel risk factors for ARDS mortality, but they are unwieldy to analyze with these methods. We leveraged machine learning techniques to narrow the set of candidate variables associated with mortality through variable importance ranking. These techniques included random forests, support vector machines, gradient boosting, and Lasso and Ridge regression. Variables that ranked in the top 25 percent on average were carried forward into a subsequent logistic regression analysis. A total of 107 variables for 246 patients were extracted from the EHR. Five risk factors were found to be statistically significantly associated with hospital mortality, and the directionality of each association was determined. This data-driven methodology allows new discoveries to be drawn from the entire EHR dataset for further research.
|
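
The variable-narrowing step described in the abstract can be illustrated with a minimal sketch. It assumes that "top 25 percent on average" means each model's importance scores are converted to ranks, the ranks are averaged across models, and the best quarter of variables is kept; the abstract does not spell out the exact rule, so this is one plausible reading rather than the authors' procedure. The function names, variable names, and importance values below are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def rank_descending(scores):
    """Return 1-based ranks (1 = most important); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def select_top_fraction(importances, names, fraction=0.25):
    """Keep the variables whose rank, averaged over all models, is in the top `fraction`.

    `importances` maps a model label to non-negative importance scores
    (e.g. absolute coefficients or impurity-based importances) aligned with `names`.
    """
    per_model_ranks = [rank_descending(s) for s in importances.values()]
    avg_rank = [mean(r[i] for r in per_model_ranks) for i in range(len(names))]
    n_keep = max(1, int(len(names) * fraction))
    ordered = sorted(range(len(names)), key=lambda i: avg_rank[i])
    return [names[i] for i in ordered[:n_keep]]


# Hypothetical variables and made-up importance scores, for illustration only.
names = ["age", "lactate", "pH", "bmi", "platelets", "creatinine", "sodium", "heart_rate"]
importances = {
    "random_forest": [0.30, 0.25, 0.10, 0.02, 0.08, 0.15, 0.04, 0.06],
    "lasso": [0.50, 0.40, 0.00, 0.00, 0.10, 0.20, 0.00, 0.05],
    "svm": [0.90, 1.20, 0.30, 0.10, 0.20, 0.70, 0.15, 0.25],
}

selected = select_top_fraction(importances, names)
print(selected)
```

In a full pipeline, the selected variables would then be entered into a logistic regression to estimate the significance and direction of each association. Averaging ranks rather than raw scores keeps models with differently scaled importance measures comparable.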