
All Times EDT

Abstract Details

Activity Number: 158 - Statistical Methods for High-Dimensional Data in Health Care and Medical Research
Type: Topic-Contributed
Date/Time: Tuesday, August 10, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #317501
Title: Empirical Evaluation of Internal Validation Methods for Estimating Optimism Error in High-Dimensional Electronic Health Record Data with Rare Event Outcomes
Author(s): Rebecca Yates Coley* and Qinqing Liao
Companies: Kaiser Permanente Washington Health Research Institute and University of Washington
Keywords: Prediction; Machine Learning; EHR data; Random forest; Suicide; Bootstrap

Accurate estimates of internal validity are important to correctly guide both model selection and decisions about whether and how a prediction model should be used in clinical practice. Split-sample validation, in which the entire available sample is randomly divided into subsets used exclusively for model estimation (“training”) or validation (“testing”), is commonly used for internal validation. However, using only a fraction of the available observations for training and the remainder for validation reduces the statistical power of both tasks. Alternative internal validation methods, including cross-validation and bootstrap optimism correction, use the entire available dataset for both model estimation and validation. Demonstrations of bootstrap optimism correction for continuous risk prediction have been limited to logistic regression models predicting relatively common events with a small number of predictors. In this presentation, we compare internal validation methods for a random forest prediction model estimating the risk of a very rare event: suicide following an outpatient mental health visit.
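To make the bootstrap optimism correction mentioned above concrete, the following is a minimal sketch of its commonly described form: fit on the full sample to get the apparent performance, refit on each bootstrap resample, and subtract the average gap between resample performance and original-sample performance. It is not the authors' implementation. The abstract does not name a performance metric, so AUC is an assumption here. The mean-difference scorer is a toy stand-in for the random forest, and the synthetic data and all function names are hypothetical.

```python
import random


def fit_mean_difference(X, y):
    """Toy stand-in model (assumption): weight = event mean minus non-event mean."""
    events = [x for x, label in zip(X, y) if label == 1]
    non_events = [x for x, label in zip(X, y) if label == 0]
    return [
        sum(x[j] for x in events) / len(events)
        - sum(x[j] for x in non_events) / len(non_events)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    """Linear risk score for each row of X."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def auc(scores, y):
    """Chance that a random event outranks a random non-event (ties count 1/2)."""
    events = [s for s, label in zip(scores, y) if label == 1]
    non_events = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((e > m) + 0.5 * (e == m) for e in events for m in non_events)
    return wins / (len(events) * len(non_events))


def bootstrap_optimism_corrected_auc(X, y, n_boot=50, seed=0):
    """Return (apparent AUC, mean optimism, optimism-corrected AUC)."""
    if len(set(y)) < 2:
        raise ValueError("need both events and non-events")
    rng = random.Random(seed)
    n = len(y)
    # Apparent performance: fit and evaluate on the same full sample.
    apparent = auc(predict(fit_mean_difference(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # with rare events a resample can lack a class; redraw it
        Xb = [X[i] for i in idx]
        weights = fit_mean_difference(Xb, yb)
        auc_on_resample = auc(predict(weights, Xb), yb)
        auc_on_original = auc(predict(weights, X), y)
        optimism.append(auc_on_resample - auc_on_original)
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism


# Hypothetical synthetic data: 200 rows, 5 predictors, roughly 10% events.
data_rng = random.Random(1)
X_demo = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y_demo = [1 if x[0] + data_rng.gauss(0, 1) > 1.8 else 0 for x in X_demo]

apparent_auc, mean_optimism, corrected_auc = bootstrap_optimism_corrected_auc(
    X_demo, y_demo
)
print(apparent_auc, mean_optimism, corrected_auc)
```

Redrawing resamples that contain no events is one simple way to handle the rare-outcome case in this sketch. It is a design choice made for illustration, not something the abstract specifies.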

Authors who are presenting talks have a * after their name.
