Abstract Details

Activity Number: 331 - Statistical and Practical Issues for Reproducible Molecular Prediction in Biomedical Studies
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: ENAR
Abstract #329124
Title: Simple Bootstrap and Simulation Approaches to Quantifying Reliability of High-Dimensional Feature Selection
Author(s): Frank Harrell*
Companies: Vanderbilt University, Dept of Biostatistics
Keywords: high-dimensional data; variable selection; teaching; bootstrap; simulation; sample size
Abstract:

Feature selection in the large p, non-large n case is known to be unreliable, but most biomedical researchers are not aware of the magnitude of the problem. They assume, for example, that setting a false discovery rate makes the results reliable, forgetting about the false negative rate and decades of research showing that stepwise variable selection is unreliable even in the low p case. A related problem is the unreliability of the estimated effect (e.g., an odds ratio) of a feature found by selecting "winners". This talk will demonstrate some simple bootstrap and Monte Carlo simulation procedures for teaching biomedical researchers how to quantify these problems. One of the bootstrap examples exposes the difficulty of the task by computing confidence intervals for the importance rankings of features.
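
The bootstrap idea mentioned above can be illustrated with a minimal sketch. This is not the speaker's code; it is one plausible version under stated assumptions: the data are simulated, the importance measure is the absolute Pearson correlation between each feature and the outcome, and the intervals are simple percentile intervals on the bootstrap ranks. The function names (importance, ranks_from_importance, bootstrap_rank_ci) and all settings (n, p, effect sizes, number of resamples) are hypothetical choices made for illustration.

```python
import numpy as np


def importance(X, y):
    """Absolute Pearson correlation of each column of X with y.

    This is an assumed, illustrative importance measure; any per-feature
    score could be substituted.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def ranks_from_importance(imp):
    """Convert importance scores to ranks, where rank 1 is most important."""
    order = np.argsort(-imp)
    ranks = np.empty(len(imp), dtype=int)
    ranks[order] = np.arange(1, len(imp) + 1)
    return ranks


def bootstrap_rank_ci(X, y, n_boot=300, level=0.95, seed=0):
    """Percentile bootstrap intervals for the importance rank of each feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot_ranks = np.empty((n_boot, p), dtype=int)
    for b in range(n_boot):
        # Resample subjects (rows) with replacement and re-rank all features.
        idx = rng.integers(0, n, size=n)
        boot_ranks[b] = ranks_from_importance(importance(X[idx], y[idx]))
    alpha = 1.0 - level
    lower = np.quantile(boot_ranks, alpha / 2, axis=0)
    upper = np.quantile(boot_ranks, 1 - alpha / 2, axis=0)
    observed = ranks_from_importance(importance(X, y))
    return observed, lower, upper


if __name__ == "__main__":
    # Simulated example (assumed settings): 100 subjects, 200 features,
    # of which only the first 5 truly affect the outcome.
    rng = np.random.default_rng(1)
    n, p = 100, 200
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 0.5
    y = X @ beta + rng.standard_normal(n)

    observed, lower, upper = bootstrap_rank_ci(X, y)
    # Show the features that looked best in the original sample.
    for j in np.argsort(observed)[:10]:
        print(f"feature {j:3d}: rank {observed[j]:3d}, "
              f"95% interval [{lower[j]:.0f}, {upper[j]:.0f}]")
```

Wide intervals for the apparent "winners" would indicate that the data cannot reliably distinguish them from many other features, which is the difficulty the abstract refers to. Rerunning the simulation with a larger n or a smaller p gives a rough sense of how the intervals respond to sample size.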


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program