Activity Number: 331 - Statistical and Practical Issues for Reproducible Molecular Prediction in Biomedical Studies
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: ENAR
Abstract #329124
Title: Simple Bootstrap and Simulation Approaches to Quantifying Reliability of High-Dimensional Feature Selection
Author(s): Frank Harrell*
Companies: Vanderbilt University, Dept of Biostatistics
Keywords: high-dimensional data; variable selection; teaching; bootstrap; simulation; sample size
Abstract:
Feature selection in the large-p, non-large-n case is known to be unreliable, but most biomedical researchers are not aware of the magnitude of the problem. They assume, for example, that setting a false discovery rate makes the results reliable, forgetting about the false negative rate and decades of research showing the unreliability of stepwise variable selection even in the low-p case. A related problem is the unreliability of the estimated effect (e.g., an odds ratio) of a feature found by selecting "winners". This talk will demonstrate some simple bootstrap and Monte Carlo simulation procedures for teaching biomedical researchers how to quantify these problems. One of the bootstrap examples exposes the difficulty of the task by computing confidence intervals for importance rankings of features.
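
The idea of bootstrap confidence intervals for feature importance rankings can be illustrated with a minimal sketch. It is not the procedure presented in the talk: the importance measure (absolute Pearson correlation with the outcome), the simulated data, and the names `rank_features`, `bootstrap_rank_intervals`, and `simulate` are all assumptions chosen for illustration. Each bootstrap resample re-ranks every feature, and the percentile interval of those ranks shows how unstable a feature's position is.

```python
import numpy as np


def rank_features(X, y):
    """Rank features by absolute Pearson correlation with y (1 = strongest).

    The importance measure is an illustrative assumption; any score works.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    score = np.abs(Xc.T @ yc / denom)
    order = np.argsort(-score)              # indices from strongest to weakest
    ranks = np.empty(score.size, dtype=int)
    ranks[order] = np.arange(1, score.size + 1)
    return ranks


def bootstrap_rank_intervals(X, y, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap intervals for each feature's importance rank."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot_ranks = np.empty((n_boot, p), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample subjects with replacement
        boot_ranks[b] = rank_features(X[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(boot_ranks, alpha, axis=0)
    upper = np.quantile(boot_ranks, 1.0 - alpha, axis=0)
    return rank_features(X, y), lower, upper


def simulate(n=100, p=200, n_signal=5, effect=0.3, seed=1):
    """Hypothetical data: the first n_signal features have a true effect."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:n_signal] = effect
    y = X @ beta + rng.standard_normal(n)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    observed, lower, upper = bootstrap_rank_intervals(X, y)
    # Show the ten apparent "winners" with their rank intervals.
    for j in np.argsort(observed)[:10]:
        print(f"feature {j:3d}  rank {observed[j]:3d}  "
              f"95% interval [{lower[j]:.0f}, {upper[j]:.0f}]")
```

With p much larger than n, the intervals for apparent winners are typically wide, which is the kind of difficulty the abstract describes. Changing `n`, `p`, or `effect` in `simulate` turns the same sketch into a small Monte Carlo exercise on how sample size affects ranking stability.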
Authors who are presenting talks have a * after their name.