| Activity Number: | 209 |
| Type: | Contributed |
| Date/Time: | Monday, July 30, 2007 : 2:00 PM to 3:50 PM |
| Sponsor: | Section on Statistics in Epidemiology |
| Abstract - #310406 |
| Title: | An Empirical Evaluation of the Random Forests Classifier Models for Variable Selection in a Large-Scale Lung Cancer Case Control Study |
| Author(s): | Qing Zhang*+ and Christopher I. Amos |
| Companies: | The University of Texas M.D. Anderson Cancer Center and The University of Texas M.D. Anderson Cancer Center |
| Address: | 2815 Spring Lakes, Missouri City, TX, 77459 |
| Keywords: | Random Forests; classification; machine learning; variable selection |
Abstract:
|
Random Forests is a machine learning-based classification algorithm developed by Leo Breiman and Adele Cutler for complex data analysis. Previous research has indicated that it has excellent statistical properties when predictors are noisy and the number of variables is much larger than the number of observations. This study conducted an empirical evaluation of the method of Random Forests for variable selection using data from a large-scale lung cancer case-control study. A novel way of variable selection was proposed to automatically select prognostic factors without being adversely affected by multiple colinearities. This empirical study demonstrated that Random Forests can deal effectively and accurately with a large number of predictors simultaneously without overfitting.
|