Activity Number: 485
Type: Contributed
Date/Time: Thursday, August 7, 2008 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section

Abstract - #301144

Title: Selecting Representative Trees in Random Forest for Survival Data
Author(s): Mousumi Banerjee and Ying Ding and Anne-Michelle Noone*+
Companies: The University of Michigan and The University of Michigan and Georgetown University
Address: Lombardi Comprehensive Cancer Center, Washington, DC, 20057-1484
Keywords: tree-based methods; survival data; random forest; out-of-bag error; similarity metric

Abstract:
Tree-based methods are popular tools for prognostic stratification. Ensemble techniques such as random forest improve prediction accuracy and address the instability of a single tree. However, individual trees are lost in the forest. In this paper, we propose a methodology for selecting the most representative trees in a forest for survival data, based on three tree similarity metrics. For any two trees, the metrics are chosen to measure similarity of the covariates used to split the trees, to reflect similar clustering of patients in the terminal nodes, and to measure similarity in predictions. The most representative trees in the forest are chosen based on the average similarity score assigned to each tree. Out-of-bag estimates of error are computed for the most representative trees using a neighborhood of similar trees. Finally, we illustrate the methods using a breast cancer data set.
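The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the three metrics, so everything here is an assumption made for illustration: the function names are hypothetical, each tree is reduced to the set of covariates it splits on, a terminal-node label per patient, and a predicted value per patient, and the metrics are written as distances (so the most representative tree is the one with the smallest average distance to the others). It is not the authors' implementation.

```python
from itertools import combinations

# All names and formulas below are illustrative assumptions, not the authors' definitions.

def covariate_distance(vars_a, vars_b, n_covariates):
    """Fraction of covariates used for splitting by exactly one of the two trees."""
    return len(set(vars_a) ^ set(vars_b)) / n_covariates

def clustering_distance(leaves_a, leaves_b):
    """Fraction of patient pairs placed in the same terminal node by one tree only."""
    pairs = list(combinations(range(len(leaves_a)), 2))
    disagreements = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)

def prediction_distance(pred_a, pred_b):
    """Mean squared difference between the two trees' per-patient predictions."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)

def most_representative(distance_matrix):
    """Return the index of the tree with the smallest average distance to the
    other trees, together with the list of average distances."""
    n = len(distance_matrix)
    avg = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(distance_matrix)
    ]
    return min(range(n), key=avg.__getitem__), avg

# Toy forest: terminal-node labels for four patients under three trees.
leaves = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 0, 1]]
n_trees = len(leaves)
dist = [
    [clustering_distance(leaves[i], leaves[j]) for j in range(n_trees)]
    for i in range(n_trees)
]
best, avg = most_representative(dist)
print(best, avg)
```

The toy example uses only the clustering distance; the same `most_representative` call would apply to a matrix built from either of the other two distances. Restricting an out-of-bag error estimate to a neighborhood of similar trees, as the abstract mentions, would be a further step that this sketch does not attempt.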