Abstract #301016

This is the preliminary program for the 2003 Joint Statistical Meetings in San Francisco, California. It currently includes the "technical" program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 2-5, 2003); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 89
Type: Contributed
Date/Time: Monday, August 4, 2003 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Title: On Strength and Correlation in Random Forests
Author(s): Samuel E. Buttrey*+ and Izumi Kobayashi
Companies: Naval Postgraduate School and
Address: Dept. of Operations Research, Monterey, CA, 93943,
Keywords: random forest ; tree ensembles ; classification trees
Abstract:

A random forest is an ensemble of tree classifiers for a particular problem in which each tree is generated using some randomization of the data or of the tree's splitting criteria. Bagging and random splits are two examples of methods of generating random forests. The individual trees in a forest will often be weak, but the ensemble's predictions, derived by a vote, can be highly accurate. Breiman (2001) defines "strength" and "correlation" for random forests and constructs an upper bound on an ensemble's prediction error in terms of these quantities. The strength of a random forest describes the ensemble's average prediction quality; the correlation describes the extent to which predictions are similar from one member of the ensemble to another. We compute estimates of strength and correlation for some twenty combinations of forest algorithm and parameter settings, using thirteen well-known data sets. In many cases the plot of strength versus correlation is a smooth curve, suggesting a strong relationship between the two measures and a threshold which bounds the two. This research may help practitioners choose parameter values or forest-building methods.
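The two quantities described above can be illustrated with a minimal sketch. It assumes the estimator forms given in Breiman (2001): strength s is the mean margin, correlation is var(margin) divided by the squared mean of the per-tree standard deviations of the raw margin, and the error bound is correlation * (1 - s^2) / s^2. The function names, the vote-matrix input, and the use of a held-out set in place of out-of-bag votes are assumptions made for illustration; this is not necessarily the procedure the authors used.

```python
from collections import Counter
from math import sqrt


def strength_and_correlation(votes, labels):
    """Estimate strength and mean correlation of a tree ensemble.

    votes[k][i] is the class predicted by tree k for example i;
    labels[i] is the true class of example i (assumed never None).
    Hypothetical helper: assumes votes on a held-out set, not out-of-bag votes.
    """
    n_trees = len(votes)
    n = len(labels)

    margins = []  # per-example margin: P(correct) - P(most popular wrong class)
    rivals = []   # per-example most popular wrong class (None if no wrong votes)
    for i, y in enumerate(labels):
        counts = Counter(tree[i] for tree in votes)
        wrong = {c: m for c, m in counts.items() if c != y}
        if wrong:
            rival, rival_count = max(wrong.items(), key=lambda kv: kv[1])
        else:
            rival, rival_count = None, 0
        margins.append((counts[y] - rival_count) / n_trees)
        rivals.append(rival)

    strength = sum(margins) / n
    var_margin = sum(m * m for m in margins) / n - strength ** 2

    # Per-tree standard deviation of the raw margin I(h = Y) - I(h = rival).
    sd_total = 0.0
    for tree in votes:
        p1 = sum(tree[i] == labels[i] for i in range(n)) / n
        p2 = sum(tree[i] == rivals[i] for i in range(n)) / n
        sd_total += sqrt(max(p1 + p2 - (p1 - p2) ** 2, 0.0))
    mean_sd = sd_total / n_trees

    # Undefined when every tree's raw margin is constant across examples.
    correlation = var_margin / mean_sd ** 2 if mean_sd > 0 else float("nan")
    return strength, correlation


def error_bound(strength, correlation):
    """Upper bound correlation * (1 - s^2) / s^2; meaningful only for s > 0."""
    if strength <= 0:
        raise ValueError("the bound requires positive strength")
    return correlation * (1 - strength ** 2) / strength ** 2


# Toy example: three trees voting on four examples with two classes.
toy_votes = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
toy_labels = [0, 0, 1, 1]
s, rho = strength_and_correlation(toy_votes, toy_labels)
print(s, rho, error_bound(s, rho))
```

Running the same function over several forest algorithms and parameter settings would yield one (strength, correlation) point per configuration, which is the kind of plot the abstract describes.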


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2003: For information, contact meetings@amstat.org or phone (703) 684-1221. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2003