Abstract #302067

This is the preliminary program for the 2003 Joint Statistical Meetings in San Francisco, California. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 2-5, 2003); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2003 Abstract #302067
Activity Number: 89
Type: Contributed
Date/Time: Monday, August 4, 2003 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Title: Using the Area Under the ROC Curve to Determine Discriminatory Power and Variable Importance of Random Forest Predictors
Author(s): Yunda Huang*+ and Steve Horvath
Companies: University of California, Los Angeles and University of California, Los Angeles
Address: 3155 Sepulveda Blvd., Los Angeles, CA, 90034-4220
Keywords: ROC ; C-index ; random forests ; discriminatory power ; multiclass ; variable importance
Abstract:

Random forest classifiers (Breiman 2001) have yielded a significant increase in discriminatory power. Their construction leads to important byproducts such as out-of-bag estimates of the error rate and measures of variable importance. Here we propose to measure their discriminatory power by out-of-bag estimates of the C-index, which is a generalization of the area under the ROC curve (Harrell 1996). We also consider a generalization of the C-index to multiclass outcomes (Hand and Till 2001). We propose several new measures of variable importance based on the C-index. We apply our methods to simulated data and to benchmark data from the UCI machine learning repository. We find that the C-index is generally superior to the error rate as a measure of discriminatory power. This is particularly true when dealing with asymmetric class sizes. Interestingly, Breiman's variable importance measures often work well even when the error rate of the classifier does not differ from that of a trivial (naive) classifier. Generally, Breiman's variable importance measures outperform importance measures based on the C-index, even when the C-index is superior to the error rate for assessing overall discriminatory power.
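
To illustrate the quantities named in the abstract, here is a minimal Python sketch, not the authors' implementation. It computes the two-class C-index as the fraction of correctly ordered (positive, negative) pairs, and the Hand and Till (2001) multiclass measure M as the average of pairwise C-indices. The function names `c_index` and `hand_till_m` and the example scores are assumptions made for illustration; for a random forest the scores would typically be out-of-bag vote fractions, which are not computed here. Classes are assumed to be coded as integers 0, 1, ... that index the probability columns.

```python
from itertools import combinations


def c_index(scores, labels):
    """Two-class C-index (area under the ROC curve).

    labels use 1 for the positive class and 0 for the negative class.
    The result is the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hand_till_m(prob_rows, labels):
    """Multiclass measure M of Hand and Till (2001).

    prob_rows[r][k] is the estimated probability that case r is in class k.
    M averages, over all unordered class pairs (i, j), the mean of
    A(i|j) and A(j|i), each a two-class C-index restricted to cases
    from classes i and j.
    """
    classes = sorted(set(labels))
    total = 0.0
    for i, j in combinations(classes, 2):
        idx = [r for r, y in enumerate(labels) if y in (i, j)]
        # A(i|j): rank the pair's cases by their class-i probability.
        a_ij = c_index([prob_rows[r][i] for r in idx],
                       [1 if labels[r] == i else 0 for r in idx])
        # A(j|i): rank the same cases by their class-j probability.
        a_ji = c_index([prob_rows[r][j] for r in idx],
                       [1 if labels[r] == j else 0 for r in idx])
        total += (a_ij + a_ji) / 2.0
    n_pairs = len(classes) * (len(classes) - 1) / 2
    return total / n_pairs


# Made-up scores for illustration only (8 negatives, 2 positives).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.3, 0.45, 0.35]

# Error rate at a 0.5 cutoff versus always predicting the majority class.
predicted = [1 if s >= 0.5 else 0 for s in scores]
error_rate = sum(p != y for p, y in zip(predicted, labels)) / len(labels)
trivial_error = min(labels.count(0), labels.count(1)) / len(labels)

print(error_rate, trivial_error, c_index(scores, labels))
```

With these made-up scores, the error rate at the 0.5 cutoff equals that of the trivial majority-class rule (0.2 in both cases), yet the C-index is 0.9375 because 15 of the 16 (positive, negative) pairs are ordered correctly. This is one way unequal class sizes can hide discriminatory power from the error rate.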


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


For information about JSM 2003, contact meetings@amstat.org or phone (703) 684-1221. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2003