640 – A New Age of Data Mining in the High-Performance World

A Forest Measure of Variable Importance Resistant to Correlations

Sponsor: Section on Statistical Computing
Keywords: Random Forests, SAS, Variable Importance, Decision Tree

Padraic G. Neville

SAS Institute

Pei-Yi Tan

SAS Institute

Variable importance estimates from decision trees and random forests are often used to reduce the dimension of data, especially when there are many variables, because decision trees can process many variables quickly. However, trees typically inflate the importance of correlated variables and can even promote irrelevant correlated variables above predictive independent variables. Strobl et al. (2008) analyze the cause and propose a remedy. Unfortunately, the remedy is too complex to be practical for a large number of observations. This paper presents a simple method, called random branch assignments, which conforms to the analysis of Strobl et al. and yet can handle many observations. Although the method still ranks the variables incorrectly when the signal-to-noise ratio is less than 1, it is dramatically less sensitive to correlation effects than the measures of variable importance in the randomForest() function in R.
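The inflation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the random branch assignments method, nor the remedy of Strobl et al., since neither is specified here. It only shows, under an assumed data-generating setup, how a split-based measure can rank an irrelevant but correlated variable above a weakly predictive independent one. The variable names (x1, x2, x3), the coefficients, the correlation of 0.9, and the helper functions best_split_gain and simulate are all hypothetical choices made for illustration.

```python
import random


def best_split_gain(x, y):
    """Largest per-observation reduction in squared error of y from one split on x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    sse_parent = total_sq - total * total / n
    best = 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        right_sum = total - left_sum
        # Squared error remaining after splitting into the first i and last n - i rows.
        sse_children = total_sq - left_sum ** 2 / i - right_sum ** 2 / (n - i)
        best = max(best, sse_parent - sse_children)
    return best / n


def simulate(n=2000, rho=0.9, seed=0):
    """Assumed setup: x2 is correlated with x1 but has no effect of its own on y."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
    x3 = [rng.gauss(0, 1) for _ in range(n)]  # independent, weakly predictive
    y = [2 * a + 0.5 * c + rng.gauss(0, 1) for a, c in zip(x1, x3)]
    return {"x1": x1, "x2": x2, "x3": x3}, y


features, y = simulate()
gains = {name: best_split_gain(col, y) for name, col in features.items()}
for name in sorted(gains, key=gains.get, reverse=True):
    print(name, round(gains[name], 3))
```

In this setup x2 carries no information about y beyond what x1 already provides, yet its single-split gain comes out well above that of x3, which does contribute to y. A single split is a deliberate simplification of what a full tree or forest computes, so the sketch illustrates the direction of the bias rather than its size in practice.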