Abstract Details

Activity Number: 131 - Predictive Modeling in Data Science
Type: Contributed
Date/Time: Monday, July 31, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323571
Title: Delayed Greedy Algorithm for Classification and Regression Trees
Author(s): Kyle Caudle* and Larry Pyeatt and Patrick Fleming
Companies: South Dakota School of Mines and Technology (all authors)
Keywords: CART ; Greedy ; Machine Learning
Abstract:

Classification and regression trees (CART) are nonlinear prediction models dating back to Breiman's work in the 1980s. These models are often thought of as decision trees in which the feature space is divided into a tree-like structure based on specific levels of the independent variables. The basic CART methodology cycles through all variables and all levels of each variable until it finds the partition of the feature space that minimizes the total variability. CART is a greedy algorithm because it looks only for the split point that gives the largest immediate reduction in variability. Our approach is a delayed greedy approach: we find the best split point for each variable, but we do not decide which split to use until we have split twice. At that point, we choose the first split point that leads to the largest reduction in variability after two splits, which may not be the one that gives the largest reduction after one split. This talk will outline the delayed methodology and provide instances where the delayed greedy approach outperforms the standard greedy CART approach.
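
The difference between the two selection rules can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: variability is measured as the sum of squared errors around the node mean, candidate thresholds are midpoints between adjacent observed values, the second-level splits are chosen greedily, and the function names (greedy_split, delayed_greedy_split) and the toy data are hypothetical. It is not the authors' implementation.

```python
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def partition(X, y, j, t):
    """Split rows into (left, right) children on feature j at threshold t."""
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (([X[i] for i in left], [y[i] for i in left]),
            ([X[i] for i in right], [y[i] for i in right]))


def best_split_for_feature(X, y, j):
    """Return (sse_after_split, threshold) for feature j, or None if constant."""
    values = sorted({row[j] for row in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # assumed: midpoints as candidate thresholds
        (_, yl), (_, yr) = partition(X, y, j, t)
        total = sse(yl) + sse(yr)
        if best is None or total < best[0]:
            best = (total, t)
    return best


def greedy_split(X, y):
    """Standard greedy rule: lowest SSE after ONE split."""
    best = (sse(y), None, None)  # (sse, feature, threshold); None = no split
    if not X:
        return best
    for j in range(len(X[0])):
        cand = best_split_for_feature(X, y, j)
        if cand is not None and (best[1] is None or cand[0] < best[0]):
            best = (cand[0], j, cand[1])
    return best


def delayed_greedy_split(X, y):
    """Delayed rule (assumed form): take each feature's best first split,
    split both children greedily once more, and keep the first split with
    the lowest total SSE after TWO splits."""
    best = None
    for j in range(len(X[0])):
        cand = best_split_for_feature(X, y, j)
        if cand is None:
            continue
        t = cand[1]
        (Xl, yl), (Xr, yr) = partition(X, y, j, t)
        two_step = greedy_split(Xl, yl)[0] + greedy_split(Xr, yr)[0]
        if best is None or two_step < best[0]:
            best = (two_step, j, t)
    return best


# Invented toy data: the response is mostly an XOR of features 0 and 1,
# plus a small additive effect of feature 2.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [float(a ^ b) + 0.3 * c for a, b, c in X]

greedy = greedy_split(X, y)
delayed = delayed_greedy_split(X, y)
print("greedy first split on feature", greedy[1])
print("delayed greedy first split on feature", delayed[1])
```

On this toy data, splitting on feature 0 or 1 alone gives no one-step reduction, so the greedy rule picks the weakly informative feature 2. The delayed rule instead picks one of the XOR features, because their benefit only shows up after the second split.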


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association