Friday, February 24
CS02 Beyond the Basics: Advanced Modeling Methods Fri, Feb 24, 9:15 AM - 10:45 AM
River Terrace 3

Improve Regression and Communicate Results Using Stochastic Gradient Boosting and LASSO (303378)

*Charles William Harrison, Salford Systems 

Keywords: Stochastic Gradient Boosting, LASSO, machine learning, rule ensembles, regularized regression, interpretations

This presentation demonstrates how Stochastic Gradient Boosting, combined with the LASSO, can be used to build powerful regression models (as described in Friedman and Popescu, 2005) that can be meaningfully interpreted and presented to an audience. Regression is a popular method in statistical modeling in part because its results can be interpreted and presented to a less technical audience. Issues that arise in the use of regression include handling missing values, modeling complex trends and local effects, and detecting interactions and non-linearities. Stochastic Gradient Boosting is a powerful machine learning technique that handles these issues automatically, but its models can be challenging to interpret. This difficulty can be mitigated, with little if any loss of accuracy, by using the individual rule sets from a boosted tree model as predictors in a LASSO regression model. The final model is relatively sparse and can be interpreted because each predictor takes the form of a rule (e.g., Age > 10 AND Income < 50,000 AND Gender = 1).
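The workflow described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the presenter's or Salford Systems' implementation: the synthetic data, the predictor names (Age, Income, Gender), the helper names (grow, boosted_rules, lasso, describe), and all tuning values are assumptions chosen for illustration. As a simplification, only the leaf rules of each small tree are kept, and the LASSO is fit by plain coordinate descent.

```python
import random

NAMES = ["Age", "Income", "Gender"]  # hypothetical predictors, echoing the example rule


def matches(conds, x):
    # A rule is a conjunction of (feature index, is_greater, threshold) conditions.
    return all((x[j] > t) == gt for j, gt, t in conds)


def describe(conds):
    return " AND ".join(f"{NAMES[j]} {'>' if gt else '<='} {t}" for j, gt, t in conds)


def grow(X, r, rows, depth, conds=()):
    """Greedily grow a small regression tree on residuals r; return leaves as (rule, mean)."""
    mean = sum(r[i] for i in rows) / len(rows)
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            for t in sorted({X[i][j] for i in rows})[:-1]:  # both sides stay non-empty
                left = [i for i in rows if X[i][j] <= t]
                right = [i for i in rows if X[i][j] > t]
                # Maximizing this score is equivalent to minimizing squared error.
                score = (sum(r[i] for i in left) ** 2 / len(left)
                         + sum(r[i] for i in right) ** 2 / len(right))
                if best is None or score > best[0]:
                    best = (score, j, t, left, right)
    if best is None:
        return [(conds, mean)]
    _, j, t, left, right = best
    return (grow(X, r, left, depth - 1, conds + ((j, False, t),))
            + grow(X, r, right, depth - 1, conds + ((j, True, t),)))


def boosted_rules(X, y, n_trees=20, depth=2, rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared error; returns the distinct leaf rules."""
    rng = random.Random(seed)
    n = len(X)
    pred = [sum(y) / n] * n
    rules = []
    for _ in range(n_trees):
        rows = rng.sample(range(n), int(subsample * n))  # random subsample per tree
        resid = [yi - pi for yi, pi in zip(y, pred)]
        leaves = grow(X, resid, rows, depth)
        for i in range(n):
            value = next(v for c, v in leaves if matches(c, X[i]))
            pred[i] += rate * value  # shrunken update
        rules.extend(c for c, _ in leaves if c)
    return list(dict.fromkeys(rules))  # de-duplicate, keep order


def soft(a, lam):
    return a - lam if a > lam else a + lam if a < -lam else 0.0


def lasso(Z, y, lam, sweeps=100):
    """Coordinate descent for (1/2n) * squared error + lam * sum(|w|), with an intercept."""
    n, p = len(Z), len(Z[0])
    w, b = [0.0] * p, sum(y) / n
    res = [yi - b for yi in y]
    cols = [[Z[i][k] for i in range(n)] for k in range(p)]
    norms = [sum(c * c for c in col) / n for col in cols]
    for _ in range(sweeps):
        for k in range(p):
            if norms[k] == 0.0:
                continue
            rho = sum(c * ri for c, ri in zip(cols[k], res)) / n + w[k] * norms[k]
            new = soft(rho, lam) / norms[k]
            if new != w[k]:
                d = new - w[k]
                res = [ri - d * c for ri, c in zip(res, cols[k])]
                w[k] = new
        shift = sum(res) / n  # re-center the intercept
        b += shift
        res = [ri - shift for ri in res]
    return b, w


# Synthetic data (an assumption): the response depends on one rule plus Gender.
data_rng = random.Random(1)
X = [[data_rng.randint(1, 20), data_rng.randrange(20, 100, 10) * 1000,
      data_rng.randint(0, 1)] for _ in range(120)]
y = [5.0 * (a > 10 and inc < 50000) + 2.0 * g + data_rng.gauss(0, 0.5)
     for a, inc, g in X]

rules = boosted_rules(X, y)
Z = [[1.0 if matches(c, x) else 0.0 for c in rules] for x in X]  # rule indicators
b, w = lasso(Z, y, lam=0.05)
for c, coef in zip(rules, w):
    if coef != 0.0:
        print(f"{coef:+.2f}  {describe(c)}")
```

Each printed line pairs a LASSO coefficient with a readable rule, which is the form of model the abstract proposes presenting to an audience. Raising the hypothetical lam value shrinks more coefficients to exactly zero and leaves fewer rules to explain.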