
Abstract Details

Activity Number: 701
Type: Contributed
Date/Time: Thursday, August 4, 2016: 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Marketing
Abstract #320166
Title: Scalable High-Performance Prediction with XGBoost
Author(s): Ewa Nowakowska* and Joseph Retzer
Companies: GfK and ACT Market Research Solutions
Keywords: extreme gradient boosting; stochastic gradient boosting; scalability
Abstract:

Modeling with high-dimensional data is both challenging and valuable. Various predictive models, e.g., CART, Random Forest analysis, bagging, neural networks, and support vector machines, have been shown to provide useful out-of-sample predictions. An alternative approach, known as stochastic gradient boosting (Friedman, 2001; Friedman et al., 2000), has demonstrated remarkable results and is therefore often a preferred choice for predictive modeling. However, unlike Random Forests, which are well known for their scalability, stochastic gradient boosting is limited in both the speed and the size of data it can handle effectively. In other words, it does not "scale" well when applied to big data. To address this issue we employ XGBoost (eXtreme Gradient Boosting), developed by Tianqi Chen and Carlos Guestrin at the University of Washington, which provides an efficient and scalable implementation of gradient boosting. Our study seeks to predict the on-time arrival behavior of flights using data from the RITA database. It will be shown that XGBoost not only provides comparatively high predictive performance but also ensures the scalability of the model.
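As a rough illustration of the kind of workflow the abstract describes (not the authors' code), the following minimal Python sketch fits an XGBoost classifier to a binary on-time-arrival target. The feature set, hyperparameter values, and the toy data-generating rule are all assumptions made for demonstration; neither the RITA data nor the authors' actual model settings are reproduced here.

# Minimal sketch, assuming synthetic stand-in data; the feature names,
# hyperparameters, and labeling rule below are illustrative only.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical predictors standing in for RITA fields: departure delay,
# flight distance, and scheduled departure hour.
X = np.column_stack([
    rng.exponential(10.0, n),    # departure delay (minutes)
    rng.uniform(100, 2500, n),   # flight distance (miles)
    rng.integers(5, 23, n),      # scheduled departure hour
])
y = (X[:, 0] < 15).astype(int)   # toy rule: on time if departure delay is small

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest = xgb.DMatrix(X_te, label=y_te)

# Illustrative hyperparameters; the abstract does not report its settings.
params = {
    "objective": "binary:logistic",
    "max_depth": 6,
    "eta": 0.1,
    "subsample": 0.8,   # row subsampling, echoing stochastic gradient boosting
    "eval_metric": "auc",
}
booster = xgb.train(params, dtrain, num_boost_round=200,
                    evals=[(dtest, "test")], verbose_eval=False)

pred = booster.predict(dtest)
print("test AUC:", roc_auc_score(y_te, pred))

The subsample parameter is the hook back to stochastic gradient boosting: each tree is fit on a random fraction of the rows, while XGBoost's histogram-based, parallel tree construction is what supplies the scalability the abstract emphasizes.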


Authors who are presenting talks have a * after their name.
