
Abstract Details

Activity Number: 611 - Applications in Business and Markets
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #305305 (Presentation)
Title: Customer Classification Using XGBoost: Accurate and Scalable Prediction of Customer Cluster Membership
Author(s): Joseph Retzer* and Ewa Nowakowska
Companies: ACT-MRSolutions and EY
Keywords: xgboost; segmentation; gradient boosting
Abstract:

High-dimensional data analysis for predictive model development is both challenging and valuable. A variety of predictive models, such as CART, random forests, bagging, neural networks, and support vector machines, have been shown to provide useful information for out-of-sample prediction under various circumstances. An alternative approach, known as stochastic gradient boosting (see Friedman 2000), has demonstrated remarkable results and is therefore often the preferred choice for predictive modeling.

All of the aforementioned methods, however, can be rendered ineffective when working with very large or high-dimensional data. In other words, these methods tend not to scale well and, in addition, tend to overfit when applied to data sets with many variables. In this paper we employ XGBoost (eXtreme Gradient Boosting), developed by Tianqi Chen and Carlos Guestrin of the University of Washington, for categorical response prediction. XGBoost provides a regularized, scalable, and flexible, i.e. customizable and tunable, implementation of gradient boosting. The presentation begins with a brief intuitive overview of ensemble-based boosting, culminating in its latest incarnation, extreme gradient boosting (XGBoost). The paper then illustrates the ease of XGBoost's implementation in R through its application to the prediction of customer segment membership, along the lines of the sketch below.
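
A minimal R sketch of this kind of multi-class segment prediction with the xgboost package is shown below; the data frame seg_data, its predictor columns, and the factor outcome segment are hypothetical placeholders, not the authors' actual code or data.

# Illustrative sketch, not the authors' code: multi-class prediction of
# customer segment membership with the xgboost R package.
library(xgboost)

# seg_data is a hypothetical data frame with a factor column `segment`
# (the customer segment label) plus a set of predictor columns.
y <- as.integer(seg_data$segment) - 1                    # 0-based class labels
X <- model.matrix(~ . - segment - 1, data = seg_data)    # one-hot encode predictors
dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  objective        = "multi:softprob",   # per-class probability output
  num_class        = length(levels(seg_data$segment)),
  eta              = 0.1,                # learning rate (shrinkage)
  max_depth        = 6,
  subsample        = 0.8,                # stochastic row subsampling
  colsample_bytree = 0.8                 # column subsampling per tree
)

set.seed(123)
fit <- xgb.train(params = params, data = dtrain, nrounds = 200)

# Predicted segment probabilities: one row per customer, one column per segment
prob <- matrix(predict(fit, dtrain), ncol = params$num_class, byrow = TRUE)
pred_segment <- levels(seg_data$segment)[max.col(prob)]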

To ensure that XGBoost provides comparatively superior predictive performance, it is highly advisable to tune the model through appropriate parameter value selection. This paper therefore also outlines optimal parameter selection for the XGBoost model using the R caret package, as sketched below.
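
The following sketch shows how such tuning might look with caret's "xgbTree" method; the grid values and the seg_data / segment names are illustrative assumptions rather than the authors' chosen settings.

# Illustrative sketch, not the authors' settings: cross-validated grid search
# over XGBoost tuning parameters via caret's "xgbTree" method.
library(caret)

ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation

grid <- expand.grid(                              # xgbTree expects all 7 parameters
  nrounds          = c(100, 200),
  max_depth        = c(3, 6),
  eta              = c(0.05, 0.1),
  gamma            = 0,
  colsample_bytree = 0.8,
  min_child_weight = 1,
  subsample        = 0.8
)

set.seed(123)
tuned <- train(
  segment ~ .,                 # factor outcome -> multi-class classification
  data      = seg_data,
  method    = "xgbTree",
  trControl = ctrl,
  tuneGrid  = grid,
  metric    = "Accuracy"
)

tuned$bestTune                 # cross-validated optimal parameter combination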


Authors who are presenting talks have a * after their name.
