Abstract:
Efficient operation of online display advertising campaigns relies on accurate prediction of the likelihood that a web user clicks on a particular banner on a particular site. The problem, one of the canonical examples of applying machine learning and statistics to "big data", is essentially a classification problem with many predictor variables, in which petabytes of input data are processed in a Hadoop pipeline. Commonly applied methods include the lasso, random forests, and gradient boosting, implemented on computing platforms such as R and Vowpal Wabbit or as native implementations. Among these, Vowpal Wabbit, an out-of-core learning system, has become a popular choice for many large-scale machine learning problems. In this paper, we compare the performance of several algorithm and platform combinations. In particular, we compare algorithms implemented in R and Vowpal Wabbit across various combinations of sample size and number of features, and examine how their performance interacts with sampling, feature selection, and feature transformation.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.