JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 104
Type: Invited
Date/Time: Monday, July 30, 2012 : 8:30 AM to 10:20 AM
Sponsor: General Methodology
Abstract - #303790
Title: Predicting Travel Times for the M4 Highway in Sydney: How We Won Kaggle.com's First $10,000 Data Mining Challenge
Author(s): José Pablo González-Brenes*+
Companies: Carnegie Mellon University
Address: 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Keywords: Forecasting travel time ; Random Forest ; Data mining competition
Abstract:

The New South Wales Roads and Traffic Authority collected more than two years' worth of historical road-use data between 2008 and 2010 and made it available for a data mining competition on kaggle.com. The challenge was to forecast travel time on the M4 highway at several prediction horizons, ranging from a few minutes to 24 hours ahead. Submissions were evaluated on how accurately they predicted travel time on a held-out dataset. We describe how we built the statistical model that outperformed over 350 other teams.
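As a rough illustration of the task setup (not the competition's actual data format or scoring code), forecasting at a fixed horizon can be framed as supervised learning: recent readings are the inputs and the reading `horizon` steps later is the target. The sketch below is a minimal example under stated assumptions: the travel times are made-up values, the helper names `make_horizon_pairs` and `rmse` are hypothetical, and root mean squared error is assumed as the accuracy measure.

```python
import math

def make_horizon_pairs(series, n_lags, horizon):
    """Pair the last n_lags readings with the reading `horizon` steps ahead."""
    features, targets = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        features.append(series[t - n_lags + 1 : t + 1])
        targets.append(series[t + horizon])
    return features, targets

def rmse(predicted, actual):
    """Root mean squared error (assumed here as the accuracy measure)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up travel times in seconds, one reading per time step.
travel_times = [300, 310, 330, 390, 450, 420, 360, 320, 305, 300]

# Predict two steps ahead from the three most recent readings.
X, y = make_horizon_pairs(travel_times, n_lags=3, horizon=2)

# Persistence baseline: assume travel time stays at the latest reading.
baseline = [row[-1] for row in X]
score = rmse(baseline, y)
```

Repeating this construction with a different `horizon` gives one training set per prediction horizon, and any model can then be scored against the persistence baseline on held-out pairs.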

We model highway travel time using a non-parametric approach, an ensemble of decision trees, which efficiently captures non-linear interactions between regressors. We compare the ensemble of decision trees with auto-regressive models and describe some of the difficulties we encountered, such as the non-i.i.d. nature of the held-out set.
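To make the idea of an ensemble of decision trees concrete, here is a minimal sketch of bagged regression trees in plain Python. It is not the authors' winning model: the features (hour of day, weekday flag), the synthetic travel times, and all function names are assumptions chosen only to show how trees can pick up an interaction (congestion only during weekday rush hours) that a purely additive linear model would miss.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(rows, targets, depth):
    """Fit a small regression tree; a leaf is a float, a node is a tuple."""
    mean = sum(targets) / len(targets)
    if depth == 0 or len(set(targets)) == 1:
        return mean
    best = None  # (squared error, feature index, threshold)
    for j in range(len(rows[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for threshold in sorted({r[j] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, threshold)
    if best is None:
        return mean
    _, j, threshold = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[j] > threshold]
    return (
        j,
        threshold,
        fit_tree([rows[i] for i in left_idx], [targets[i] for i in left_idx], depth - 1),
        fit_tree([rows[i] for i in right_idx], [targets[i] for i in right_idx], depth - 1),
    )

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

def fit_ensemble(rows, targets, n_trees, depth, seed):
    """Bagging: fit each tree on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(fit_tree([rows[i] for i in idx], [targets[i] for i in idx], depth))
    return trees

def predict_ensemble(trees, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Synthetic data (an assumption): congestion only on weekdays from 7 to 9.
rows = [(hour, weekday) for weekday in (0, 1) for hour in range(24)]
targets = [600.0 if weekday and 7 <= hour <= 9 else 300.0 for hour, weekday in rows]

trees = fit_ensemble(rows, targets, n_trees=25, depth=3, seed=0)
rush = predict_ensemble(trees, (8, 1))  # weekday, 8 AM
free = predict_ensemble(trees, (8, 0))  # weekend, 8 AM
```

Because each tree splits on one feature at a time and splits can nest, the ensemble separates the weekday 8 AM case from the weekend 8 AM case without an explicit interaction term. A random forest additionally samples a random subset of features at each split, which this sketch omits.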


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

