The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Abstract Details
Activity Number: 104
Type: Invited
Date/Time: Monday, July 30, 2012, 8:30 AM to 10:20 AM
Sponsor: General Methodology
Abstract #303790
Title: Predicting Travel Times for the M4 Highway in Sydney: How We Won Kaggle.com's First $10,000 Data Mining Challenge
Author(s): José Pablo González-Brenes*+
Companies: Carnegie Mellon University
Address: 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Keywords: Forecasting travel time; Random Forest; Data mining competition
Abstract:
The New South Wales Roads and Traffic Authority collected over two years' worth of historical road-use data between 2008 and 2010 and made it available for a data mining competition on kaggle.com. The challenge consisted of forecasting travel time on the M4 highway at prediction horizons ranging from a few minutes to 24 hours ahead. Contestants' submissions were evaluated on how accurately they predicted travel time on a held-out dataset. We describe how we built the statistical model that outperformed more than 350 other teams.
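The abstract does not say how the different horizons were handled. One common approach, assumed here rather than taken from the talk, is to train a separate regressor per horizon h, where each training example pairs the most recent observations available at forecast time with the value h steps later. The minimal sketch below builds such (features, target) pairs from a single travel-time series and holds out the tail of the data in time order instead of shuffling; the function names `make_horizon_dataset` and `chronological_split` are hypothetical.

```python
from typing import List, Tuple


def make_horizon_dataset(series: List[float], n_lags: int,
                         horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Pair the n_lags most recent observations at time t with the value at t + horizon.

    Hypothetical helper: one such dataset would be built for each forecast horizon.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    X, y = [], []
    # t is the index of the latest observation available when the forecast is made
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1: t + 1])
        y.append(series[t + horizon])
    return X, y


def chronological_split(X, y, train_frac: float = 0.8):
    """Hold out the most recent examples instead of a random subset (hypothetical helper)."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

For example, with `series = [0.0, 1.0, ..., 9.0]`, three lags, and a horizon of two steps, the first example uses `[0.0, 1.0, 2.0]` to predict `4.0`. Splitting in time order keeps every held-out target later than every training target, which matters when observations are autocorrelated.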
We model highway travel time using a non-parametric method called an Ensemble of Decision Trees. This approach efficiently captures non-linear interactions between regressors. We compare the Ensemble of Decision Trees with auto-regressive models and describe some of the difficulties we encountered, such as the non-i.i.d. nature of the held-out set.
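The abstract gives no implementation details, so the following is only a hedged illustration of the general technique, not the author's model. It grows a small random-forest-style ensemble of regression trees in plain NumPy (bootstrap resampling plus a random subset of regressors considered at each split) and compares it with a linear least-squares fit, which is what an auto-regressive model becomes when the regressors are lagged values. Function names, hyperparameters, and the synthetic interaction data in `demo` are assumptions for illustration; `demo` holds out the last rows without shuffling to show where an ordered split would go.

```python
import numpy as np


def fit_tree(X, y, depth, min_leaf, n_feat, rng):
    """Greedy variance-reducing regression tree.

    A node is either a float (leaf mean) or a tuple (feature, threshold, left, right).
    """
    if depth == 0 or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())
    best_sse, best_split = np.inf, None
    # Random-forest style: only a random subset of regressors is tried at each node
    for j in rng.choice(X.shape[1], size=n_feat, replace=False):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], y[order]
        csum, csq, n = np.cumsum(ys), np.cumsum(ys ** 2), len(ys)
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # tied values cannot be separated by a threshold
            left = csq[i - 1] - csum[i - 1] ** 2 / i
            right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
            if left + right < best_sse:
                best_sse, best_split = left + right, (j, (xs[i - 1] + xs[i]) / 2)
    if best_split is None:
        return float(y.mean())
    j, thr = best_split
    mask = X[:, j] <= thr
    return (j, thr,
            fit_tree(X[mask], y[mask], depth - 1, min_leaf, n_feat, rng),
            fit_tree(X[~mask], y[~mask], depth - 1, min_leaf, n_feat, rng))


def predict_tree(node, x):
    # Walk down until a leaf (a float) is reached
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, depth=6, min_leaf=5, n_feat=None, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = n_feat or X.shape[1]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample
        trees.append(fit_tree(X[idx], y[idx], depth, min_leaf, n_feat, rng))
    return trees


def predict_forest(trees, X):
    # Ensemble prediction: average of the individual tree predictions
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X])


def fit_linear(X, y):
    # Least squares with an intercept; with lagged values as X this is an AR(p) fit
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def demo(n=500, seed=1):
    """Synthetic target with an interaction between two regressors (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    y = X[:, 0] * X[:, 1] + 0.05 * rng.standard_normal(n)
    cut = int(0.8 * n)  # hold out the last rows; no shuffling
    trees = fit_forest(X[:cut], y[:cut], n_trees=20)
    coef = fit_linear(X[:cut], y[:cut])
    mae_forest = np.mean(np.abs(predict_forest(trees, X[cut:]) - y[cut:]))
    mae_linear = np.mean(np.abs(predict_linear(coef, X[cut:]) - y[cut:]))
    return mae_forest, mae_linear
```

Because every leaf stores a mean of training targets, the ensemble can never predict outside the range of values it was trained on. That is harmless for interpolating interactions but is one reason a tree ensemble and a linear auto-regressive model can behave quite differently on a held-out period that is not i.i.d. with the training data.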
The address information is for the authors who have a + after their name.
Authors who are presenting talks have a * after their name.
For information, contact jsm@amstat.org or phone (888) 231-3473.
If you have questions about the Continuing Education program, please contact the Education Department.