Abstract Details
Activity Number: 581
Type: Invited
Date/Time: Wednesday, August 12, 2015, 2:00 PM to 3:50 PM
Sponsor: International Chinese Statistical Association
Abstract #314666
Title: Distributed Random Forests
Author(s): Adam Bloniarz* and Bin Yu and Ameet Talwalkar
Companies: UC Berkeley and UC Berkeley and UCLA
Keywords: random forests; Spark; Hadoop; distributed algorithms; classification; regression
Abstract:
Random forests are among the most successful general-purpose methods for classification and regression. It is therefore desirable that they scale to large datasets while maintaining statistical efficiency. We present a novel distributed algorithm for random forests with a small communication footprint. Instead of parallelizing tree construction, we leverage the interpretation of random forests as a potential nearest-neighbor type of algorithm to design a divide-and-conquer algorithm. We demonstrate the accuracy and scalability of our algorithm on multiple real-world applications using Spark, a widely used open-source platform for distributed computation. Spark supports MapReduce-style computation, similar to Hadoop, but with greater flexibility for in-memory caching of intermediate results.
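The nearest-neighbor reading of a forest mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not specify how trees are grown or how local results are combined, so the purely random splits, the absence of bootstrapping, the round-robin partitioning, and the plain averaging of per-partition predictions are all assumptions made here for illustration, as are all function names. The sketch shows that a regression forest's prediction equals a weighted average of training responses, with weights determined by which training points share a leaf with the query, and that a naive divide-and-conquer variant needs only one number per partition per query.

```python
import random


def build_tree(X, idx, rng, min_leaf=5):
    """Grow a purely random tree over sample indices `idx` (assumed split rule)."""
    if len(idx) <= min_leaf:
        return ("leaf", idx)
    j = rng.randrange(len(X[0]))              # random split feature
    values = [X[i][j] for i in idx]
    lo, hi = min(values), max(values)
    if lo == hi:
        return ("leaf", idx)
    t = rng.uniform(lo, hi)                   # random split threshold
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    if not left or not right:
        return ("leaf", idx)
    return ("split", j, t,
            build_tree(X, left, rng, min_leaf),
            build_tree(X, right, rng, min_leaf))


def leaf_of(tree, x):
    """Return the training indices stored in the leaf that contains x."""
    while tree[0] == "split":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]


def fit_forest(X, idx, n_trees, rng):
    return [build_tree(X, idx, rng) for _ in range(n_trees)]


def forest_predict(trees, x, y):
    """Usual forest prediction: average of per-tree leaf means."""
    total = 0.0
    for tree in trees:
        leaf = leaf_of(tree, x)
        total += sum(y[i] for i in leaf) / len(leaf)
    return total / len(trees)


def forest_weights(trees, x, n):
    """Weight of training point i: average over trees of 1[i in leaf(x)] / |leaf(x)|."""
    w = [0.0] * n
    for tree in trees:
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * len(trees))
    return w


def distributed_predict(X, y, x, n_parts, n_trees, rng):
    """Naive divide-and-conquer: one local forest per partition, predictions averaged."""
    parts = [list(range(k, len(X), n_parts)) for k in range(n_parts)]
    local = [forest_predict(fit_forest(X, idx, n_trees, rng), x, y) for idx in parts]
    return sum(local) / len(local)            # each partition sends back one number


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [a + 2 * b + rng.gauss(0, 0.1) for a, b in X]
    query = [0.5, 0.5]

    trees = fit_forest(X, list(range(len(X))), 25, rng)
    w = forest_weights(trees, query, len(X))
    print("forest prediction:     ", forest_predict(trees, query, y))
    print("weighted-neighbor form:", sum(wi * yi for wi, yi in zip(w, y)))
    print("divide-and-conquer:    ", distributed_predict(X, y, query, 4, 25, rng))
```

The first two printed values agree up to floating-point error, which is the weighted nearest-neighbor identity. The third differs in general, because each local forest sees only its own partition of the data.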
Authors who are presenting talks have a * after their name.
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.