JSM 2015 Preliminary Program

Abstract Details

Activity Number: 581
Type: Invited
Date/Time: Wednesday, August 12, 2015, 2:00 PM to 3:50 PM
Sponsor: International Chinese Statistical Association
Abstract #314666
Title: Distributed Random Forests
Author(s): Adam Bloniarz* and Bin Yu and Ameet Talwalkar
Companies: UC Berkeley and UC Berkeley and UCLA
Keywords: random forests ; spark ; hadoop ; distributed algorithms ; classification ; regression
Abstract:

Random forests are among the most successful general-purpose methods for classification and regression. It is therefore desirable that they scale to large datasets while maintaining statistical efficiency. We present a novel distributed algorithm for random forests with a small communication footprint. Instead of focusing on parallelized tree construction, we leverage the interpretation of random forests as a nearest-neighbor-type method, based on potential nearest neighbors, to design a divide-and-conquer algorithm. We demonstrate the accuracy and scalability of our algorithm on multiple real-world applications using Spark, a widely used open-source platform for distributed computation. Spark generalizes the MapReduce model; compared with Hadoop MapReduce, it offers greater flexibility for in-memory caching of intermediate results.
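
The nearest-neighbor interpretation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' distributed algorithm, which the abstract does not specify; it only shows, under simplifying assumptions (purely random splits, no bootstrap sampling, standard-library Python), that a regression forest's prediction at a query point equals a weighted average of the training responses, where the weights come from how often each training point shares a leaf with the query. The helper names `build_tree`, `leaf_indices`, `forest_predict`, and `forest_weights` are hypothetical.

```python
import random

def build_tree(idx, X, rng, min_leaf=3):
    """Grow a randomized tree over training indices `idx` (illustrative only)."""
    if len(idx) <= min_leaf:
        return {"leaf": idx}
    feat = rng.randrange(len(X[0]))
    values = sorted({X[i][feat] for i in idx})
    if len(values) < 2:
        return {"leaf": idx}
    # Random threshold midway between two adjacent distinct values,
    # so both children are guaranteed to be non-empty.
    k = rng.randrange(len(values) - 1)
    thr = (values[k] + values[k + 1]) / 2
    left = [i for i in idx if X[i][feat] <= thr]
    right = [i for i in idx if X[i][feat] > thr]
    return {"feat": feat, "thr": thr,
            "left": build_tree(left, X, rng, min_leaf),
            "right": build_tree(right, X, rng, min_leaf)}

def leaf_indices(tree, x):
    """Return the training indices stored in the leaf that x falls into."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feat"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

def forest_predict(forest, y, x):
    """Usual forest prediction: average of per-tree leaf means."""
    preds = []
    for tree in forest:
        leaf = leaf_indices(tree, x)
        preds.append(sum(y[i] for i in leaf) / len(leaf))
    return sum(preds) / len(preds)

def forest_weights(forest, n, x):
    """Weight of each training point: average over trees of 1/leaf_size if co-located with x."""
    w = [0.0] * n
    for tree in forest:
        leaf = leaf_indices(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * len(forest))
    return w

# Synthetic data (an assumption for illustration; not from the talk).
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [2.0 * a + b + rng.gauss(0, 0.1) for a, b in X]
forest = [build_tree(list(range(len(X))), X, rng) for _ in range(25)]

x0 = [0.3, 0.7]
w = forest_weights(forest, len(X), x0)
direct = forest_predict(forest, y, x0)
weighted = sum(wi * yi for wi, yi in zip(w, y))
neighbors = sum(1 for wi in w if wi > 0)  # training points with nonzero weight
```

Here `direct` and `weighted` agree up to floating-point error, and only the training points counted in `neighbors` receive nonzero weight. Those points play the role of the query's neighbors, which is what makes a nearest-neighbor-style view of the forest possible.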


Authors who are presenting talks have a * after their name.
