JSM 2016 Online Program

Activity Number:	590
Type:	Invited
Date/Time:	Wednesday, August 3, 2016 : 2:00 PM to 3:50 PM
Sponsor:	Government Statistics Section
Abstract #318316
Title:	Supervised Neighborhoods for Distributed Nonparametric Regression
Author(s):	Ameet Talwalkar*
Companies:	University of California at Los Angeles
Keywords:	Locally Linear Models ; Adaptive Nearest Neighbors ; Distributed Nonparametric Regression ; Supervised Neighborhoods ; Distributed Random Forests ; Apache Spark/MLlib
Abstract:	Techniques for nonparametric regression that fit small-scale local models at prediction time have long been studied in statistics and pattern recognition, but have received less attention in modern large-scale machine learning applications. In practice, such methods are generally applied to low-dimensional problems and may falter with high-dimensional predictors if they use a Euclidean distance-based kernel. We propose a new method, Silo-RF, that fits local models at prediction time using supervised neighborhoods that adapt to the local shape of the regression surface. To learn these neighborhoods, we use a weight function between points derived from random forests. We prove the consistency of Silo-RF and demonstrate through simulations and real data that it works well in both the serial and distributed settings. In the distributed setting, Silo-RF learns the weighting function in a divide-and-conquer manner, entirely avoiding communication at training time.
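
Illustrative sketch (not part of the submitted abstract): the idea of forest-derived supervised neighborhoods can be shown in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' Silo-RF implementation. It assumes that the weight of a training point for a query is the fraction of trees in which the two share a leaf (normalized by leaf size), and that the local model is a weighted linear fit centered at the query. The tree-growing rule, the subsampling, the ridge term, and every function name below are hypothetical choices made for illustration. The distributed, divide-and-conquer training described in the abstract is not modeled.

```python
import math
import random

def sse(y, idx):
    """Sum of squared errors of y over idx around its mean."""
    m = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - m) ** 2 for i in idx)

def build_tree(X, y, idx, rng, min_leaf=5, n_try=3):
    """Grow one randomized regression tree; leaves store training indices."""
    if len(idx) < 2 * min_leaf:
        return {"leaf": idx}
    best = None
    for _ in range(n_try):
        j = rng.randrange(len(X[0]))      # random candidate feature
        t = X[rng.choice(idx)][j]         # random candidate threshold
        left = [i for i in idx if X[i][j] <= t]
        right = [i for i in idx if X[i][j] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = sse(y, left) + sse(y, right)  # supervised: split uses responses
        if best is None or score < best[0]:
            best = (score, j, t, left, right)
    if best is None:
        return {"leaf": idx}
    _, j, t, left, right = best
    return {"feat": j, "thr": t,
            "lo": build_tree(X, y, left, rng, min_leaf, n_try),
            "hi": build_tree(X, y, right, rng, min_leaf, n_try)}

def fit_forest(X, y, n_trees=20, seed=0):
    """Each tree sees a random two-thirds subsample (an assumption)."""
    rng = random.Random(seed)
    n = len(X)
    return [build_tree(X, y, rng.sample(range(n), max(1, 2 * n // 3)), rng)
            for _ in range(n_trees)]

def leaf_of(tree, x):
    """Return the training indices stored in the leaf that x falls into."""
    while "leaf" not in tree:
        tree = tree["lo"] if x[tree["feat"]] <= tree["thr"] else tree["hi"]
    return tree["leaf"]

def forest_weights(forest, x, n):
    """Weight of each training point for query x; the weights sum to 1."""
    w = [0.0] * n
    for tree in forest:
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * len(forest))
    return w

def solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * beta[k] for k in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta

def predict_local_linear(forest, X, y, x, ridge=1e-9):
    """Weighted linear fit centered at x; the intercept is the prediction."""
    w = forest_weights(forest, x, len(X))
    d = len(x) + 1
    A = [[ridge if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for i, wi in enumerate(w):
        if wi == 0.0:
            continue
        z = [1.0] + [X[i][j] - x[j] for j in range(len(x))]
        for a in range(d):
            rhs[a] += wi * z[a] * y[i]
            for b in range(d):
                A[a][b] += wi * z[a] * z[b]
    return solve(A, rhs)[0]

# Toy data: the response depends on the first predictor only.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(300)]
y = [math.sin(4.0 * row[0]) + rng.gauss(0.0, 0.1) for row in X]
forest = fit_forest(X, y)
x0 = [0.3, 0.7]
weights = forest_weights(forest, x0, len(X))
prediction = predict_local_linear(forest, X, y, x0)
```

Because the splits are scored with the responses, leaves tend to be narrow along predictors that matter and wide along those that do not, which is the sense in which the neighborhood is "supervised" rather than defined by Euclidean distance.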

Authors who are presenting talks have a * after their name.