Abstract Details

Activity Number: 392 - Large-Scale Data Analysis via Spectral Methods
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract #303039
Title: Distributed Ridge Regression in High Dimensions
Author(s): Yue Sheng* and Edgar Dobriban
Companies: University of Pennsylvania and University of Pennsylvania
Keywords: ridge regression; distributed learning; random matrix theory

In the big data era, it is important to study how to perform statistical inference and machine learning in a distributed setting. In this talk, we discuss distributed ridge regression in a high-dimensional setting. We perform ridge regression on each local machine, then combine the local estimators into a weighted average at a global data center. How much do we lose compared to running ridge regression on the full data? Here we focus mainly on the loss in estimation error. We discover several key phenomena. First, the distributed estimator does not lose all efficiency even with an infinite number of machines, in contrast to distributed linear regression. Second, under some conditions, the optimal tuning parameters can be chosen locally. Third, the coordinates of the optimal weight vector do not sum to one, which means the naive average is not optimal.
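The one-shot scheme described above (local ridge fits combined at a data center) can be sketched in a small simulation. This is a minimal illustration, not the authors' code: the data dimensions, penalty value, and the use of an equal-weight (naive) average are all assumptions made here for demonstration; the talk's point is that the optimal combination weights differ from the naive average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): samples, features, machines.
n, p, k = 600, 100, 4
lam = 1.0  # ridge penalty; a shared, hypothetical choice for this sketch

# Simulated high-dimensional linear model.
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def ridge(X_i, y_i, lam):
    """Ridge estimator (X'X + n*lam*I)^{-1} X'y on one machine's data."""
    n_i, p_i = X_i.shape
    return np.linalg.solve(X_i.T @ X_i + n_i * lam * np.eye(p_i), X_i.T @ y_i)

# Step 1: each machine fits ridge on its local shard of the data.
shards = np.array_split(np.arange(n), k)
local_estimators = [ridge(X[idx], y[idx], lam) for idx in shards]

# Step 2: the data center combines them. Here we use the naive equal-weight
# average; the talk shows the optimal weights generally do NOT sum to one,
# so this combination is suboptimal.
beta_dist = np.mean(local_estimators, axis=0)

# Benchmark: ridge regression on the full, undivided data.
beta_full = ridge(X, y, lam)

err_dist = np.sum((beta_dist - beta) ** 2)
err_full = np.sum((beta_full - beta) ** 2)
print(f"distributed (naive avg) error: {err_dist:.4f}")
print(f"full-data ridge error:         {err_full:.4f}")
```

Comparing the two printed errors gives the estimation-error gap the abstract refers to; replacing the equal weights with the optimally tuned weight vector would shrink that gap.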

Authors who are presenting talks have a * after their name.