Abstract Details

Activity Number: 642 - Advanced Statistical Methods for Large Data Sets
Type: Topic Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Social Statistics Section
Abstract #304496
Title: Efficient Fused Learning for Distributed Imbalanced Data
Author(s): Yuanyuan Lin*
Keywords: Distributed imbalanced data; Logistic regression; Oracle estimator

A data set whose classes or categories follow an unequal or highly skewed distribution can be regarded as imbalanced. Because of privacy concerns and other technical limitations, imbalanced data distributed across locations or machines cannot simply be combined and stored in a single central location. In this paper, in contrast to the naive average estimator, we propose a fused learning method for logistic regression on distributed imbalanced data that combines all the cases available on all machines and is stable and efficient. The consistency and asymptotic normality of the proposed estimator are established under regularity conditions. Its asymptotic efficiency relative to the oracle estimator based on the entire imbalanced data set is also studied. Extensive simulation studies show that the proposed estimator is as efficient as the oracle estimator in various situations.
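
The abstract names two reference points, the naive average estimator and the oracle estimator, without giving formulas. As a rough illustration only, the sketch below computes both for a simulated imbalanced logistic regression split across machines. It does not implement the proposed fused method, whose details are not given here. The function name fit_logistic, the sample size, the number of machines, and the coefficient values are all assumptions chosen for the example.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Logistic-regression MLE via Newton's method.

    Hypothetical helper for this sketch; the tiny ridge term only guards
    against a numerically singular Hessian.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated setting (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n, n_machines = 20000, 10
beta_true = np.array([-4.0, 1.0, -0.5])  # very negative intercept -> rare cases
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

# Oracle estimator: fit on the entire data set, as if it could be pooled.
beta_oracle = fit_logistic(X, y)

# Naive average estimator: fit on each machine separately, then average.
parts = np.array_split(np.arange(n), n_machines)
beta_avg = np.mean([fit_logistic(X[idx], y[idx]) for idx in parts], axis=0)

print("case rate     :", y.mean())
print("oracle        :", beta_oracle)
print("naive average :", beta_avg)
```

With rare cases, each machine sees only a small number of positive outcomes, so the per-machine fits that feed the naive average are noisier than the pooled fit; comparing the two printed vectors against beta_true across repeated seeds gives a feel for the gap that the abstract's efficiency comparison addresses.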

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program