Abstract:
|
Collaborative machine learning, or more specifically federated learning, is a machine learning paradigm in which many models are trained on separate data silos, while the inferred parameters are combined at a central node. The distinguishing feature of federated learning is that data is never shared between silos, which preserves privacy and keeps model training decentralized. Meaningful inference in federated learning therefore depends critically on the combination step at the central server. We investigate the effect of silo heterogeneity on the resulting inference. Because federated learning is a relatively new concept, the literature on node heterogeneity is limited, and most approaches assume exchangeable data distributions. We study the bias that arises when federated learning models are estimated across inconsistent nodes. We present a novel method to flag nonconforming silos during federated training and testing, in order to maintain the validity of inference. We also propose a statistical algorithm based on Markov chain Monte Carlo (MCMC) for approximate inference in these heterogeneous silo settings. Finally, we demonstrate the performance of the proposed method on simulated and real data.
|