Abstract:
|
This paper focuses on the privacy paradigm in which researchers remotely carry out analyses on sensitive data stored behind firewalls, specifically data that are vertically or horizontally partitioned across physically separate databases that cannot be combined. We develop and demonstrate a method to accurately compute the multivariate normal likelihood for a given set of parameters from the partitioned data. Maximum likelihood estimates (MLEs) can then be obtained without sharing any data, or any true statistics of the data, across firewalls. We show that, under certain assumptions, our method for estimation across partitions gives results identical to estimation with the combined data. Privacy is maintained by interweaving multiple secure summation routines, so that each node receives only noisy statistics and the noise is not fully removed until the final step, which reveals the true total log-likelihood value. Applications include any method that relies on multivariate normal maximum likelihood estimation. We give examples of estimating structural equation models (SEMs) with arbitrarily partitioned data, using both simulated and real data, and we provide code for easy implementation.
|
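As an illustration of why secure summation is enough to recover the pooled likelihood in the simplest case, the sketch below treats only horizontally partitioned data (each node holds different rows). There the total log-likelihood is the sum of the nodes' local log-likelihoods, so a ring secure sum reveals the total without revealing any single node's value. This is not the paper's method: it uses one classic masked ring sum rather than interwoven routines, it does not handle vertical partitions, and the function names (`mvn_loglik`, `ring_secure_sum`) are hypothetical. A real deployment would mask with modular integer arithmetic instead of a floating-point random offset.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mvn_loglik(rows: Sequence[Vector], mu: Vector, sigma: Matrix) -> float:
    """Sum of multivariate normal log densities of `rows` under N(mu, sigma).

    Each node would run this locally on its own rows (horizontal partition).
    """
    p = len(mu)
    L = cholesky(sigma)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    total = 0.0
    for x in rows:
        # Forward-solve L z = (x - mu); the Mahalanobis term is then z . z
        z = [0.0] * p
        for i in range(p):
            z[i] = (x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        total += -0.5 * (p * math.log(2 * math.pi) + log_det + sum(v * v for v in z))
    return total


def ring_secure_sum(local_values: Sequence[float], rng: random.Random) -> float:
    """Simplified ring secure sum (hypothetical helper, float masking for illustration).

    Node 0 adds a random mask before passing its value on, so every node
    downstream sees only a masked partial sum. Node 0 removes the mask at
    the final step, revealing only the total.
    """
    mask = rng.uniform(-1e6, 1e6)
    running = mask + local_values[0]  # node 0 -> node 1
    for v in local_values[1:]:
        running += v  # node k adds its contribution and forwards
    return running - mask  # back at node 0: unmask the total


if __name__ == "__main__":
    mu = [0.0, 1.0]
    sigma = [[2.0, 0.5], [0.5, 1.0]]
    node_a = [[0.1, 0.9], [1.2, 1.5]]
    node_b = [[-0.7, 0.3], [0.4, 2.1], [0.0, 1.0]]
    local = [mvn_loglik(node_a, mu, sigma), mvn_loglik(node_b, mu, sigma)]
    print("secure total:", ring_secure_sum(local, random.Random(0)))
    print("pooled total:", mvn_loglik(node_a + node_b, mu, sigma))
```

The two printed totals agree up to floating-point rounding, which mirrors, in this restricted horizontal setting, the abstract's claim that estimation across partitions matches estimation on combined data. An optimizer could call this secure total repeatedly for candidate `(mu, sigma)` values to find the MLE.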