
All Times EDT

Abstract Details

Activity Number: 440 - SLDS CSpeed 8
Type: Contributed
Date/Time: Thursday, August 12, 2021 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318796
Title: Regularization for Shuffled Data Problem via Exponential Family Prior on the Permutation Group
Author(s): Zhenbang Wang* and Emanuel Ben-David and Martin Slawski
Companies: George Mason University and US Census Bureau and George Mason University
Keywords: Permutation; Regularization; Conjugate Prior; EM algorithm; Mallows Model
Abstract:

In the analysis of data sets consisting of pairs (X_i, Y_i), i = 1, ..., n, a tacit assumption is that each pair corresponds to the same observation unit. If, however, such pairs are obtained via record linkage of two files, this assumption is often violated because of mismatch error resulting from the absence of a common unique identifier. Recently, there has been a surge of interest in this problem under the term "shuffled data", in which the underlying correct pairing of the (X_i, Y_i)-pairs is represented by an unknown index permutation. Explicit modeling of this permutation tends to be associated with substantial overfitting. For the purpose of regularization, we consider a flexible exponential family prior on the permutation group that can be used to integrate various structures such as sparse and locally constrained shuffling. This prior is shown to be conjugate if the likelihood of the (X,Y)-pairs has an exponential family form, as in the case of generalized linear models. Inference is based on the EM algorithm, in which the intractable E-step is approximated by sampling with the Fisher-Yates algorithm. On synthetic and real data, the proposed approach compares favorably to competing methods.
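The abstract does not give the exact form of the prior or of the E-step approximation, so the following Python fragment is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a Mallows-type prior proportional to exp(-gamma * d_H(pi, id)), where d_H is the Hamming distance to the identity (which favors sparse shuffling), a Gaussian linear model with a scalar slope beta, and a plain self-normalized importance-sampling average over uniform Fisher-Yates draws. All function names and parameters (fisher_yates, posterior_weights, gamma, beta, sigma) are hypothetical.

```python
import math
import random


def fisher_yates(n, rng):
    """Draw a uniformly random permutation of range(n) (Fisher-Yates shuffle)."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def hamming_to_identity(perm):
    """Number of indices moved by perm, i.e. Hamming distance to the identity."""
    return sum(1 for i, p in enumerate(perm) if p != i)


def log_prior_unnormalized(perm, gamma):
    """Assumed Mallows-type log-prior, up to an additive constant."""
    return -gamma * hamming_to_identity(perm)


def gaussian_log_lik(x, y, perm, beta, sigma):
    """Gaussian log-likelihood (constants dropped) when x[i] is paired with y[perm[i]]."""
    sq = sum((y[perm[i]] - beta * x[i]) ** 2 for i in range(len(x)))
    return -sq / (2.0 * sigma ** 2)


def posterior_weights(perms, x, y, beta, sigma, gamma):
    """Self-normalized weights of candidate permutations under prior times likelihood.

    With uniform proposals, the importance weight of each draw is proportional
    to the unnormalized posterior, so a weighted average over the draws can
    stand in for an otherwise intractable expectation over all permutations.
    """
    logs = [
        gaussian_log_lik(x, y, p, beta, sigma) + log_prior_unnormalized(p, gamma)
        for p in perms
    ]
    m = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in logs]
    total = sum(w)
    return [v / total for v in w]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 2.0, 4.0, 6.0]
    draws = [fisher_yates(len(x), rng) for _ in range(200)]
    weights = posterior_weights(draws, x, y, beta=2.0, sigma=1.0, gamma=0.5)
    best = max(range(len(draws)), key=lambda k: weights[k])
    print(draws[best], weights[best])
```

Larger values of gamma concentrate the assumed prior near the identity pairing, which is one way the regularization toward sparse shuffling described above could be expressed. Uniform proposals become inefficient as n grows, so this is meant only to make the ingredients concrete.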


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program