Abstract:
|
Standard regression setups take it for granted that the response variables are observed jointly with their corresponding predictor variables. However, when data are collected asynchronously, responses and predictors are given in two separate files, with limited information about which records belong to the same statistical unit. This setting arises in record linkage, data privacy, and various other applications in computer science and engineering. In this talk, we present a series of practical methods and accompanying theory for the setting in which predictors and responses are observed up to an unknown permutation that encodes the underlying correspondence, starting with multivariate linear regression and concluding with a specific notion of monotone functions arising in optimal transportation. In particular, we uncover a “blessing of dimensionality” phenomenon: recovery of the permutation becomes easier as the dimension of the response increases.
|