572 – Would the Real Steve Fienberg Please Stand Up: Getting to Know a Population from Multiple Incomplete Files
Privacy-preserving Record Linkage and Privacy-preserving Blocking for Large Files with Cryptographic Keys using Multibit Trees
Rainer Schnell
University of Duisburg-Essen
Administrative data are increasingly used for statistical purposes, for example in register-based census taking. In practice, this usually requires linking separate files that contain information on the same unit without revealing the identity of that unit. If no unique identification number is available, the linkage has to compare keys that are derived from unit identifiers and are expected to be similar, though not necessarily identical, for the same unit. For large files such as census data or population registers, comparing every possible pair of keys from two files is computationally infeasible.
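The abstract does not say how the encrypted keys are built. One widely used construction, assumed here and not taken from this paper, splits an identifier into bigrams and hashes each bigram with a secret key into the bit positions of a Bloom filter. Two encoded names can then be compared with the Tanimoto (Jaccard) coefficient of their bit sets without decoding them. The function names, the filter length `m=1000`, `k=20` hash functions and the secret are hypothetical choices for this sketch.

```python
import hashlib
import hmac


def bigrams(name: str) -> set:
    """Split a name into bigrams, padded so first and last letters form bigrams too."""
    s = f" {name.lower()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_key(name: str, secret: bytes, m: int = 1000, k: int = 20) -> frozenset:
    """Encode a name as the set of bit positions set in an m-bit Bloom filter.

    Assumption: each bigram is mapped to k positions by k keyed hashes
    (HMAC-SHA256 over "index:bigram"); other hashing schemes are possible.
    """
    bits = set()
    for gram in bigrams(name):
        for i in range(k):
            digest = hmac.new(secret, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % m)
    return frozenset(bits)


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Example: similar names share bigrams, so their encodings share many bits
secret = b"hypothetical-shared-secret"  # agreed by the data holders, never by the linker
steve, stephen, rainer = (bloom_key(n, secret) for n in ("Steve", "Stephen", "Rainer"))
print(round(tanimoto(steve, stephen), 2), round(tanimoto(steve, rainer), 2))
```

Comparing every record of one file with every record of another in this way still takes |A| × |B| similarity computations; for two files of ten million records each, that is 10^14 comparisons.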
Therefore, special algorithms (blocking methods) are needed to reduce the number of comparisons. If the identifiers must be encrypted for privacy reasons, very few blocking algorithms are available. This paper describes the adaptation of a recently introduced algorithm, Multibit trees, to this problem and its performance on large files.
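The idea behind Multibit-tree blocking can be sketched as follows. This is a simplified illustration written for this summary, not the implementation evaluated in the paper: each tree node records which bit positions are 1 in every fingerprint beneath it (`ones`) and which are 0 in every one of them (`zeros`). From these, a query gets an upper bound on its Tanimoto similarity to anything in the subtree, and subtrees whose bound falls below the threshold are skipped. The splitting rule, `leaf_size` and the exact bound are assumptions of this sketch.

```python
from dataclasses import dataclass, field


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


@dataclass
class Node:
    ones: frozenset    # bits set in every fingerprint below this node
    zeros: frozenset   # bits unset in every fingerprint below this node
    items: list = field(default_factory=list)     # (record_id, bits) pairs at leaves
    children: list = field(default_factory=list)  # two subtrees at inner nodes


def build(items: list, m: int, leaf_size: int = 4) -> Node:
    """Build a simplified multibit tree over a non-empty list of (id, frozenset) pairs."""
    sets = [b for _, b in items]
    ones = frozenset.intersection(*sets)
    zeros = frozenset(range(m)) - frozenset().union(*sets)
    node = Node(ones, zeros)
    undetermined = frozenset(range(m)) - ones - zeros
    if len(items) <= leaf_size or not undetermined:
        node.items = items
        return node
    # Split on the undetermined bit that divides the items most evenly;
    # both halves are non-empty because the bit is neither always 1 nor always 0
    split = min(undetermined,
                key=lambda j: abs(2 * sum(j in b for b in sets) - len(sets)))
    with_bit = [it for it in items if split in it[1]]
    without_bit = [it for it in items if split not in it[1]]
    node.children = [build(with_bit, m, leaf_size), build(without_bit, m, leaf_size)]
    return node


def upper_bound(q: frozenset, node: Node) -> float:
    """Upper bound on tanimoto(q, b) for every fingerprint b below node."""
    max_inter = len(q - node.zeros)           # q-bits that might also be set in b
    min_union = len(q) + len(node.ones - q)   # q plus bits every b is known to have
    return max_inter / min_union if min_union else 1.0


def search(q: frozenset, node: Node, threshold: float) -> list:
    """Return (record_id, similarity) for all fingerprints with tanimoto >= threshold."""
    if upper_bound(q, node) < threshold:
        return []  # prune: nothing in this subtree can reach the threshold
    if not node.children:
        return [(rid, s) for rid, b in node.items
                if (s := tanimoto(q, b)) >= threshold]
    return [hit for child in node.children for hit in search(q, child, threshold)]


# Example: ten toy fingerprints over 32 bit positions
records = [(i, frozenset(range(i, i + 8))) for i in range(10)]
tree = build(records, m=32)
print(search(frozenset(range(2, 10)), tree, threshold=0.5))
```

Because the bound never underestimates the true similarity, the search returns exactly the pairs that a full comparison above the threshold would find; how much work the pruning saves depends on the data and the threshold.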