Abstract:
|
Working with authorities in the city of Albuquerque, we have combined multiple databases to better understand arrest patterns in the region over the past several years. This requires linking individuals who have been arrested multiple times, as well as identifying arrests that appear in more than one database. A key component of this process is record linkage. Classical approaches to record linkage, such as the method of Fellegi and Sunter, consider each possible pair of records between databases and assign a link probability to each one. A drawback of considering pairwise links alone is that the resulting links can violate transitivity: record A may be linked to B and B to C, while A and C are left unlinked. To better handle such conflicts, we propose a Bayesian linkage method that considers a large set of possible pairs at once. At the heart of this approach is a Potts model representation that tracks which records are assigned to the same individual. Inference is carried out using Markov chain Monte Carlo (MCMC). Computation is eased by using domain-specific linkage rules to partition the dataset into feasible blocking groups. We demonstrate this approach on multiple crime databases from the Albuquerque Police Department.
|