Abstract:
|
The Fellegi-Sunter record linkage paradigm specifies the functional relationship between agreement probabilities and match weights for each identifier. It assumes that identifier agreements are conditionally independent given match status. In practice, however, many identifier agreements are dependent. For example, among non-matched pairs, agreement on first name makes agreement on last name more likely, because name distributions vary by ethnicity. In this paper, we present an approach to specify and estimate agreement probabilities, the relationship between them, and the total number of links without the use of training data. This, in turn, yields estimates of match rates (i.e., the proportion of matches among a set of pairs) for a given identifier agreement pattern.
|
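As a point of reference, the standard conditional-independence baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the dependence-aware estimator proposed in the paper. The function names, the use of base-2 logarithms, and all probability values are assumptions chosen for the example: `m` denotes the probability that an identifier agrees among matched pairs, `u` the same probability among non-matched pairs, and `prior` the overall proportion of matches.

```python
import math


def agreement_weights(m, u):
    """Return (agreement weight, disagreement weight) for one identifier.

    m: P(identifier agrees | pair is a match)      -- hypothetical input
    u: P(identifier agrees | pair is a non-match)  -- hypothetical input
    Weights are log-likelihood ratios in base 2 (an assumed convention).
    """
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def match_rate(pattern, m_probs, u_probs, prior):
    """Proportion of matches among pairs showing a given agreement pattern.

    Assumes conditional independence of identifier agreements given match
    status, so the pattern likelihood is a product over identifiers.
    pattern: sequence of booleans, True where the identifier agrees.
    prior:   overall proportion of matches among all pairs.
    """
    like_match = 1.0
    like_nonmatch = 1.0
    for agrees, m, u in zip(pattern, m_probs, u_probs):
        like_match *= m if agrees else 1 - m
        like_nonmatch *= u if agrees else 1 - u
    numerator = prior * like_match
    return numerator / (numerator + (1 - prior) * like_nonmatch)


# Illustrative values only: two identifiers, both agreeing.
first_and_last_agree = match_rate([True, True], [0.9, 0.9], [0.1, 0.1], prior=0.01)
```

If agreements on the two identifiers are positively dependent among non-matches, as in the first-name and last-name example, the product in `like_nonmatch` understates how often both agree by chance, so this baseline would tend to overstate the match rate for that pattern.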