Abstract:
|
Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When identifying fields contain no missing data and training data are available, both approaches can efficiently estimate the parameters of interest. When missing data are present, the EM algorithm can be used for parameter estimation in Bayesian networks when training data are available (Friedman 1997), and in record linkage when no training data are available (unsupervised learning). EM and MCMC methods can be used to estimate error rates automatically in some record linkage situations (Belin and Rubin 1995, Larsen and Rubin 2001). Automatic error-rate estimation has generally not been addressed in the computer science literature. If there are interactions between variables, parameters can still be estimated. For Bayesian networks, efficient automatic methods exist for determining the most important interactions between variables (e.g., Friedman 1997, 1999).
|
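To make the unsupervised record linkage case concrete, the following is a minimal Python sketch of EM for the Fellegi-Sunter model. It assumes binary agree/disagree comparison vectors with no missing values and conditional independence of fields given match status. This is the simplest setting, and it ignores the interactions discussed above. The function names (`em_fellegi_sunter`, `match_weight`), the starting values, the base-2 logarithm, and the toy data are all hypothetical illustrations, not the author's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

EPS = 1e-6  # keeps probabilities away from 0 and 1 so logs and ratios stay finite


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def em_fellegi_sunter(
    patterns: Sequence[Sequence[int]],
    p: float = 0.1,
    m: Optional[List[float]] = None,
    u: Optional[List[float]] = None,
    iters: int = 500,
    tol: float = 1e-9,
) -> Tuple[float, List[float], List[float]]:
    """Estimate p = P(M), m_j = P(agree_j | M), u_j = P(agree_j | U) by EM.

    Assumes conditional independence of fields given match status (a sketch,
    not a general latent-class model with interactions)."""
    n, k = len(patterns), len(patterns[0])
    m = list(m) if m is not None else [0.9] * k  # hypothetical starting values
    u = list(u) if u is not None else [0.1] * k
    for _ in range(iters):
        # E-step: posterior probability g_i that pair i is a match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for j, a in enumerate(gamma):
                pm *= m[j] if a else 1.0 - m[j]
                pu *= u[j] if a else 1.0 - u[j]
            g.append(pm / (pm + pu))
        # M-step: weighted agreement frequencies in each latent class.
        sg = sum(g)
        new_p = _clamp(sg / n)
        new_m = [_clamp(sum(gi * gam[j] for gi, gam in zip(g, patterns)) / sg)
                 for j in range(k)]
        new_u = [_clamp(sum((1 - gi) * gam[j] for gi, gam in zip(g, patterns)) / (n - sg))
                 for j in range(k)]
        delta = max([abs(new_p - p)]
                    + [abs(a - b) for a, b in zip(new_m, m)]
                    + [abs(a - b) for a, b in zip(new_u, u)])
        p, m, u = new_p, new_m, new_u
        if delta < tol:
            break
    return p, m, u


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter total weight: sum of per-field log2 likelihood ratios."""
    w = 0.0
    for a, mj, uj in zip(gamma, m, u):
        w += math.log2(mj / uj) if a else math.log2((1 - mj) / (1 - uj))
    return w


# Hypothetical toy data: 20 true matches and 180 non-matches over 3 fields.
matches = [(1, 1, 1)] * 16 + [(1, 1, 0)] * 2 + [(0, 1, 1)] * 2
nonmatches = [(0, 0, 0)] * 150 + [(1, 0, 0)] * 15 + [(0, 1, 0)] * 10 + [(0, 0, 1)] * 5
pairs = matches + nonmatches

p_hat, m_hat, u_hat = em_fellegi_sunter(pairs)
print(f"p={p_hat:.3f} m={[round(x, 3) for x in m_hat]} u={[round(x, 3) for x in u_hat]}")
print("weight(1,1,1) =", round(match_weight((1, 1, 1), m_hat, u_hat), 2))
```

No labels are used: the latent match indicator plays the role of missing data, which is why EM applies without training data. The estimated `m` and `u` then give each comparison pattern a weight, and the Fellegi-Sunter decision rule compares that weight with upper and lower thresholds. Those thresholds are where the automatic error-rate estimation mentioned above comes in.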