Abstract:
|
Probabilistic record linkage is the process of identifying which records in two datasets represent the same entity in the absence of unique identifiers. Bayesian record linkage algorithms provide a mechanism to propagate uncertainty about the linkage structure. An important aspect of Bayesian analysis is the specification of prior distributions for model parameters. In some applications, information on records that represent the same entities is available and can be used as prior information. However, records that are known to represent the same entities may differ from records whose match status is unknown. The power prior distribution is an informative prior distribution that incorporates historical data. We examine the performance of the power prior distribution in file linkage applications using simulation analysis. The simulations were based on data from the CDC Behavioral Risk Factor Surveillance System. In all simulation configurations, using the power prior distribution appears to improve linkage accuracy. Based on our results, the power prior distribution is an effective method for incorporating records known to represent the same entities into the record linkage process.
|