Abstract:
|
Linking data from different sources reduces the cost of data collection and expands research possibilities. However, record linkage is typically successful for only a subset of records. This can introduce bias if linked cases differ from non-linked cases. Evaluating the consequences of non-linkage is difficult in practice, as the impact depends not only on the linkage process but also on the analysis of interest. In this talk we propose a simple simulation-based approach for assessing whether a specific analysis is biased due to non-linkage. The basic idea is to model the linkage process, introduce additional non-linkage based on this model, and compare the analysis results obtained using the remaining cases to the results that would be obtained if the linkage process were independent of the data. Through simulations and a real data application we illustrate that (i) impact assessments regarding the consequences of non-linkage should always start from the analysis model of interest, (ii) machine learning methods should be preferred for modeling the linkage process, and (iii) the proposed methodology can be a simple strategy for measuring the impact of non-linkage.
|
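The basic idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the data are simulated, the analysis of interest is taken to be a simple mean, the linkage model is a plain logistic regression (a stand-in for the machine learning methods the abstract recommends), and "independent of the data" is read as random thinning at the same rate. The function names `fit_logistic`, `analysis`, and `assess_nonlinkage` are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, z, n_iter=25):
    """Newton-Raphson logistic regression; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (z - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def analysis(y):
    """Hypothetical analysis of interest: the mean of the outcome."""
    return y.mean()


def assess_nonlinkage(X, y, linked, n_rep=200, seed=0):
    """Return the average gap between model-based and random thinning."""
    rng = np.random.default_rng(seed)
    # Step 1: model the linkage process on all records.
    beta = fit_logistic(X, linked.astype(float))
    X_l, y_l = X[linked], y[linked]
    p_hat = sigmoid(X_l @ beta)
    model_based, random_based = [], []
    for _ in range(n_rep):
        # Step 2: introduce additional non-linkage that follows the model.
        keep_model = rng.random(len(y_l)) < p_hat
        # Reference: drop the same share of cases, independently of the data.
        keep_random = rng.random(len(y_l)) < keep_model.mean()
        model_based.append(analysis(y_l[keep_model]))
        random_based.append(analysis(y_l[keep_random]))
    # Step 3: compare the analysis results under both scenarios.
    return float(np.mean(model_based) - np.mean(random_based))


# Simulated data (assumed for illustration only).
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Scenario A: linkage success depends on x, so linked cases are selective.
linked_dependent = rng.random(n) < sigmoid(0.5 + 1.0 * x)
# Scenario B: linkage success is independent of the data.
linked_independent = rng.random(n) < 0.6

gap_dependent = assess_nonlinkage(X, y, linked_dependent)
gap_independent = assess_nonlinkage(X, y, linked_independent)
print("gap when linkage depends on x:", gap_dependent)
print("gap when linkage is independent:", gap_independent)
```

A gap close to zero suggests that, for this particular analysis, the modeled linkage mechanism does not distort the result, while a clearly non-zero gap signals potential bias. Because the gap is defined through the `analysis` function, swapping in a different analysis (for example a regression coefficient) can lead to a different conclusion for the same linkage process.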