
Abstract Details

Activity Number: 303 - Big Data
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #323117
Title: Big Health Data Using Probabilistic Record Linkage
Author(s): Bong-Jin Choi*, Ke Meng, Tzy-Mey Kuo, Adrian Meyer, Laura Green, Christopher Baggett, Anne-Marie Meyer, and YunKyung Chang
Companies: University of North Carolina at Chapel Hill (all authors)
Keywords: Big Data ; Record Linkage ; Probabilistic Linkage ; Bayesian Approach ; Health Data ; Sensitivity Analysis
Abstract:

There has been an explosion of health and medical data over the past several decades, much of which originates from fragmented, incompatible systems. This makes linking patient data a critical task for the health and medical fields. Probabilistic and deterministic linking are the two main approaches, and probabilistic linking can offer significant advantages in constructing big health data, especially when the quality or discriminatory power of identifiers varies across datasets. However, many subjective decisions are made when applying probabilistic linking, including how to assign matching probabilities and where to set the cut-point (threshold) that defines a match. All of these decisions may affect the validity, reliability, and reproducibility of the linkage, as well as any research that uses the resulting dataset. This study builds on a systematic approach to defining matching probabilities and handling missing data, and introduces objective processes for choosing the optimal matching score to define a match. Applying the demonstrated approaches can help increase the transparency and reproducibility of probabilistic methods and improve the validity of linked big health data.
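
For readers unfamiliar with the mechanics, the following is a minimal sketch of how a probabilistic match score and cut-point are commonly computed. It assumes the classical Fellegi-Sunter log-likelihood-ratio weights, which the abstract does not name. The identifier names, the m- and u-probabilities, the rule that a missing value contributes zero weight, and the function names are all hypothetical illustrations, not the authors' method or values.

```python
import math

# Hypothetical m- and u-probabilities per identifier (illustrative values only).
# m = P(identifier agrees | the two records belong to the same person)
# u = P(identifier agrees | the two records belong to different people)
M_U = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "zip_code": (0.90, 0.05),
}


def field_weight(field, a, b):
    """Log2 likelihood-ratio weight contributed by one identifier."""
    if a is None or b is None:
        # Assumption: a missing value carries no evidence either way.
        return 0.0
    m, u = M_U[field]
    if a == b:
        return math.log2(m / u)            # agreement weight (positive)
    return math.log2((1 - m) / (1 - u))    # disagreement weight (negative)


def match_score(rec_a, rec_b):
    """Total score: sum of the per-identifier weights."""
    return sum(field_weight(f, rec_a.get(f), rec_b.get(f)) for f in M_U)


def is_match(rec_a, rec_b, threshold):
    """Declare a link when the score reaches the chosen cut-point."""
    return match_score(rec_a, rec_b) >= threshold


def links_by_threshold(pairs, thresholds):
    """Simple sensitivity check: number of links declared at each cut-point."""
    return {t: sum(is_match(a, b, t) for a, b in pairs) for t in thresholds}
```

Calling `links_by_threshold` over a range of cut-points shows how strongly the number of declared links depends on the threshold, which is the kind of subjective choice the abstract argues should be made through an objective, documented process.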


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association