Online Program
Saturday, February 21
PS3 Poster Session 3 & Continental Breakfast
Sat, Feb 21, 8:00 AM - 9:15 AM
Napoleon AB
Are You Really Who We Think You Are? Recognizing and Controlling Biases in Statistical Analyses of Linked Data (303027)
*Sigurd Wilson Hermansen, Westat
Keywords: linkage, identifiers, probabilistic, deterministic, fuzzy, bias, selection, misclassification, duplication
In a new age of web scrapers and devices streaming Big Data, applied statisticians are looking more closely at the quality of linked data and at how person or entity linkage errors may bias the results of statistical analyses. As a consequence, we are finding data linkage biases. In analyses of linked data drawn from different databases, we now have to assess whether, for instance, the educational level or credit rating covariate that comes from a web database actually belongs to the health or payment history outcome in our subject database. Similar concerns worry applied statisticians and data analysts across the whole spectrum of observational research and predictive modeling. Examples of data linkage biases, and useful statistics for measuring them, lead into a quick review of best-practice methods for data linkage and integration, tracing, and "deduplication". Guidelines for practice touch on software licensing questions and on ethical and legal obligations for disclosure control.