Abstract:
|
As the scale and scope of big data in health care expand, so does the desire to extract evidence from this ocean of data. Many data owners have banded together to form distributed research networks, which let each site retain control of its own data while increasing statistical power through cross-site analyses. However, the practical challenges of these complex data and computing environments are acute. This talk addresses missing data in analyses of administrative data on distributed research networks. Billing claims data have (at least) three forms of missingness: missing values of defined variables (e.g., N/A in a structured data field), silently missing variables (e.g., diagnosis codes that were never recorded), and silently missing records (e.g., services for which no claim was filed). Standard methods for missing data have focused on the first form of missingness, while the latter two remain less well studied. I will discuss methods that are flexible enough to accommodate site-specific reasons for and patterns of missingness, are feasible in distributed research networks, perform well statistically in multi-site analyses, and address the forms of missingness relevant to claims data.
|