All Times EDT

Abstract Details

Activity Number: 6 - Learning from Permuted Data and the Analysis of Linked Files
Type: Invited
Date/Time: Monday, August 3, 2020 : 10:00 AM to 11:50 AM
Sponsor: Government Statistics Section
Abstract #308068
Title: Spherical Regression Under Mismatch Corruption with Application to Automated Knowledge Translation
Author(s): Xu Shi* and Xiaoou Li and Tianxi Cai
Companies: University of Michigan and University of Minnesota and Harvard University
Keywords: electronic health records; hard-thresholding; mismatched data; ontology translation; spherical regression
Abstract:

Recent federal initiatives incentivize the collection and linkage of electronic health records across clinics, hospitals, and healthcare systems. A key challenge in using electronically assembled cohorts is that different healthcare systems use inconsistent “languages,” which also change over time. For example, because of financial incentives and heterogeneity across healthcare systems, different providers may use alternative medical codes to record the same diagnosis or procedure, which limits the transportability of phenotyping algorithms and statistical models across healthcare systems. In this talk, I formulate medical code translation as a statistical problem: inferring a mapping between two sets of multivariate, unit-length vectors, one learned from each of two healthcare systems. The problem is particularly interesting because a fraction of the response-predictor pairs are mismatched, whereas classical regression analysis tacitly assumes that each response is correctly linked to its predictor. I propose a novel method for mapping recovery and establish theoretical guarantees for estimation and model selection consistency.
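
The setting described above can be illustrated with a small simulation. The sketch below is not the method proposed in the talk, which the abstract does not specify. It assumes, purely for illustration, that the mapping is an orthogonal matrix, and it uses a generic trimmed orthogonal Procrustes fit (refit, then keep only the rows with the smallest residuals) as a stand-in for a hard-thresholding estimator. All function names, parameter values, and the data-generating process are hypothetical.

```python
import numpy as np


def normalize_rows(M):
    """Scale each row to unit length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def procrustes_rotation(X, Y):
    """Orthogonal W minimizing ||Y - X W||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def trimmed_procrustes(X, Y, keep_frac=0.7, n_iter=20):
    """Alternate between fitting W and keeping the best-fitting rows.

    Assumption: at least keep_frac of the pairs are correctly linked.
    """
    n = X.shape[0]
    k = int(round(keep_frac * n))
    keep = np.arange(n)  # start from all pairs (the naive fit)
    for _ in range(n_iter):
        W = procrustes_rotation(X[keep], Y[keep])
        resid = np.linalg.norm(Y - X @ W, axis=1)
        keep = np.argsort(resid)[:k]  # hard-threshold: drop large residuals
    W = procrustes_rotation(X[keep], Y[keep])
    return W, keep


def simulate(n=300, p=10, mismatch_frac=0.2, noise=0.05, seed=0):
    """Unit-length predictors and responses with a fraction of mismatched pairs."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(rng.standard_normal((n, p)))
    W_true, _ = np.linalg.qr(rng.standard_normal((p, p)))  # orthogonal map
    Y = normalize_rows(X @ W_true + noise * rng.standard_normal((n, p)))
    m = int(round(mismatch_frac * n))
    idx = rng.choice(n, size=m, replace=False)
    # Cyclically shift responses among the chosen rows so each is mislinked.
    Y[idx] = Y[np.roll(idx, 1)]
    return X, Y, W_true, idx


if __name__ == "__main__":
    X, Y, W_true, idx = simulate()
    W_naive = procrustes_rotation(X, Y)
    W_trim, keep = trimmed_procrustes(X, Y)
    print("naive error  :", np.linalg.norm(W_naive - W_true))
    print("trimmed error:", np.linalg.norm(W_trim - W_true))
    print("mismatched rows kept:", np.intersect1d(keep, idx).size)
```

Comparing the two printed errors shows how much the mislinked pairs distort a fit that treats every pair as correctly linked, under the assumptions of this toy setup.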


Authors who are presenting talks have a * after their name.
