Abstract Details

Activity Number: 420 - Contributed Poster Presentations: Health Policy Statistics Section
Type: Contributed
Date/Time: Tuesday, July 30, 2019 : 2:00 PM to 3:50 PM
Sponsor: Health Policy Statistics Section
Abstract #306626
Title: Statistical De-Identification of a Health Dataset Based on a Common Data Model
Author(s): Megan Branda* and Debashis Ghosh
Companies: University of Colorado - Denver and University of Colorado Anschutz Medical Campus
Keywords: common data model; de-identification; Safe Harbor
Abstract:

This work describes an institutional effort to produce a de-identified (DI) version of a dataset based on an international common data model for researchers within the institution. To adhere to Safe Harbor guidelines, statistical DI was evaluated. The data consist of quasi-identifiers (QI, e.g., patient name), sensitive attributes (e.g., diagnosis code (DC)), and free-text fields. The literature offers many algorithms, but they often aggregate or discard data to the point that the result is statistically uninformative. An evaluation of how the data are used directed the focus of the DI toward that use case. QI were grouped with a minimum group size of 10, and each group was assessed for whether any single DC occurred in more than 40% of its patients. MIST software, used to scan free-text fields for identifiers, was run on a sample of 200 cases, and alterations were made so that it was effective at the 95% accuracy level. Dates for encounters were shifted at the patient level. These steps led to a loss of 0.4% of patients from the 2.5 million patient cohort. Applying the many theoretical approaches in the healthcare space brings new challenges. Tradeoffs between the scope of potential analyses and the preservation of patient privacy need to be examined on a use case-by-use case basis.
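The grouping and date-shifting steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The thresholds of 10 and 40% come from the abstract; everything else is an assumption made for illustration: the function names, the record layout (one record per patient with a single DC), reading the 40% rule as "flag a group when one DC exceeds 40% of its patients", the hash-based offset, and the 180-day shift window.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import date, timedelta

MIN_GROUP_SIZE = 10   # minimum patients per QI group (from the abstract)
MAX_DC_SHARE = 0.40   # one DC may not exceed this share of a group (from the abstract)


def flag_groups(records, qi_keys, dc_key="dc"):
    """Return {qi_tuple: reason} for QI groups that fail either check.

    Assumes one record per patient; `records` is a list of dicts
    (hypothetical layout, not taken from the abstract).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in qi_keys)].append(rec[dc_key])

    flagged = {}
    for key, codes in groups.items():
        if len(codes) < MIN_GROUP_SIZE:
            flagged[key] = "too_small"
            continue
        # Count of the most frequent diagnosis code in this group.
        top_count = Counter(codes).most_common(1)[0][1]
        if top_count / len(codes) > MAX_DC_SHARE:
            flagged[key] = "dominant_dc"
    return flagged


def patient_offset_days(patient_id, secret, max_days=180):
    """Deterministic per-patient offset in [-max_days, max_days].

    The keyed hash and the 180-day window are assumptions; the abstract
    only states that dates were shifted at the patient level.
    """
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return n % (2 * max_days + 1) - max_days


def shift_date(d, patient_id, secret):
    """Shift one encounter date by the patient's fixed offset."""
    return d + timedelta(days=patient_offset_days(patient_id, secret))
```

Because every encounter for a given patient moves by the same offset, the intervals between that patient's encounters are preserved, which is the usual motivation for shifting at the patient level rather than per encounter. What happens to flagged groups (further generalization of the QI, or suppression) is left open here, since the abstract does not specify it.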


Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program