Online Program
A Risk-based Methodology to De-identify Protected Health Information for the Heritage Health Prize

*Luk Arbuckle, CHEO Research Institute
Khaled El Emam, CHEO Research Institute
Ben Eze, Privacy Analytics
Jonathan Gluck, Heritage Provider Network
Jeremy Howard, Kaggle
Gunes Koru, University of Maryland
Lisa Gaudette, Privacy Analytics
Emilio Neri, CHEO Research Institute
Sean Rose, Privacy Analytics

Keywords: re-identification, risk assessment, longitudinal, medical data, data disclosure, privacy

According to the US Health Insurance Portability and Accountability Act (HIPAA), the public disclosure of Protected Health Information (PHI) without patient consent is permitted if it is de-identified using accepted statistical methods to manage the risk of individual re-identification. The Heritage Provider Network (HPN), a provider of health care services in California, initiated the Heritage Health Prize (HHP) competition “to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year, using historical claims data”. However, the complex longitudinal data from HPN for the HHP competition required the development of new methods to assess the risk of re-identification. Five plausible re-identification attacks on these data were identified, and the probability of re-identification was evaluated for each. A de-identification algorithm was applied whenever the risk of re-identification was found to be above a pre-defined threshold. The final HHP competition dataset had a very small risk of re-identification and was robust to violations of the initial assumptions.
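
The abstract does not say how risk was measured or which transformations were applied, so the following is only an illustrative sketch of the general "measure risk, de-identify if above a threshold" pattern. It assumes, hypothetically, that risk is taken as one over the size of the smallest group of records sharing the same quasi-identifier values (a common convention, not necessarily the authors' measure), and that de-identification means generalizing age into bands. The function names, field names, and threshold are all invented for illustration.

```python
from collections import Counter

def max_reid_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of records
    that share the same quasi-identifier values (assumed measure)."""
    if not records:
        return 0.0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(groups.values())

def generalize_age(record, width=10):
    """Hypothetical de-identification step: replace exact age with a band."""
    out = dict(record)
    low = (record["age"] // width) * width
    out["age"] = f"{low}-{low + width - 1}"
    return out

def deidentify_if_needed(records, quasi_identifiers, threshold):
    """Apply generalization only when measured risk exceeds the threshold."""
    if max_reid_risk(records, quasi_identifiers) <= threshold:
        return records
    return [generalize_age(r) for r in records]

# Toy data, invented for illustration only.
records = [
    {"age": 34, "sex": "F"},
    {"age": 36, "sex": "F"},
    {"age": 52, "sex": "M"},
    {"age": 58, "sex": "M"},
]
released = deidentify_if_needed(records, ["age", "sex"], threshold=0.5)
```

In this toy example every record is initially unique on (age, sex), so the measured risk exceeds the threshold and the ages are banded, leaving two records per group. A real assessment of longitudinal claims data would need to model the specific attacks considered, which this sketch does not attempt.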