Keywords: Privacy analytics, Statistical efficiency, Aggregation, Symbolic data
Healthcare data are often subject to regulatory and contractual restrictions, and protecting the confidentiality of the patient population is mandated by law. Aggregation is often the method of choice for de-identifying patient records, which in turn yields histogram data. Statistical analysis of histogram data is complicated by the different methods of aggregation employed by different statistical agencies. In this presentation, we describe a statistically rigorous approach to the analysis of histogram data and bring out the trade-off between statistical efficiency and privacy. To this end, we provide multiple notions of privacy and aggregation strategies that maintain privacy for a pre-determined statistical efficiency. Extensions of this idea to other symbolic data will also be described.