JSM 2014 Home

Abstract Details

Activity Number: 346
Type: Contributed
Date/Time: Tuesday, August 5, 2014 : 10:30 AM to 12:20 PM
Sponsor: Survey Research Methods Section
Abstract #312286
Title: New Technologies for Privacy Protection in Data Collection and Analysis
Author(s): Samuel Wu*+ and Shigang Chen and Deborah Burr and Long Zhang
Companies: University of Florida and University of Florida and University of Florida and University of Florida
Keywords: Orthogonal transformation ; Privacy-preserving data collection ; General linear model ; Contingency table analysis ; Logistic regression
Abstract:

A major obstacle to medical and social research is the lack of reliable data caused by people's reluctance to reveal confidential information to strangers. Fortunately, statistical inference always targets a well-defined population rather than any particular individual, and in many current applications data can be collected through web-based systems or mobile devices. These two characteristics enable us to develop new data collection methods with strong privacy protection. These new technologies hold the promise of removing the trust obstacle, promoting objective data collection, allowing rapid data dissemination, and facilitating unrestricted sharing of big data.

The new method, called triple matrix-masking (TM$^2$), ensures that the raw data stay with research participants and that only masked data are collected, which can then be distributed and shared freely. TM$^2$ protects privacy by applying a matrix transformation immediately at the time of data collection, so that even the researchers cannot see the raw data, and then applies further matrix transformations to guarantee that the masked data remain analyzable by standard statistical methods. A critical feature of the method is that the keys used to generate the masking matrices are held separately, which ensures that nobody sees the actual data. Moreover, because of the specially designed transformations, statistical inference on the parameters of interest yields the same results as if the original data were used. The new method therefore hides sensitive data with no efficiency loss for statistical inference on binary and normal data, an improvement over Warner's randomized response technique.
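The claim that masked data give the same inference can be illustrated with a minimal sketch. It is not the TM$^2$ protocol itself, which involves several transformations and separately held keys; it shows only the underlying linear-algebra fact that left-multiplying both the design matrix and the response by the same orthogonal matrix leaves the least-squares estimates unchanged. The simulated data and the helper names `householder`, `matmul`, and `ols` are assumptions made for illustration and do not come from the abstract.

```python
import random


def householder(v):
    """Return the orthogonal reflection H = I - 2 v v^T / (v^T v)."""
    n = len(v)
    vv = sum(a * a for a in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ols(design, response):
    """Least squares for a two-column design via the 2x2 normal equations."""
    s00 = sum(r[0] * r[0] for r in design)
    s01 = sum(r[0] * r[1] for r in design)
    s11 = sum(r[1] * r[1] for r in design)
    t0 = sum(r[0] * y for r, y in zip(design, response))
    t1 = sum(r[1] * y for r, y in zip(design, response))
    det = s00 * s11 - s01 * s01
    return ((s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)


# Simulated (hypothetical) raw data: y = 1 + 2x + noise for n participants.
random.seed(0)
n = 8
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]
design = [[1.0, x] for x in xs]  # intercept column plus covariate

# Masking matrix A: a product of two random reflections, hence orthogonal.
A = matmul(householder([random.gauss(0.0, 1.0) for _ in range(n)]),
           householder([random.gauss(0.0, 1.0) for _ in range(n)]))

# Masked data: A X and A y. Each masked row mixes all participants' records.
masked_design = matmul(A, design)
masked_response = [row[0] for row in matmul(A, [[y] for y in ys])]

beta_raw = ols(design, ys)
beta_masked = ols(masked_design, masked_response)
print("raw data estimates:   ", beta_raw)
print("masked data estimates:", beta_masked)
```

The two sets of estimates agree up to floating-point error because $(AX)^\top(AX) = X^\top X$ and $(AX)^\top(Ay) = X^\top y$ whenever $A^\top A = I$, so the normal equations are the same for the raw and the masked data.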

In addition, we add several features to the proposed procedure: an error-checking mechanism is built into the data collection process to make sure that the masked data used for analysis are an appropriate transformation of the original data, and a partial masking technique is introduced to grant data users access to non-sensitive personal information while sensitive information remains hidden.


Authors who are presenting talks have a * after their name.

For information, contact jsm@amstat.org or phone (888) 231-3473.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

ASA Meetings Department  •  732 North Washington Street, Alexandria, VA 22314  •  (703) 684-1221  •  meetings@amstat.org
Copyright © American Statistical Association.