
349 – Health, Hospital, and Patient Surveys

Aggregate-Level PUF as a New Alternative to the Traditional Unit-Level PUF for Improving Analytic Utility and Data Confidentiality

Sponsor: Section on Survey Research Methods
Keywords: Aggregate Level Analysis, Calibration, Micro-Group, Micro-Mean, MG-MC, Subsampling

Joshua M. Borton

NORC

Avinash Singh

NORC

Creating a unit-level public use file (PUF) with a rich set of analytic variables and high analytic utility has become difficult because unit-level information that could be used for matching is increasingly available in the public domain. In addition, unit-level information is prone to a possibly false perception of disclosure about a target of interest, which may be difficult to refute. The problem considered here arose in the context of CMS Medicare claims data, where the unit level corresponds to beneficiaries. To get around this problem, we propose a new approach, the aggregate-level PUF (AL-PUF), in which we modify the data structure by changing the unit of observation from the beneficiary to a small aggregate (termed a micro-group, or MG): a group of beneficiaries with a common profile with respect to geo-demographics and prescription drug enrollment. For analytic utility, MG sizes should not be too large, so that MGs stay as close as possible to the unit level and can serve as building blocks; for this reason, larger MGs could be subdivided using additional outcome variables such as the total number of claims and the cost for each beneficiary. The basic idea of the MG structure and small MG sizes is motivated by the commonly used aggregate-level modeling as an alternative to unit-level modeling in small area estimation. For data confidentiality, however, MG sizes should not be too small either (e.g., not below 10), depending on the level of risk tolerance. Having MGs as building blocks goes a long way toward reducing disclosure risk because there is no beneficiary-level information. To obtain true totals for various domains, it is sufficient to have only the averages of outcome variables for each MG (termed micro-means, or MMs, which are common to all beneficiaries in the MG) along with MG counts, i.e., weighted-up MG sizes.
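The statement that MG counts and micro-means suffice for domain totals can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the three profile variables, the equal weights of 20, and the function names `build_micro_groups` and `domain_total` are hypothetical and are not taken from the authors' implementation. The MGs here hold only two beneficiaries each so the arithmetic is easy to follow; the abstract suggests real MGs would not fall below a minimum size such as 10.

```python
from collections import defaultdict

# Hypothetical toy records: (region, age_group, part_d_enrolled, weight, cost).
# All values are invented for illustration only.
beneficiaries = [
    ("NE", "65-74", True, 20.0, 1200.0),
    ("NE", "65-74", True, 20.0, 800.0),
    ("NE", "75+", False, 20.0, 3000.0),
    ("NE", "75+", False, 20.0, 1000.0),
    ("SO", "65-74", True, 20.0, 500.0),
    ("SO", "65-74", True, 20.0, 1500.0),
]


def build_micro_groups(records):
    """Collapse unit-level records into MGs keyed by their common profile.

    Each MG keeps only a count (sum of weights) and a micro-mean
    (weighted average cost); no beneficiary-level values are retained.
    """
    weight_sum = defaultdict(float)
    weighted_cost = defaultdict(float)
    for region, age_group, part_d, weight, cost in records:
        key = (region, age_group, part_d)
        weight_sum[key] += weight
        weighted_cost[key] += weight * cost
    return {
        key: {
            "count": weight_sum[key],
            "micro_mean": weighted_cost[key] / weight_sum[key],
        }
        for key in weight_sum
    }


def domain_total(micro_groups, in_domain):
    """Estimate a domain total as the sum of count * micro-mean over its MGs."""
    return sum(
        mg["count"] * mg["micro_mean"]
        for key, mg in micro_groups.items()
        if in_domain(key)
    )


micro_groups = build_micro_groups(beneficiaries)

# Domain of interest: all beneficiaries in the (hypothetical) "NE" region.
ne_total = domain_total(micro_groups, lambda key: key[0] == "NE")

# The same total computed directly from the unit-level records, for comparison.
direct = sum(w * cost for region, _, _, w, cost in beneficiaries if region == "NE")

print(ne_total, direct)
```

In this sketch the two totals agree because the domain is a union of whole MGs. A domain that cuts across an MG would need within-MG proportions, which is the role the abstract assigns to micro-means of categorical outcome variables.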
However, for MGs containing a single beneficiary in the analytic profiles defining the domains of interest, actual beneficiary values of outcome variables could be disclosed by MG totals. To mitigate this disclosure problem, two nested sub-samples of the full sample are defined: the larger one for computing MMs for categorical outcome variables or proportions of beneficiaries belonging to analytic profiles for each MG, and the smaller one for MG counts, while the full sample is used to obtain MMs for continuous outcome variables. Sub-sampling provides unbiased total estimates as well as justification for using two-phase sampling results for precision estimation. There is some information loss due to sub-sampling, but it can be minimized by suitably choosing sub-sampling rates. For increased precision, sampling weights from the sub-samples are calibrated to the original full-sample estimates for key analytic variables. In modeling with AL-PUF data, instrumental variables (which might be available from a previous or separate independent sample) may be needed to avoid bias due to measurement errors, because both the dependent variable and the independent variables or covariates at the MG level take the form of MMs estimated from the full sample, which makes the model error correlated with the covariates unless the full sample is a census. However, there is no such problem with descriptive inference. Measures of analytic utility and confidentiality of the proposed AL-PUF method are illustrated for a 5% sample of the 2008 Medicare Inpatient Claims data.

Section on Survey Research Methods – JSM 2012
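The calibration of sub-sample weights to full-sample estimates mentioned in the abstract can be sketched in its simplest form. The abstract does not say which calibration method is used, so the single-variable ratio adjustment below is an assumption chosen for brevity, and the function name `ratio_calibrate`, the toy values, and the every-second-unit sub-sample are all hypothetical. A real design would select the sub-sample at random and would typically calibrate to several key variables at once.

```python
def ratio_calibrate(sub_weights, sub_values, full_total):
    """Scale sub-sample weights so their weighted total matches full_total.

    This is a single-variable ratio adjustment, used here as a stand-in
    for whatever calibration method the authors actually apply.
    """
    sub_total = sum(w * y for w, y in zip(sub_weights, sub_values))
    if sub_total == 0:
        raise ValueError("sub-sample total is zero; ratio adjustment undefined")
    factor = full_total / sub_total
    return [w * factor for w in sub_weights]


# Hypothetical full sample: eight beneficiaries, each with weight 20,
# and an invented key analytic variable (e.g. number of claims).
full_values = [3, 5, 2, 8, 4, 6, 1, 7]
full_weights = [20.0] * len(full_values)
full_total = sum(w * y for w, y in zip(full_weights, full_values))

# Illustrative sub-sample at rate 1/2: every second unit, chosen
# deterministically here so the example is reproducible.
sub_values = full_values[::2]
sub_weights = [w * 2 for w in full_weights[::2]]  # inverse of sub-sampling rate

calibrated = ratio_calibrate(sub_weights, sub_values, full_total)
calibrated_total = sum(w * y for w, y in zip(calibrated, sub_values))

print(full_total, calibrated_total)
```

After the adjustment the sub-sample reproduces the full-sample estimate for the calibration variable, up to floating-point rounding. Estimates for other variables will still differ from the full sample, which is the information loss the abstract attributes to sub-sampling.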
