| Activity Number: | 462 |
| Type: | Topic Contributed |
| Date/Time: | Wednesday, August 1, 2007 : 2:00 PM to 3:50 PM |
| Sponsor: | Section on Government Statistics |
| Abstract - #309723 |
| Title: | Measuring Disclosure Risk and an Examination of the Possibilities of Using Synthetic Data in the Individual Income Tax Return Public Use File |
| Author(s): | Michael Weber*+ and John Czajka and Sonya Vartivarian |
| Companies: | Internal Revenue Service and Mathematica Policy Research, Inc. and Mathematica Policy Research, Inc. |
| Address: | Statistics of Income, Bethesda, MD, 20814 |
| Keywords: | Synthetic data; Public Use; Income Tax Returns |
| Abstract: | The Statistics of Income Division (SOI) currently measures disclosure risk with a distance-based technique that compares the Public Use File (PUF) against the population of all tax returns, and it uses top-coding, subsampling, and multivariate microaggregation as disclosure avoidance techniques. SOI is interested in exploring other techniques that prevent disclosure while distorting the data less. Synthetic, or simulated, data may be one such technique. But while synthetic data may be the ultimate in disclosure protection, creating a synthetic dataset that preserves the key characteristics of the source data presents a significant challenge. An additional constraint in creating synthetic data for the SOI PUF is the need to maintain the accounting relationships among the numerous income, deduction, and tax items that appear on a tax return. |
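
The abstract names its techniques without spelling them out, so the following is only a minimal sketch of two of them, top-coding and a distance-based risk check, under stated assumptions. The three-field records, the cap, the threshold, the Euclidean distance, and the function names `top_code` and `at_risk` are all hypothetical illustrations; the abstract does not say which distance measure or parameters SOI actually uses. The sketch also recomputes the total after top-coding to show what preserving an accounting relationship might look like.

```python
from math import dist

# Hypothetical toy records: (wages, interest, total_income). Not SOI data.
population = [
    (50_000.0, 1_000.0, 51_000.0),
    (52_000.0, 500.0, 52_500.0),
    (900_000.0, 80_000.0, 980_000.0),
]


def top_code(record, cap):
    """Cap each component at `cap`, then recompute the total so that the
    accounting identity wages + interest == total_income still holds."""
    wages, interest, _ = record
    wages, interest = min(wages, cap), min(interest, cap)
    return (wages, interest, wages + interest)


def at_risk(released, candidates, threshold):
    """Return indices of released records whose nearest population record
    (Euclidean distance, an assumption) lies within `threshold`."""
    flagged = []
    for i, record in enumerate(released):
        nearest = min(dist(record, c) for c in candidates)
        if nearest <= threshold:
            flagged.append(i)
    return flagged


# Release a top-coded copy of the population and check it against the source.
released = [top_code(r, cap=200_000.0) for r in population]
flagged = at_risk(released, population, threshold=1.0)
```

In this toy case the two unmodified records remain exact matches to the population and are flagged, while the top-coded high-income record moves far from its source. That is the trade-off the abstract describes: protection is bought with distortion.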