This is the program for the 2010 Joint Statistical Meetings in Vancouver, British Columbia.

Abstract Details

Activity Number: 314
Type: Contributed
Date/Time: Tuesday, August 3, 2010 : 8:30 AM to 10:20 AM
Sponsor: Section on Survey Research Methods
Abstract - #307481
Title: Expected Number of Random Duplications Within or Between Lists
Author(s): William E. Yancey*+
Companies: U.S. Census Bureau
Address: 4600 Silver Hill Road, Suitland, MD, 20746
Keywords: record linkage ; deduplication ; Stirling numbers

The U.S. Census Bureau seeks to identify individuals who are listed more than once by searching across lists for records with the same name and birth date. The question arises of how many of these agreeing records are random agreements, that is, two different people with the same name and birth date. To answer this question formally, we first consider the familiar Birthday Problem and then the more complicated Collision Problem. For each of these problems we exhibit the explicit probability distributions, from which we can compute means and variances for some parameter values. We apply these results to voter registration lists for Oregon and Washington to estimate the number of "false matches" that occur across these lists.
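
As a rough illustration of the kind of quantity discussed in the abstract, the sketch below computes the expected number of randomly agreeing record pairs under a deliberately simplified model: every record falls independently and uniformly into one of a fixed number of name and birth date combinations. This uniformity assumption, the function names, and the parameter names are all hypothetical and are not taken from the talk. The abstract's explicit probability distributions and its use of Stirling numbers are not reproduced here.

```python
from math import comb


def expected_pairs_within(n: int, cells: int) -> float:
    """Expected number of agreeing pairs among n records in one list.

    Assumption (not from the abstract): each record is assigned
    independently and uniformly to one of `cells` combinations.
    Each of the C(n, 2) pairs agrees with probability 1 / cells.
    """
    return comb(n, 2) / cells


def expected_pairs_between(m: int, n: int, cells: int) -> float:
    """Expected number of agreeing cross-list pairs for lists of size m and n.

    Same uniform assumption: each of the m * n cross pairs agrees
    with probability 1 / cells.
    """
    return m * n / cells


def prob_no_duplicates_within(n: int, cells: int) -> float:
    """Classic Birthday Problem: probability that n records are all distinct."""
    if n > cells:
        return 0.0  # pigeonhole: some combination must repeat
    p = 1.0
    for i in range(n):
        p *= (cells - i) / cells
    return p


if __name__ == "__main__":
    # Familiar birthday setting: 23 people, 365 equally likely birthdays.
    print(expected_pairs_within(23, 365))
    print(1.0 - prob_no_duplicates_within(23, 365))
    # Two hypothetical lists of 10 and 20 records over 100 combinations.
    print(expected_pairs_between(10, 20, 100))
```

Real names and birth dates are far from uniformly distributed, so these figures would understate the number of random agreements for common names. The sketch is only meant to show the structure of the within-list and between-list calculations.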

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
