572 – Would the Real Steve Fienberg Please Stand Up: Getting to Know a Population from Multiple Incomplete Files
Discussion of "Would the Real Steve Fienberg Please Stand Up: Getting to Know a Population from Multiple Incomplete Files" Invited Papers, Social Statistics Section
Michael D. Larsen
The George Washington University
Many of us grew up with the game "Where in the World Is Carmen Sandiego?" Nowadays, the name of the game for the U.S. Census Bureau is "Who is the real Steve Fienberg?": deciding whether records bearing the name Steve Fienberg on multiple lists refer to the same person. The 2010 Census, the 2010 Census Coverage Measurement (CCM) Program, and the American Community Survey (ACS) are three lists that might be useful in answering this question. For example, is a Steve Fienberg with a certain set of covariates in Pennsylvania the same person as a Steve Fienberg in Ohio? How do we go about making this distinction? This is just one question that our group of speakers will examine from different angles.
Rob Hall, Carnegie Mellon University, and Tom Mule, U.S. Census Bureau, focus on the record linkage problem itself, while Zachary Kurtz, Carnegie Mellon University, frames the problem from the viewpoint of capture-recapture (CRC) techniques, which rely heavily on record linkage.
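To illustrate why capture-recapture depends on record linkage, here is a minimal sketch using the classical two-list Lincoln-Petersen estimator, N_hat = n1 * n2 / m. It is not any of the speakers' methods. The records, the function name, and the exact-match linkage keys are all hypothetical, chosen only to show that the population estimate changes when the matching rule changes.

```python
def lincoln_petersen(list_a, list_b, key):
    """Two-list capture-recapture estimate: N_hat = n1 * n2 / m.

    `key` is a hypothetical linkage rule: two records are treated as the
    same person exactly when their keys are equal.
    """
    keys_a = {key(r) for r in list_a}
    keys_b = {key(r) for r in list_b}
    m = len(keys_a & keys_b)  # records judged to appear on both lists
    if m == 0:
        raise ValueError("no matched records; estimate undefined")
    return len(keys_a) * len(keys_b) / m


# Made-up (name, state) records standing in for two incomplete files.
census = [("Steve Fienberg", "PA"), ("Ann Lee", "OH"),
          ("Raj Patel", "PA"), ("Mia Cruz", "OH")]
survey = [("Steve Fienberg", "OH"), ("Ann Lee", "OH"), ("Raj Patel", "PA")]

# Strict rule: name and state must both agree, so the two Steves differ.
strict = lincoln_petersen(census, survey, key=lambda r: r)
# Loose rule: name alone decides, so the two Steves are one person.
loose = lincoln_petersen(census, survey, key=lambda r: r[0])

print(strict, loose)  # 6.0 4.0
```

Under the strict rule only two records match and the estimate is 6; under the loose rule three match and the estimate drops to 4. Whether the Pennsylvania and Ohio records are the same Steve Fienberg therefore feeds directly into the estimated population size.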
Zachary Kurtz is interested in obtaining individual estimates of the undercount by combining multiple files using nonparametric smoothing and capture-recapture methods. Rob Hall is interested in developing classical and Bayesian nonparametric theory for testing partial matches across as many as k files. Finally, Tom Mule proposes extending two-file Bayesian record linkage to allow string comparisons of matching fields and to account for comparisons that are missing due to nonresponse. All the presenting authors propose simulation studies as well as analyses of the Census data mentioned earlier. This work is novel and ground-breaking, and much of it has not been explored before. The implications for future Census applications could be very instructive.