
William Winkler

U.S. Census Bureau



83 – Estimation, Benchmarking, and Record Linkage

Quality and Analysis of Sets of National Files

Sponsor: Survey Research Methods Section
Keywords: record linkage, edit/imputation, models, algorithms, generalized software

The goal of various clean-up methods is to improve the quality of files so that they are suitable for economic and statistical analyses. To fill in missing data and 'correct' fields, we need generalized software that implements the Fellegi-Holt model (JASA 1976) to preserve joint distributions and ensure that records satisfy edits. To identify and correct duplicates within and across files, we need generalized software that implements the Fellegi-Sunter model (JASA 1969). The goal of the clean-up procedures is to reduce the error in files to at most 1% (not currently attainable in many situations). In this presentation, we cover methods of modeling/edit/imputation and record linkage that naturally morph into methods of adjusting statistical analyses of files for linkage error. The modeling/edit/imputation software has four algorithms, each of which may be 100 times as fast as algorithms in commercial or experimental university software. The record linkage software used in the 2010 Decennial Census matches 10^17 pairs (300 million x 300 million) in 30 hours using 40 CPUs on an SGI Linux machine. It is 50 times as fast as recent parallel software from Stanford (Kawai et al. 2006) and 500 times as fast as software used in some statistical agencies. With skilled individuals and this fast software, a group of national files can be cleaned up and used in preliminary analyses in 3-6 months.
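
For readers unfamiliar with the Fellegi-Sunter model mentioned above, the following is a minimal sketch of its standard scoring rule, not the Census Bureau software described in the abstract. Each compared field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, and the summed weight is compared against two thresholds. The field names, the m- and u-probabilities, the thresholds, and the function names are all hypothetical values chosen for illustration, and exact equality is used as the comparison, whereas production systems typically rely on approximate string comparators.

```python
import math

# Hypothetical per-field parameters (illustrative values, not from the abstract):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
}


def field_weight(m, u, agrees):
    """Log-likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the field weights, assuming conditional independence of fields."""
    return sum(
        field_weight(m, u, rec_a[field] == rec_b[field])
        for field, (m, u) in params.items()
    )


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision rule; the thresholds here are assumed, not estimated."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


if __name__ == "__main__":
    a = {"last_name": "smith", "first_name": "john", "birth_year": 1970}
    b = {"last_name": "smith", "first_name": "jon", "birth_year": 1971}
    w = match_weight(a, b)
    print(round(w, 2), classify(w))
```

In practice the m- and u-probabilities are estimated from the data and the two thresholds are set from target false-match and false-non-match rates; pairs that fall between the thresholds are held for clerical review.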

