JSM Proceedings

308 – Data Integration in 21st Century Government Surveys

Adjusting Match Weights to Partial Levels of String Agreement in Data Linkage

Sponsor: Government Statistics Section
Keywords: record linkage, Fellegi-Sunter, string comparisons, agreement weights

Dean M. Resnick

NORC at the University of Chicago

The Fellegi-Sunter record linkage paradigm, in its original conception, assumes that for a set of comparison fields, such as first name, year of birth, and state of residence, agreement on each field between the two records in a pair is strictly binary: the field either agrees completely or it does not. For string comparisons, particularly for name fields, intuition tells us that two versions of a name that are very similar but not identical (e.g., ‘Resnick’ compared to ‘Reznik’) are more indicative of a match than of a non-match. Several string comparison tools, such as Jaro-Winkler similarity scores and Levenshtein distances, quantify the level of agreement as a full range of values between complete agreement and complete disagreement.

One way to use such a metric is to set a cutoff above which the fields are treated as being in agreement, but this requires a method for choosing the cutoff, and it still collapses each comparison into two outcomes. Instead, we seek a way to recognize several gradations of agreement for string comparisons and to assign agreement and non-agreement weights corresponding to the observed gradation. In this paper, we describe such a method, which maintains and expands upon the Fellegi-Sunter approach.
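To make the contrast between binary and graded agreement concrete, here is a minimal sketch. It is not the method described in this paper; it is a generic illustration that assumes a case-insensitive Levenshtein-based similarity (Jaro-Winkler could be substituted), four hypothetical agreement levels with hypothetical thresholds, and hypothetical m-probabilities (P(level | match)) and u-probabilities (P(level | non-match)) per level. The helper names (`binary_weights`, `similarity`, `agreement_level`, `graded_weight`, `pair_weight`) and the field names (`surname`, `birth_year`, `state`) are likewise hypothetical. Under these assumptions each level k receives the weight log2(m_k / u_k), which reduces to the classic Fellegi-Sunter agreement and non-agreement weights when there are only two levels.

```python
import math

# Classic Fellegi-Sunter: a field either agrees or it does not.
# m = P(agree | true match), u = P(agree | non-match).
def binary_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement weight, non-agreement weight) in log2 units."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character insertions, deletions,
    or substitutions turning a into b (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive Levenshtein similarity in [0, 1]; 1.0 = identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical gradations of agreement: (label, minimum similarity),
# checked from strictest to loosest.
LEVELS = [("exact", 1.0), ("close", 0.7), ("partial", 0.4), ("disagree", 0.0)]

# Hypothetical m- and u-probabilities per level (each set sums to 1).
# In practice these would be estimated, e.g., from labeled pairs or by EM.
M_PROBS = {"exact": 0.85, "close": 0.10, "partial": 0.03, "disagree": 0.02}
U_PROBS = {"exact": 0.01, "close": 0.02, "partial": 0.07, "disagree": 0.90}


def agreement_level(a: str, b: str) -> str:
    """Map a string pair to the first level whose threshold it meets."""
    s = similarity(a, b)
    for label, floor in LEVELS:
        if s >= floor:
            return label
    return LEVELS[-1][0]  # defensive: similarity is never below 0.0


def graded_weight(a: str, b: str) -> float:
    """Weight log2(m_k / u_k) for the agreement level k that a and b fall in."""
    level = agreement_level(a, b)
    return math.log2(M_PROBS[level] / U_PROBS[level])


def pair_weight(rec_a: dict, rec_b: dict) -> float:
    """Total weight for a record pair: graded surname weight plus binary
    weights for year of birth and state (hypothetical m/u values)."""
    total = graded_weight(rec_a["surname"], rec_b["surname"])
    for field, m, u in [("birth_year", 0.95, 0.02), ("state", 0.90, 0.05)]:
        agree_w, disagree_w = binary_weights(m, u)
        total += agree_w if rec_a[field] == rec_b[field] else disagree_w
    return total
```

For example, ‘Resnick’ and ‘Reznik’ are two edits apart (s to z, drop c), so their similarity is 1 - 2/7 ≈ 0.71. That places them in the hypothetical "close" level, with weight log2(0.10 / 0.02) = log2 5 ≈ 2.32: well above the disagreement weight log2(0.02 / 0.90) ≈ -5.49, but below the exact-agreement weight log2 85 ≈ 6.41. A single cutoff at 0.7 would instead have credited this pair with full agreement.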

© 2020 CadmiumCD