JSM 2014 Home

Abstract Details

Activity Number: 624
Type: Invited
Date/Time: Thursday, August 7, 2014 : 10:30 AM to 12:20 PM
Sponsor: Social Statistics Section
Abstract #314146
Title: Large-Scale Clustering Approaches for Identifying Unique Human Rights Violations
Author(s): Samuel Ventura*+
Companies: Carnegie Mellon

In today's large-scale record linkage problems, datasets can be prohibitively large and have high rates of missingness or error in the fields, making it difficult to calculate the distance between record pairs for linkage. Supervised learning approaches can help to alleviate these concerns, but they rely on the accuracy of a single estimate of distance from a single model. By using a distribution of distance estimates instead (e.g., from an ensemble of classifiers trained on subsets of the training data), we may be able to represent the distance between pairs of records more accurately. We present a large-scale record linkage framework that incorporates classifier ensembles and "distribution linkage" hierarchical clustering to identify clusters of records corresponding to unique entities. We examine the performance of different distributional summary measures for distances in hierarchical clustering. We illustrate this approach with an application of record linkage to the identification of unique human rights violations from the ongoing civil war in Syria. Finally, we examine the efficacy of different string comparison metrics for Arabic text and discuss alternative approaches.
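
The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of one possible reading of "distribution linkage". It assumes that each record pair carries a list of distance estimates (one per ensemble member), that the linkage between two clusters is a summary statistic of all pooled cross-cluster estimates, and that merging stops at a fixed cutoff. The function names, the 0.5 threshold, and the toy distance values are all hypothetical and are not taken from the talk.

```python
from itertools import combinations
from statistics import median


def pooled_estimates(cluster_a, cluster_b, pair_estimates):
    """Collect every ensemble distance estimate for pairs spanning two clusters."""
    pooled = []
    for i in cluster_a:
        for j in cluster_b:
            pooled.extend(pair_estimates[frozenset((i, j))])
    return pooled


def distribution_linkage(pair_estimates, n_records, summary=median, threshold=0.5):
    """Agglomerative clustering in which the cluster-to-cluster distance is a
    summary (median, min, max, ...) of the pooled distribution of estimates.
    Assumed stopping rule: stop once the closest pair of clusters exceeds
    `threshold`."""
    clusters = [frozenset([i]) for i in range(n_records)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = summary(pooled_estimates(a, b, pair_estimates))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical estimates from a three-member ensemble for four records.
# Records 0 and 1 look like one entity, records 2 and 3 like another;
# one ensemble member gives a spuriously small distance for the pair (1, 2).
pair_estimates = {
    frozenset((0, 1)): [0.10, 0.20, 0.15],
    frozenset((2, 3)): [0.05, 0.30, 0.10],
    frozenset((0, 2)): [0.90, 0.80, 0.95],
    frozenset((0, 3)): [0.85, 0.90, 0.70],
    frozenset((1, 2)): [0.80, 0.95, 0.20],
    frozenset((1, 3)): [0.75, 0.85, 0.90],
}

by_median = distribution_linkage(pair_estimates, 4, summary=median)
by_min = distribution_linkage(pair_estimates, 4, summary=min)
print(by_median)  # [[0, 1], [2, 3]]
print(by_min)     # [[0, 1, 2, 3]]
```

In this toy example the median summary is robust to the single outlying estimate, while the minimum (analogous to single linkage) merges all four records into one cluster. This is the kind of difference between distributional summary measures that the abstract proposes to examine, though the measures actually compared in the talk may differ.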

Authors who are presenting talks have a * after their name.


Copyright © American Statistical Association.