Abstract Details
Activity Number: 624
Type: Invited
Date/Time: Thursday, August 7, 2014, 10:30 AM to 12:20 PM
Sponsor: Social Statistics Section
Abstract #314146
Title: Large-Scale Clustering Approaches for Identifying Unique Human Rights Violations
Author(s): Samuel Ventura*+
Companies: Carnegie Mellon
Abstract:
In today's large-scale record linkage problems, datasets can be prohibitively large and have high rates of missingness or error in the fields, making it difficult to calculate the distance between record pairs for linkage. Supervised learning approaches can help to alleviate these concerns, but they rely on the accuracy of a single estimate of distance from a single model. By using a distribution of distance estimates instead (e.g., from an ensemble of classifiers trained on subsets of the training data), we may be able to represent the distance between pairs of records more accurately. We present a large-scale record linkage framework that incorporates classifier ensembles and "distribution linkage" hierarchical clustering to identify clusters of records corresponding to unique entities. We examine the performance of different distributional summary measures for distances in hierarchical clustering. We illustrate this approach with an application of record linkage to the identification of unique human rights violations from the ongoing civil war in Syria. Finally, we examine the efficacy of different string comparison metrics for Arabic text and discuss alternative approaches.
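The abstract does not define "distribution linkage" in detail, so the following is only a minimal sketch of one plausible reading: each record pair carries a list of distance estimates (one per ensemble member), and the distance between two clusters is a summary statistic of all estimates pooled across their cross-cluster pairs. The function name `distribution_linkage`, the `threshold` stopping rule, and the toy numbers in `estimates` are assumptions made for illustration, not the author's implementation or data. The ensemble that would produce the estimates is not shown.

```python
from itertools import combinations
from statistics import median


def distribution_linkage(n_records, estimates, summary=median, threshold=0.5):
    """Agglomerative clustering over per-pair distributions of distances.

    estimates maps a record pair (i, j), i < j, to a list of distance
    estimates in [0, 1], one per (hypothetical) ensemble member.
    summary reduces a pooled list of estimates to a single distance.
    Merging stops once the closest pair of clusters exceeds threshold.
    """
    clusters = [frozenset([i]) for i in range(n_records)]

    def cluster_distance(a, b):
        # Pool every ensemble estimate for every cross-cluster record pair,
        # then summarize the pooled distribution with one number.
        pooled = []
        for i in a:
            for j in b:
                pooled.extend(estimates[(min(i, j), max(i, j))])
        return summary(pooled)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen summary.
        (a, b), d = min(
            (((a, b), cluster_distance(a, b))
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

    return sorted(sorted(c) for c in clusters)


# Toy, made-up estimates for 4 records and a 3-member ensemble.
estimates = {
    (0, 1): [0.10, 0.20, 0.15],
    (0, 2): [0.90, 0.80, 0.85],
    (0, 3): [0.95, 0.70, 0.90],
    (1, 2): [0.80, 0.90, 0.40],
    (1, 3): [0.85, 0.90, 0.95],
    (2, 3): [0.20, 0.60, 0.30],
}

print(distribution_linkage(4, estimates, summary=median))  # [[0, 1], [2, 3]]
print(distribution_linkage(4, estimates, summary=max))     # [[0, 1], [2], [3]]
print(distribution_linkage(4, estimates, summary=min))     # [[0, 1, 2, 3]]
```

On this toy input the three summaries give three different partitions: the minimum is swayed by a single optimistic estimate, the maximum by a single pessimistic one, and the median sits between them. This is the kind of sensitivity that a comparison of distributional summary measures would examine.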
Authors who are presenting talks have a * after their name.
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.