Abstract:
The conflict in Syria has been tremendously well documented; however, we still do not know how many people have been killed by conflict-related violence. We present work that takes a large set of noisy records and uses hashing schemes from computer science to place similar records in the same blocks, or bins, reducing the dimension of the space of records. We compare these schemes to more traditional partitioning methods from statistics. Next, we present work on estimating death counts in Syria using record linkage techniques. Record linkage is the process of merging multiple databases, in the absence of unique identifiers, so as to remove duplicate entities. We present viable methods for the Syrian application and discuss specific challenges with these databases that differ from others we have encountered. Finally, we discuss ongoing work on the scalability of Bayesian methods in entity resolution. In this talk, we present two unsupervised Bayesian methods and one supervised method for record linkage, along with preliminary results on Syrian death counts. We compare our methods using common evaluation metrics and discuss the benefits of our three proposed approaches.
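To make the hashing-based blocking step concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in this work: it assumes MinHash signatures over character bigrams with a simple banding scheme, and every function name, parameter value, and toy record in it is hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=2):
    """Return the set of character k-grams of a lowercased, whitespace-collapsed string."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (assumed hash family)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=20):
    """MinHash signature: the minimum hash value of the token set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_blocks(records, num_hashes=20, bands=10):
    """Map each (band index, band slice of the signature) to the record ids that share it."""
    rows = num_hashes // bands
    blocks = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            blocks[key].add(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Collect the record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


# Made-up toy records (not real data): name, year, place as one noisy string.
toy_records = {
    1: "john smith 1970 springfield",
    2: "John  Smith 1970 Springfield",
    3: "jon smith 1970 springfield",
    4: "maria lopez 1985 riverside",
}
pairs = candidate_pairs(lsh_blocks(toy_records))
```

Only the pairs returned by `candidate_pairs` would then be passed to a record linkage model, rather than all possible pairs. Records whose normalized text is identical (1 and 2 above) always share a block; near-duplicates share one only with a probability that grows with their similarity, which is the trade-off the `bands` and `num_hashes` parameters control.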
Copyright © American Statistical Association.