JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 328
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #305113
Title: Recent Advances in Graph-Based Deduplication
Author(s): Eric Sun*+
Companies: Facebook
Address: 1601 Willow Rd., Menlo Park, CA, 94025, United States
Keywords: social networks ; data mining ; large-scale data analysis ; map-reduce ; crowdsourcing
Abstract:

The Facebook platform allows both users and external websites to easily add new entries to the entity graph. These entities are the interests, movies, books, TV shows, music, places, and other concepts that people care about and connect to. However, without proper maintenance and incentive structures, duplicate and ambiguous entries can severely diminish the graph's quality. In this talk, we discuss the challenges and progress in building and maintaining Facebook's social graph of entities. We propose several solutions to the deduplication and disambiguation problems that can be applied at Facebook's scale.


Address information is given for authors who have a + after their name.
Authors who are presenting talks have a * after their name.

