JSM 2017 Online Program

Activity Number:	321 - Modern Statistical Learning for Ranking and Crowdsourcing
Type:	Topic Contributed
Date/Time:	Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #322826
Title:	A Permutation-Based Model for Crowdsourcing: Optimal Estimation and Robustness
Author(s):	Nihar B. Shah* and Sivaraman Balakrishnan and Martin J. Wainwright
Companies:	Univ of California - Berkeley and Department of Statistics, CMU and EECS and Statistics, University of California, Berkeley
Keywords:	high dimensional statistics ; crowd labeling ; classification
Abstract:	The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and requirements of massive labeled datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the popular "Dawid-Skene" model. Working in a high-dimensional non-asymptotic framework, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small statistical penalty as compared to the Dawid-Skene model. Finally, we propose a polynomial-time computable algorithm, called OBI-WAN, for provably efficient estimation under these models.
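
For readers unfamiliar with the setting, the following is a minimal sketch of the kind of crowd-labeling data the abstract refers to. It assumes the simplest binary ("one-coin") form of the Dawid-Skene model, in which each worker answers every question correctly with a single worker-specific probability. The function names, the ability values, and the use of majority voting as the aggregator are illustrative assumptions. This is not the OBI-WAN algorithm, and it does not implement the permutation-based model.

```python
import random

def simulate_one_coin(truth, abilities, rng):
    """Simulate binary crowd labels under an assumed one-coin Dawid-Skene model.

    truth     : list of true labels in {-1, +1}, one per question
    abilities : list of per-worker probabilities of answering correctly
    Returns labels[i][j], worker i's answer to question j.
    """
    return [[t if rng.random() < q else -t for t in truth] for q in abilities]

def majority_vote(labels):
    """Aggregate by majority vote per question (ties resolved as +1)."""
    n_questions = len(labels[0])
    estimate = []
    for j in range(n_questions):
        column_sum = sum(row[j] for row in labels)
        estimate.append(1 if column_sum >= 0 else -1)
    return estimate

rng = random.Random(0)                       # fixed seed for reproducibility
truth = [rng.choice([-1, 1]) for _ in range(200)]
abilities = [0.9, 0.8, 0.7, 0.6, 0.55]       # hypothetical worker abilities
labels = simulate_one_coin(truth, abilities, rng)
estimate = majority_vote(labels)
accuracy = sum(e == t for e, t in zip(estimate, truth)) / len(truth)
print(f"majority-vote accuracy: {accuracy:.3f}")
```

Majority voting weights all workers equally and serves here only as a simple baseline for the aggregation task. Under the one-coin assumption a worker's accuracy does not vary across questions, which is the kind of restriction a richer model can relax.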

Authors who are presenting talks have a * after their name.