Activity Number: 321 - Modern Statistical Learning for Ranking and Crowdsourcing
Type: Topic Contributed
Date/Time: Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322826
Title: A Permutation-Based Model for Crowdsourcing: Optimal Estimation and Robustness
Author(s): Nihar B Shah* and Sivaraman Balakrishnan and Martin J. Wainwright
Companies: Univ of California - Berkeley and Department of Statistics, CMU and EECS and Statistics, University of California, Berkeley
Keywords: high dimensional statistics; crowd labeling; classification
Abstract:
The aggregation and denoising of crowd-labeled data is a task that has grown in significance with the advent of crowdsourcing platforms and the demand for massive labeled datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the popular "Dawid-Skene" model. Working in a high-dimensional non-asymptotic framework, we derive optimal rates of convergence for the permutation-based model. We show that the richness of the permutation-based model offers significant robustness in estimation while, surprisingly, incurring only a small statistical penalty compared to the Dawid-Skene model. Finally, we propose a polynomial-time algorithm, called OBI-WAN, for provably efficient estimation under these models.
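The abstract does not spell out either model, so the following is only an illustrative sketch under stated assumptions. It assumes binary labels in {-1, +1}, reads the "Dawid-Skene" model in its one-parameter form (each worker answers every question correctly with the same probability), and uses one simple instance of a richer structure in which the probability of a correct answer also depends on the question, with rows and columns that become monotone after sorting. All function names are hypothetical, and plain majority voting is used as a baseline aggregator; it is not the OBI-WAN algorithm.

```python
import random


def dawid_skene_matrix(abilities, n_questions):
    """Assumed one-parameter model: worker i is correct with probability
    abilities[i] on every question, regardless of the question."""
    return [[q] * n_questions for q in abilities]


def permutation_style_matrix(skill, easiness):
    """Illustrative richer model: P(correct) = 0.5 + 0.5 * skill_i * easiness_j,
    with skill and easiness in [0, 1]. Sorting workers by skill and questions
    by easiness makes every row and column monotone."""
    return [[0.5 + 0.5 * s * e for e in easiness] for s in skill]


def simulate_labels(prob_correct, truth, rng):
    """Draw one label per (worker, question) pair: the true label with
    probability prob_correct[i][j], otherwise the flipped label."""
    return [
        [t if rng.random() < p else -t for p, t in zip(row, truth)]
        for row in prob_correct
    ]


def majority_vote(labels):
    """Baseline aggregator: sign of the column sum (ties go to +1)."""
    n_questions = len(labels[0])
    return [
        1 if sum(row[j] for row in labels) >= 0 else -1
        for j in range(n_questions)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.choice([-1, 1]) for _ in range(20)]
    skill = [0.1, 0.3, 0.5, 0.7, 0.9]
    easiness = [rng.random() for _ in truth]
    labels = simulate_labels(permutation_style_matrix(skill, easiness), truth, rng)
    estimate = majority_vote(labels)
    errors = sum(a != b for a, b in zip(estimate, truth))
    print(f"majority vote errors: {errors} of {len(truth)}")
```

In this sketch the one-parameter matrix is the special case in which every question is equally easy, which is one way to picture the sense in which a permutation-based model can generalize it.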
Authors who are presenting talks have a * after their name.