JSM 2017 Online Program

Activity Number:	448 - The Essential Role of Statistics in Modeling Complex Data in Business and Economics
Type:	Invited
Date/Time:	Wednesday, August 2, 2017 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistics in Marketing
Abstract #325018
Title:	Spectral Methods Meet EM: a Provably Optimal Algorithm for Crowdsourcing
Author(s):	Xi Chen*
Companies:	NYU
Keywords:
Abstract:	Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. In this talk, we investigate the statistical estimation problem in crowdsourcing for categorical labeling tasks, i.e., how to estimate true labels and workers' quality from the noisy labels provided by non-expert crowdsourcing workers. The MLE-based Dawid-Skene estimator has been widely used for this problem. However, it is hard to theoretically justify its performance due to the non-convexity of the log-likelihood function. We propose a two-stage algorithm in which the first stage uses the spectral method to obtain an initial estimate of the parameters and the second stage refines the estimate via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. This is joint work with Yuchen Zhang, Dengyong Zhou and Michael I. Jordan.
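
As a rough illustration of the second (EM) stage described in the abstract, the sketch below runs EM for a Dawid-Skene-style model with NumPy. It is not the authors' implementation: the spectral initialization is not reproduced here and a smoothed majority vote is used in its place, and the function name `dawid_skene_em`, its parameters, and the `-1` missing-label convention are all assumptions made for this example.

```python
import numpy as np

def dawid_skene_em(labels, n_classes, n_iter=50, smoothing=1e-2):
    """EM for a Dawid-Skene-style model (illustrative sketch only).

    labels: (n_items, n_workers) integer array; -1 marks a missing label.
    Returns (posterior over true labels, worker confusion matrices, class prior).
    """
    n_items, n_workers = labels.shape
    # One-hot encode the observed labels; rows for missing labels stay all-zero.
    onehot = np.zeros((n_items, n_workers, n_classes))
    i_idx, w_idx = np.nonzero(labels >= 0)
    onehot[i_idx, w_idx, labels[i_idx, w_idx]] = 1.0

    # Initialization: smoothed majority vote.
    # ASSUMPTION: this stands in for the spectral stage, which is not shown.
    post = onehot.sum(axis=1) + smoothing
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices,
        # confusion[w, k, l] = P(worker w reports l | true label is k).
        prior = post.sum(axis=0) + smoothing
        prior /= prior.sum()
        confusion = np.einsum("ik,iwl->wkl", post, onehot) + smoothing
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true label, computed in log space.
        log_post = np.log(prior)[None, :] + np.einsum(
            "iwl,wkl->ik", onehot, np.log(confusion)
        )
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, confusion, prior
```

Taking `post.argmax(axis=1)` gives the estimated labels, and each `confusion[w]` summarizes the estimated quality of worker `w`. Because the likelihood is non-convex, the quality of the result depends on the initialization, which is the motivation the abstract gives for the spectral first stage.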

Authors who are presenting talks have a * after their name.