Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. In this talk, we investigate the statistical estimation problem in crowdsourcing for categorical labeling tasks, i.e., how to estimate the true labels and the workers' quality from the noisy labels provided by non-expert crowdsourcing workers.
The MLE-based Dawid-Skene estimator has been widely used for this problem. However, its performance is hard to justify theoretically because the log-likelihood function is non-convex. We propose a two-stage algorithm in which the first stage uses a spectral method to obtain an initial estimate of the parameters and the second stage refines that estimate via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. This is joint work with Yuchen Zhang, Dengyong Zhou and Michael I. Jordan.
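
To make the two-stage structure concrete, the following is a minimal sketch of an "initialize, then refine with EM" loop for the Dawid-Skene model. It is not the algorithm from the talk: the spectral first stage is replaced here by a soft majority vote, purely as a stand-in, and the function name `dawid_skene_em`, the `smoothing` parameter, and the toy data are all hypothetical choices made for illustration.

```python
import math


def dawid_skene_em(labels, num_classes, num_iters=20, smoothing=0.01):
    """Estimate true labels and per-worker confusion matrices.

    labels[i] is a dict {worker_id: observed_label} for item i.
    """
    workers = sorted({w for item in labels for w in item})

    # Stage 1 (assumption): soft majority vote, standing in for the
    # spectral initialization described in the abstract.
    posteriors = []
    for item in labels:
        counts = [smoothing] * num_classes
        for lab in item.values():
            counts[lab] += 1.0
        total = sum(counts)
        posteriors.append([c / total for c in counts])

    # Stage 2: EM refinement of the Dawid-Skene parameters.
    for _ in range(num_iters):
        # M-step: smoothed class priors and confusion matrices.
        denom = len(labels) + num_classes * smoothing
        prior = [(sum(p[k] for p in posteriors) + smoothing) / denom
                 for k in range(num_classes)]
        confusion = {w: [[smoothing] * num_classes for _ in range(num_classes)]
                     for w in workers}
        for item, post in zip(labels, posteriors):
            for w, lab in item.items():
                for k in range(num_classes):
                    confusion[w][k][lab] += post[k]
        for w in workers:
            for k in range(num_classes):
                row_sum = sum(confusion[w][k])
                confusion[w][k] = [v / row_sum for v in confusion[w][k]]

        # E-step: posterior over each item's true label, in log space.
        new_posteriors = []
        for item in labels:
            log_p = [math.log(prior[k]) for k in range(num_classes)]
            for w, lab in item.items():
                for k in range(num_classes):
                    log_p[k] += math.log(confusion[w][k][lab])
            m = max(log_p)
            unnorm = [math.exp(v - m) for v in log_p]
            z = sum(unnorm)
            new_posteriors.append([v / z for v in unnorm])
        posteriors = new_posteriors

    estimates = [max(range(num_classes), key=lambda k: p[k]) for p in posteriors]
    return estimates, confusion


# Toy data: workers 0 and 1 always agree; worker 2 is right only half the time.
toy_labels = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 2: 1},
    {0: 0, 1: 0, 2: 0},
    {0: 1, 1: 1, 2: 0},
]
estimates, confusion = dawid_skene_em(toy_labels, num_classes=2)
print(estimates)
```

Because the log-likelihood is non-convex, EM can converge to different stationary points depending on where it starts, which is why the quality of the first stage matters. The majority-vote start used in this sketch carries none of the guarantees stated in the abstract.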