
All Times EDT

Abstract Details

Activity Number: 70 - Multivariate Statistical Methods
Type: Contributed
Date/Time: Monday, August 3, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313774
Title: Multi-Categorical Crowdsourcing Using Subgroup Latent Factor Modeling
Author(s): Qi Xu* and Yubai Yuan and Junhui Wang and Annie Qu
Companies: University of California, Irvine and University of Illinois at Urbana-Champaign and City University of Hong Kong and University of California, Irvine
Keywords: Crowdsourcing; latent factor model; matrix factorization; multi-directional separation penalty
Abstract:

Crowdsourcing has emerged as an alternative way to collect large-scale data from nonexperts, for example, in medical diagnosis and natural language processing tasks. Instead of involving experts, crowdsourcing collects labels, answers, or solutions from a crowd of workers through online platforms. However, the majority of workers are not domain experts, so their answers are noisy and, to some extent, unreliable. In this paper, we propose a two-stage model to infer the true labels for binary and multi-category classification tasks. In the first stage, we fit the observed labels with a latent factor model and incorporate group structures from both tasks and workers through a multi-directional separation penalty. For the multi-categorical case, we introduce a group-wise rotation to align the workers' latent factors with different task categories. In the second stage, we infer the true labels based on the identified high-quality worker groups to improve prediction accuracy. Theoretically, we establish the estimation consistency of the latent factors and the classification consistency of the proposed method. Simulation studies and real-data examples also favor the proposed method.
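As an illustration only, the two-stage idea for the binary case can be sketched in Python with NumPy. This is not the authors' estimator. A plain rank-1 truncated SVD stands in for the penalized latent factor model, so there is no multi-directional separation penalty and no group-wise rotation. The "high-quality group" is simply the n_top workers with the largest latent scores. The function name two_stage_labels, the parameter n_top, the coding of missing labels as 0 (which the SVD treats as a neutral value), and the toy data are all hypothetical.

```python
import numpy as np

def two_stage_labels(Y, n_top):
    """Hypothetical two-stage label inference for binary crowdsourcing.

    Y: workers x tasks array with labels in {-1, +1}; 0 marks a missing label.
    n_top: assumed number of workers kept as the "high-quality" group.
    """
    # Stage 1 (stand-in): rank-1 latent factor fit Y ~ s * u v^T via truncated SVD.
    # The abstract's penalized model with group structure is NOT implemented here.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    worker_score = U[:, 0] * s[0]   # worker latent factor (equals Y @ v)
    task_factor = Vt[0]             # task latent factor
    # SVD factors are identified only up to a joint sign flip; orient them so that
    # workers agree with the consensus on average (assumes most workers beat chance).
    if worker_score.sum() < 0:
        worker_score, task_factor = -worker_score, -task_factor
    # Stage 2: keep the n_top workers with the largest scores and take a majority vote.
    top = np.argsort(-worker_score)[:n_top]
    votes = Y[top].sum(axis=0)
    # Break ties (or tasks no selected worker labeled) with the sign of the task factor.
    fallback = np.where(task_factor >= 0, 1, -1)
    labels = np.where(votes != 0, np.sign(votes), fallback).astype(int)
    return labels, top

# Toy example with made-up data: 5 workers, 6 binary tasks.
truth = np.array([1, -1, 1, 1, -1, -1])
Y = np.array([
    truth,                         # reliable worker
    truth,                         # reliable worker
    truth * [-1, 1, 1, 1, 1, 1],   # mostly reliable, one error on task 0
    -truth,                        # adversarial worker
    np.ones(6, dtype=int),         # spammer who always answers +1
])
labels, top = two_stage_labels(Y, n_top=3)
print(labels)  # recovers truth; top holds workers 0, 1 and 2
```

Note that the adversarial worker fits the rank-1 model perfectly; it is the negative sign of its latent score, not a poor fit, that keeps it out of the selected group.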


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program