All Times EDT

Abstract Details

Activity Number: 321 - Machine Learning and Variable Selection
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Section on Statistical Computing
Abstract #318593
Title: A Statistical Testing Procedure for Validating Class Labels
Author(s): Melissa Key* and Ben Boukai and Susanne Ragg
Companies: STAT COE and IUPUI School of Science and University of Florida Health
Keywords: machine learning; proteomics; classification; non-parametric; class labels
Abstract:

Motivated by an open problem of validating protein identities in label-free shotgun proteomics workflows, we present a testing procedure that validates class (protein) labels using the measurements available across instances (peptides). More generally, we propose a non-parametric procedure that, under minimal distributional assumptions, identifies instances that are outliers, as judged by a distance (or quasi-distance) measure, relative to the subset of instances assigned to the same class. The test is shown to control the Type I and Type II error probabilities simultaneously, while also controlling the overall error probability of the repeated testing involved in validating the initial class labels. The theoretical results are supplemented with a simulation study and an application to a proteomics data set, which illustrate the applicability and viability of the method. Even with up to 25% of instances mislabeled, the testing procedure maintains high specificity and greatly reduces the proportion of mislabeled instances.
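The general idea of flagging instances that lie far from the rest of their assigned class can be illustrated with a small sketch. It is not the testing procedure from the abstract, which is not specified here: the score (median Euclidean distance to same-class instances), the cutoff (median plus k times the median absolute deviation), the function name `flag_suspect_labels`, and the toy data are all assumptions chosen for illustration, and the rule carries none of the error-control guarantees described above.

```python
import math
from collections import defaultdict
from statistics import median


def flag_suspect_labels(points, labels, k=3.0):
    """Return sorted indices of instances that sit unusually far from their class.

    Illustrative assumptions (not the abstract's procedure):
      * score  = median Euclidean distance to the other same-class instances
      * cutoff = median(score) + k * MAD(score), computed within each class
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    flagged = []
    for members in by_class.values():
        if len(members) < 3:
            continue  # too few instances to judge an outlier
        scores = {
            i: median(math.dist(points[i], points[j]) for j in members if j != i)
            for i in members
        }
        centre = median(scores.values())
        mad = median(abs(s - centre) for s in scores.values())
        cutoff = centre + k * mad
        flagged.extend(i for i in members if scores[i] > cutoff)
    return sorted(flagged)


# Toy data: one-dimensional "measurements" per instance (hypothetical values).
# Instance 5 carries label "A" but sits next to the class "B" instances.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (20.0,),
          (20.5,), (21.0,), (19.0,), (22.0,)]
labels = ["A"] * 6 + ["B"] * 4

flagged = flag_suspect_labels(points, labels)
print(flagged)  # → [5]
```

In the proteomics setting of the abstract, instances would correspond to peptides and classes to proteins; any distance or quasi-distance could replace `math.dist` in this sketch.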


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program