
All Times EDT

Abstract Details

Activity Number: 524 - Recent Advances in Methods for Genomic Data Analysis
Type: Contributed
Date/Time: Thursday, August 11, 2022, 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #323373
Title: Ensembling for Unsupervised Learning with Application to False Discovery Rates
Author(s): Jenna Michelle Landy* and Giovanni Parmigiani
Companies: Harvard T.H. Chan School of Public Health and Dana Farber Cancer Institute
Keywords: ensemble learning; unsupervised learning; false discovery rate; multiple testing

Unsupervised models present a unique challenge for hyperparameter optimization because the accuracy measures used to tune supervised models cannot be computed without observed labels. In practice, each researcher chooses hyperparameters differently, and they are often left at the default values of the chosen software package. We introduce a novel ensemble framework that addresses this issue for unsupervised problems with latent labels. The framework selects the models to ensemble by their approximate performance, which is estimated on simulated labeled data informed by domain knowledge of the latent label structure. We apply the framework to improve existing false discovery rate methodology, viewing multiple hypothesis testing as an unsupervised classification problem with binary latent labels (null versus non-null). Our simulation studies show that an ensemble outperforms three popular methods run with their default hyperparameters, and that, within an ensemble, combining models chosen by their approximate performance outperforms ensembling over a random subset of models. An R package implementing this framework for false discovery rates is publicly available on GitHub at jennalandy/gridsemblefdr.
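
To make the idea concrete, here is a minimal Python sketch of "select models by approximate performance on simulated labeled data, then ensemble" applied to local false discovery rates. It is not the gridsemblefdr implementation and does not use its API. Every choice below is an assumption made for illustration: the candidate model is a simple two-groups local fdr, lfdr(z) = pi0 * f0(z) / f(z), with a theoretical N(0, 1) null and a Gaussian kernel density estimate of f. Its hyperparameters are the bandwidth and pi0. The "domain knowledge" is a hypothetical simulator with an assumed null proportion and effect size, models are scored with the Brier score against the simulated null labels, and the top-k models are averaged. All function names (`local_fdr`, `simulate_labeled`, `ensemble_lfdr`, `reject`) are hypothetical.

```python
import numpy as np


def norm_pdf(x):
    # Standard normal density, used as both the null density f0 and the kernel
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)


def kde(points, data, bandwidth):
    # Gaussian kernel density estimate of the marginal density f, evaluated at `points`
    diffs = (points[:, None] - data[None, :]) / bandwidth
    return norm_pdf(diffs).mean(axis=1) / bandwidth


def local_fdr(z, bandwidth, pi0):
    # Assumed two-groups model: lfdr(z) = pi0 * f0(z) / f(z), clipped to [0, 1]
    f = kde(z, z, bandwidth)
    return np.clip(pi0 * norm_pdf(z) / f, 0.0, 1.0)


def simulate_labeled(n, pi0, effect, rng):
    # Hypothetical simulator: nulls ~ N(0, 1), non-nulls ~ N(+/-effect, 1)
    is_null = rng.random(n) < pi0
    signs = rng.choice([-1.0, 1.0], size=n)
    z = rng.normal(size=n) + np.where(is_null, 0.0, signs * effect)
    return z, is_null


def brier(lfdr, is_null):
    # Brier score (a proper scoring rule): squared error against the latent null label
    return np.mean((lfdr - is_null.astype(float)) ** 2)


def ensemble_lfdr(z, grid, pi0_sim, effect_sim, top_k, rng, n_sims=5):
    # Approximate each candidate's performance on simulated labeled data
    scores = []
    for bandwidth, pi0 in grid:
        total = 0.0
        for _ in range(n_sims):
            z_sim, null_sim = simulate_labeled(len(z), pi0_sim, effect_sim, rng)
            total += brier(local_fdr(z_sim, bandwidth, pi0), null_sim)
        scores.append(total / n_sims)
    # Keep the top_k best-scoring candidates and average their lfdr on the observed z
    best = np.argsort(scores)[:top_k]
    return np.mean([local_fdr(z, *grid[i]) for i in best], axis=0)


def reject(lfdr, alpha=0.1):
    # Reject the smallest lfdr values while their running mean (estimated FDR) stays <= alpha;
    # the running mean of ascending values is non-decreasing, so counting works
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1)
    n_rej = int(np.sum(running_mean <= alpha))
    rejected = np.zeros(len(lfdr), dtype=bool)
    rejected[order[:n_rej]] = True
    return rejected


# Usage: simulated statistics stand in for real observed test statistics
rng = np.random.default_rng(0)
z_obs, truth_null = simulate_labeled(1000, 0.9, 3.0, rng)
grid = [(bw, pi0) for bw in (0.2, 0.4, 0.8) for pi0 in (0.8, 0.9, 1.0)]
lfdr = ensemble_lfdr(z_obs, grid, pi0_sim=0.9, effect_sim=3.0, top_k=3, rng=rng)
rejected = reject(lfdr, alpha=0.1)
print(int(rejected.sum()))
```

Averaging the selected models' local fdr values is only one possible ensembling rule, and here the simulator happens to match the data-generating process exactly; with real data, the simulated settings would come from prior knowledge and the selected models would only be as good as that approximation.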

Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program