Abstract Details

Activity Number: 21 - Aligning Data Normalization with Analysis Goals for Reproducible Research
Type: Topic Contributed
Date/Time: Sunday, July 28, 2019 : 2:00 PM to 3:50 PM
Sponsor: Biometrics Section
Abstract #305174
Title: Batch Effects Correction with Unknown Subtypes with Application to Paired MicroRNA Data Sets
Author(s): Yingying Wei* and Li-Xuan Qin
Companies: The Chinese University of Hong Kong and Memorial Sloan Kettering Cancer Center
Keywords: Disease subtype discovery; High-throughput experiments; Integrative analysis; Interlaboratory comparisons; Model-based clustering

Mining valid scientific discoveries from genomic data is often hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed "batch effects," and the latter is often modeled by subtypes. However, there has been little research on correcting batch effects in the presence of unknown subtypes. Here, we propose a novel model, BUS (Batch effects correction with Unknown Subtypes), that simultaneously corrects batch effects, clusters samples into subtypes, identifies features that distinguish subtypes, allows the number of subtypes to vary from batch to batch, and has linear-order computational complexity. We prove the identifiability of BUS and provide study designs under which batch effects can be corrected.
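To make the ingredients above concrete, here is one way such a model can be written down. This is an illustrative sketch under our own assumptions (additive subtype and batch location effects, batch-specific Gaussian noise, reference-level constraints for identifiability), not necessarily the authors' exact specification. Here b indexes batches, i samples within a batch, g features, and Z_{bi} is the unknown subtype of sample i in batch b.

```latex
\begin{aligned}
y_{big} &= \alpha_g + \beta_{gk} + \gamma_{bg} + \varepsilon_{big}
  \quad \text{if } Z_{bi} = k, \\
\varepsilon_{big} &\sim \mathcal{N}\!\left(0, \sigma^2_{bg}\right), \qquad
\Pr(Z_{bi} = k) = \pi_{bk}, \\
\beta_{g1} &= 0, \qquad \gamma_{1g} = 0
  \quad \text{(reference subtype and batch, assumed for identifiability)}.
\end{aligned}
```

In this sketch, \gamma_{bg} and \sigma^2_{bg} play the role of location and scale batch effects, a feature g distinguishes subtypes when \beta_{gk} \neq 0 for some k, and setting \pi_{bk} = 0 lets subtype k be absent from batch b, so the number of subtypes can differ between batches.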

When real datasets are combined, the true subtype of each sample is unknown, so it is difficult to evaluate clustering performance. Fortunately, the GSE109059 paired microRNA datasets designed by Qin et al. (2018, Sci Data) assayed each sample twice, in two batches, thus providing an unprecedented opportunity to evaluate the accuracy of both clustering and batch effects correction. The subtype assignments produced by BUS are highly concordant for the same biological sample profiled in the two batches.
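Because every sample in a paired design is profiled in both batches, concordance can be measured by comparing the two label vectors sample by sample. The abstract does not name the metric used; the adjusted Rand index (ARI) below is our assumption, chosen because it ignores how clusters are labeled and tolerates different numbers of clusters in each batch. The function name and example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same samples.

    labels_a[i] and labels_b[i] must refer to the same biological sample,
    e.g. its subtype assignment in batch 1 and in batch 2 (assumed pairing).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairs of samples grouped together in both clusterings (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each clustering separately (marginals).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both clusterings trivial): treat as full agreement.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical subtype calls for six samples, one vector per batch.
batch1 = [1, 1, 2, 2, 3, 3]
batch2 = ["a", "a", "b", "b", "c", "c"]
print(adjusted_rand_index(batch1, batch2))  # 1.0: same partition, different labels
```

An ARI of 1 means the two batches partition the samples identically, values near 0 mean agreement no better than chance, and negative values mean worse than chance.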

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program