Abstract:
|
Tissue-level gene expression analysis is known to be confounded by cellular heterogeneity. To adjust for this confounding, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue omics data. However, these methods produce vastly different results under different settings, and benchmarking studies have shown that no deconvolution method is universally best. To achieve robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which uses L1-loss-based ensemble learning to synthesize results across deconvolution methods, reference datasets, marker gene selection procedures, data normalizations, and transformations. Rather than relying on simulation-based benchmarking, we compiled four large real datasets with measured cellular fractions and comprehensively evaluated EnsDeconv’s performance in different tissue types. These evaluations demonstrated that EnsDeconv yields more stable, robust, and accurate results than existing methods. In addition, we illustrated that EnsDeconv enables various downstream analyses, such as identifying differences in cell-type fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data.
|