Abstract:
|
With the advent of single-cell sequencing, inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data has become central to deciphering regulatory relationships between genes. However, estimation of GRNs is challenging because of the large number of interactions to confirm or reject, subtleties in determining the nature of regulatory relationships (e.g. direct versus indirect regulation, time-ordering of co-expression relationships), and unique properties of scRNA-seq data such as excessive zero counts (i.e. dropout). Much of the literature on GRN estimation focuses on utilising pseudo-time measurements to address the second challenge; however, few GRN methods explicitly attempt to handle dropout. To address dropout, we introduce a zero-inflated random forests (ZIRFs) model. Leveraging ZIRFs, we develop variable importance measures to estimate transcription factor (TF) networks via the recently developed SCENIC pipeline. Using scRNA-seq data from 20 tissue types, generated by the Tabula Muris consortium, we establish that ZIRFs is a promising alternative to random forests (RFs) in constructing GRNs for data sets with high levels of dropout.
|