Online Program


All Times EDT

Friday, June 5
Computational Statistics
Computational Statistics 2
Fri, Jun 5, 11:15 AM - 12:50 PM
TBD
 

Learning Large Genetic Networks Using Gaussian Graphical Models (308403)

*Sujay Datta, University of Akron 
Zhong-Hui Duan, University of Akron 
Haitao Zhao, University of Akron 

Keywords: Gene expression, gene network, pathway database, Gaussian graphical model, graphical LASSO, Monte Carlo sampling

The Gaussian graphical model (GGM) is widely used to learn genetic networks because it defines an undirected graph that encodes the conditional dependence between genes. Many GGM-based algorithms have been proposed for learning genetic network structures. Because the number of genes is typically far larger than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of GGM has become a popular tool for inferring conditional interdependence among genes. In this study, guided by specific human cancer pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we extracted the genes involved in each KEGG pathway and their RNA-seq expression levels in cancer and normal tissues from The Cancer Genome Atlas (TCGA), and constructed two types of small gene expression datasets, normal and cancer, corresponding to the gene sets of different human cancers. We applied graphical lasso directly to these datasets to infer the conditional dependences among the genes. However, although graphical lasso performs well on low-dimensional datasets, it is computationally expensive and inefficient on genome-wide gene expression datasets, and may be unable to run on them directly. In this project, inspired by the divide-and-conquer strategy, the Monte Carlo method, and the idea of graphical lasso, we propose a simple but efficient method to learn global genetic networks from genome-wide RNA-seq datasets using graphical lasso. The method uses a Monte Carlo approach to sample subnetworks; the subnetworks estimated by graphical lasso are then integrated to approximate the global genetic network. The convergence of this Monte Carlo Gaussian graphical model (MCGGM) was evaluated on a relatively small real dataset of RNA-seq expression levels. The results indicate a strong ability to recover interactions.
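The sample-and-integrate idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how subnetworks are drawn or how they are integrated, so everything here is an assumption rather than the authors' MCGGM procedure: genes are sampled uniformly at random, an edge is kept when it is selected in at least a fraction `min_freq` of the draws in which both of its genes appeared together, and the names `glasso_edges` and `monte_carlo_network` and all parameter values are hypothetical. The graphical lasso step assumes NumPy and scikit-learn are installed.

```python
import random
from collections import Counter
from itertools import combinations


def glasso_edges(sub_data, alpha=0.1, tol=1e-6):
    """Fit graphical lasso to a samples-by-genes block and return its edges.

    Edges are pairs (i, j), i < j, of column positions within the block.
    Assumes NumPy and scikit-learn are installed and that no gene is constant;
    alpha (the L1 penalty) and tol are illustrative values, not tuned ones.
    """
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    x = np.asarray(sub_data, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each gene
    precision = GraphicalLasso(alpha=alpha).fit(x).precision_
    p = precision.shape[0]
    # A nonzero off-diagonal precision entry is read as a conditional dependence.
    return {(i, j) for i, j in combinations(range(p), 2)
            if abs(precision[i, j]) > tol}


def monte_carlo_network(data, subset_size, n_draws,
                        estimate_edges=glasso_edges, min_freq=0.5, seed=0):
    """Approximate a global network by aggregating randomly sampled subnetworks.

    data is a list of rows (samples), each a list of expression values (genes).
    The frequency-threshold integration rule is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_genes = len(data[0])
    selected = Counter()  # draws in which an edge was selected
    covered = Counter()   # draws in which both endpoints were sampled together
    for _ in range(n_draws):
        genes = sorted(rng.sample(range(n_genes), subset_size))
        sub_data = [[row[g] for g in genes] for row in data]
        for pair in combinations(genes, 2):
            covered[pair] += 1
        for i, j in estimate_edges(sub_data):
            a, b = sorted((genes[i], genes[j]))  # map back to global gene indices
            selected[(a, b)] += 1
    return {pair for pair, count in selected.items()
            if count / covered[pair] >= min_freq}
```

A call such as `monte_carlo_network(expression_rows, subset_size=50, n_draws=2000)` would return a set of gene-index pairs, where `expression_rows` is a hypothetical samples-by-genes list. One caveat of any sketch like this: conditional dependence estimated within a subset ignores the genes left out of that draw, so an edge found in a subnetwork is not guaranteed to be an edge of the full network.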