Abstract:
|
Mixed data, comprising, for example, many continuous, discrete, and count-valued variables, are prevalent in many Big Data domains, especially in our motivating application: high-throughput integrative genomics. Recently, new Mixed Markov Random Field distributions, or graphical models, were proposed that assume each node-conditional distribution arises from a different exponential family model. These yield joint densities that can directly parameterize dependencies over mixed variables. Fitting these models to perform mixed graph selection entails estimating penalized generalized linear models with mixed covariates. This task, however, poses many challenges because the mixed covariates differ in scale and because of an intrinsic preference for selecting continuous variables over discrete ones. We study these challenges theoretically and empirically and propose a new iterative block estimation strategy. Our methods are studied in simulations and used to estimate a gene regulatory network that integrates methylation, small RNA expression, and gene expression data to better understand regulatory relationships in ovarian cancer.
|