Abstract:
|
Regression analysis is a widely used tool for prediction in genomic studies. In recent decades, a number of databases of biological networks have become available. These networks document functional information and associations relevant to omics data: the nodes represent biological entities such as genes, and the edges between them represent associations or interactions. Fully harnessing such information for predictive modeling remains a challenge. Most existing methods incorporate network information by extending the group lasso penalty to connected nodes or by penalizing differences between their coefficients. However, these methods may fail in biological networks, where regulation can take effect at levels not represented by the data. We propose Graph Propagation Regularized regression (GPR), which uses a propagation process that simulates the spread of activation between nodes and thus implicitly encodes network structure beyond direct connectivity. GPR has shown superior performance in a range of synthetic scenarios where important genes tend to be clustered in modules. We will demonstrate its application to gene expression data from ovarian cancers.
|