The integrated analysis of gene expression and biological pathway data is crucial to understanding systems biology. However, identifying where differentially expressed genes control the behavior of reactions or signals in pathways is a complex challenge. Statistical tests used to assess how well differentially expressed features are represented in a pathway often suffer from sample-size bias: small samples yield large p-values regardless of the percentage of the pathway's features that are represented. Furthermore, the same biological pathway is often defined differently in different pathway databases.
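The sample-size bias described above can be illustrated with the hypergeometric test, a common choice for pathway over-representation analysis. This is an assumption for illustration: the abstract does not name the test it has in mind, and the gene counts below are hypothetical. Two pathways have the same proportion (60%) of differentially expressed (DE) genes, but only the larger one reaches a small p-value.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the universe, K: DE genes in the universe,
    n: genes in the pathway, k: DE genes observed in the pathway.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k or more DE genes in the pathway.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total  # exact big-int numerator/denominator, then one division


# Hypothetical universe: 10,000 genes, 20% of them differentially expressed.
N, K = 10_000, 2_000
p_small = hypergeom_upper_tail(k=3, n=5, K=K, N=N)    # 3 of 5 genes DE (60%)
p_large = hypergeom_upper_tail(k=30, n=50, K=K, N=N)  # 30 of 50 genes DE (60%)
print(f"5-gene pathway: p = {p_small:.3g}; 50-gene pathway: p = {p_large:.3g}")
```

With these counts the 5-gene pathway does not reach p < 0.05, while the 50-gene pathway, with the same 60% representation, has a p-value many orders of magnitude smaller.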
Here we propose a representation of pathways that reflects the fact that few natural boundaries exist in any series of reactions. We introduce a novel method, PathWeAveRs, that tests whether differentially expressed features (e.g. genes) are distributed randomly throughout pathways. Applying PathWeAveRs to two gene expression microarray datasets reveals significant subpathways of reactions that are relevant in the context of the experiments from which the data were generated. The results indicate that PathWeAveRs produces biologically meaningful, statistically significant pathways.