Abstract:
RNA-seq is a sequencing technology that measures gene expression levels, given as counts of RNA fragments, for a large number of genes in biological samples. RNA-seq can be used to find genes whose expression differs between two sample types (such as tumor tissue versus normal tissue) and may identify clinically significant genes. Current statistical methods for RNA-seq test each gene independently. This ignores the importance of gene clusters, which is well established in the literature. To address this weakness, we use Ingenuity Pathway Analysis, a literature-driven database, to organize the human genome into 200 gene networks, each of which is tested individually. We develop a multivariate negative binomial regression model, drawing inspiration from statistical methods for longitudinal count data, and apply it to each gene network. Realistic simulations based on cancer data and an analysis of real data from The Cancer Genome Atlas (TCGA) demonstrate the effectiveness of this approach. The results indicate that including information about gene-gene relationships significantly improves power and enhances the biological interpretability of the resulting gene lists.