Abstract:
|
The proportion of RNA isoforms (splice variants) expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examining isoform proportions can help determine biological mechanisms, but doing so often requires a per-gene investigation of splicing patterns. Using large public datasets produced by genomic consortia as a reference, we can compare splicing patterns in a dataset of interest with those of a reference panel in which samples are divided into distinct groups (tissue of origin, disease status, etc.). We employ a latent Dirichlet model with Dirichlet-multinomial observations to compare expressed isoform proportions in a dataset of interest to those of an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the reference panel’s sample group membership and to identify sets of genes that relate to the reference panel similarly. Using the Genotype-Tissue Expression (GTEx) project as a reference dataset, we evaluate our model on simulated and real RNA-seq datasets to determine tissue-type classifications of genes from an independent study.
|