Thursday, October 19

Thu, Oct 19, 2:45 PM - 3:50 PM
Aventine Ballroom E

Speed Session 1

Defining Transcriptional Activity States by Leveraging Massive, Public RNAseq Data Sets (303936)

*Ayshwarya Subramanian, Harvard T.H. Chan School of Public Health / Dana-Farber Cancer Institute

Keywords: Genomics, RNAseq, Bayesian, Mixture Models, Probabilistic program

With the drop in nucleotide sequencing costs, many researchers are studying transcriptional activity using RNA sequencing (RNAseq) experiments. However, our current understanding of technical noise, bias, and measurement error in such data is only preliminary. The availability of large numbers of public RNAseq data sets makes it possible to develop unifying models of baseline transcriptomic activity across different tissue types that are robust across sequencing technologies. In this work, I leverage data from a large publicly available study, the Genotype-Tissue Expression project (GTEx), to model transcriptional activity using hierarchical Bayesian mixture models, with the goal of better distinguishing noise from biological signal in a uniform manner. I will describe the excitement and challenges of doing genomic data science, from acquiring, cleaning, and wrangling transcriptomic data to reasoning with statistical models to draw meaningful conclusions. I will also describe how the lack of adequate data curation prevents the larger genomic data science community from utilizing public data repositories that hold massive numbers of smaller studies.

American Statistical Association
