Abstract:
|
Recent work by Lovell et al. (2015) and others has established the need to analyze and interpret RNAseq data in the framework of compositional data. Compositional data are measurements of different components (e.g., genes) in which information about total amounts is lost, so inference can rely only on relative amounts. Many biological technologies share this feature; qRT-PCR and Affymetrix GeneChip® microarray expression assays are two common examples. We formulate targeted sequencing assays (i.e., custom mRNA libraries) as this type of measurement system. Further, we show that relative abundance provides a more meaningful formulation than typical raw counts, or even the counts-per-million (CPM) measurement scale. We illustrate this approach in a set of technical replicates by comparing correlation measures based on absolute abundance with their compositional counterparts, namely centered logratios of relative abundance and biplot graphical displays. We extend current approaches with statistical diagnostic tests that identify sample outliers based on the proportionality of sequencing read depth within and between sequencing runs.
|