Abstract:
|
Deep sequencing has become the most popular tool for transcriptome profiling in cancer research and biomarker studies. Like other high-throughput profiling technologies such as microarrays, sequencing suffers from systematic non-biological artifacts that arise from variation in experimental handling. A critical first step in sequencing data analysis is to “normalize” sequencing depth so that the data are comparable across samples. Numerous analytic methods for depth normalization have been proposed; different methods can lead to different analysis results, and no single method has been found to work best across data sets. Currently, the choice of method is often left to the data analyst’s personal preference and convenience. We developed a data-driven, biology-motivated approach to guide the selection of a depth normalization method more objectively for the data at hand. We assessed the performance of this approach using a unique pair of data sets, collected at Memorial Sloan Kettering Cancer Center, that profile the same set of tumor samples, and we applied it to additional data sets from The Cancer Genome Atlas for further demonstration.
|