Abstract:
Tumor genomes carry many lesions relative to the genomes of normal cells from the same individual. Molecular oncology researchers wish to determine which genomic lesions are relevant to cancer biology by identifying specific genes with a significant overabundance of lesions across a cohort of subjects. The genomic random interval (GRIN) method defines a null model for the locations of each subject's lesions and uses it to compute a null distribution for the number of subjects with a genomic lesion affecting each gene. For each gene, a p-value is computed by comparing the observed number of subjects with a lesion in that gene to this null distribution. In theory, the null distribution is determined by a straightforward convolution calculation. In practice, however, the calculations are much more complex, both statistically and computationally, because they must adjust for technological limitations that preclude observing lesions in some portions of the genome. Here, we describe computationally efficient strategies for performing these critical adjustments. The adjustments can change p-values by orders of magnitude, and the computational strategies can reduce computing time by orders of magnitude.
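To illustrate the "straightforward convolution calculation" in its unadjusted form, the sketch below assumes that each subject's null probability of having a lesion affecting a given gene has already been computed and that subjects are independent. The number of affected subjects then follows a Poisson-binomial distribution, obtained by convolving the subjects' Bernoulli distributions one at a time; the p-value is the upper-tail probability of the observed count. The function names `poisson_binomial_pmf` and `upper_tail_pvalue` are hypothetical, and the sketch omits how GRIN derives the per-subject probabilities and the adjustments for unobservable genomic regions described above.

```python
def poisson_binomial_pmf(probs):
    """Null PMF of the number of subjects with a lesion in one gene.

    Hypothetical helper: assumes probs[i] is subject i's (already computed)
    null probability of a lesion affecting the gene, with subjects independent.
    The PMF is built by convolving each subject's Bernoulli distribution.
    """
    pmf = [1.0]  # with zero subjects, P(count = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this subject has no lesion in the gene
            nxt[k + 1] += mass * p      # this subject has a lesion in the gene
        pmf = nxt
    return pmf


def upper_tail_pvalue(probs, observed):
    """P(count >= observed) under the assumed null model (hypothetical helper)."""
    pmf = poisson_binomial_pmf(probs)
    return float(sum(pmf[observed:]))


if __name__ == "__main__":
    # Three subjects with assumed null lesion probabilities 0.1, 0.2, 0.3;
    # two of them are observed to have a lesion in the gene.
    print(upper_tail_pvalue([0.1, 0.2, 0.3], 2))
```

Each convolution step costs time proportional to the current number of subjects, so this direct approach takes quadratic time in the cohort size for every gene, which helps explain why efficient computational strategies matter once the calculation must be repeated across many genes and adjusted for unobservable regions.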