Abstract:
|
RNA-sequencing (RNA-seq) data is the most commonly collected genomic data type and is available for a large number of samples. Although RNA-seq was originally designed to measure gene expression levels, RNA-seq data can also be used to identify genotypes within genes. Here, we used differential nucleotide counts to make genotype calls (AA, AB, or BB) at 50,000 single nucleotide polymorphism (SNP) positions simultaneously. Our genotype prediction accuracy exceeds 97% in five-fold cross-validation on the Geuvadis data set. We applied this SNP-specific genotype calling method to over 70,000 publicly available RNA-seq samples, which we processed with a common pipeline in the recount2 project.
|