Abstract:
|
Machine learning and sequencing are two rapidly evolving fields: the former enables efficient processing of large datasets and accurate predictions; the latter provides information across the genome and allows researchers to study multiple samples simultaneously in a cost-effective manner. Here we use the example of thyroid nodule evaluation to illustrate how to apply advanced machine learning methods to sequencing data to develop clinically meaningful diagnostic tools. We built multi-layered classifiers using RNA sequencing data derived from fine needle aspirate (FNA) samples; FNA collection is minimally invasive, providing patients with a safe and cost-effective diagnostic solution. The classifiers leverage deep domain knowledge and multiple types of genomic alterations, such as differential gene expression, loss of heterozygosity (LOH), mitochondrial content, fusions, and nucleotide variants, to effectively differentiate benign from malignant nodules. In addition, we describe analytical solutions to challenges commonly faced in clinical settings, such as small sample sizes, heterogeneity in patient populations, difficult subtypes, low sample quality, and technical variability due to reagent- or equipment-induced batch effects.
|