All Times EDT

Abstract Details

Activity Number: 35 - Epidemiological Models for Genetic Data, Biomarkers, and Rare Outcomes
Type: Contributed
Date/Time: Sunday, August 7, 2022 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #322593
Title: Clustering and Outlier Detection Applied to SARS-CoV-2 Nucleotide Sequences
Author(s): Georg Hahn* and Christoph Lange
Companies: Harvard T.H. Chan School of Public Health (both authors)
Abstract:

As of January 2022, the GISAID database contains more than one million SARS-CoV-2 genomes, including more than ten thousand nucleotide sequences of the recently discovered Omicron variant. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We investigate whether nucleotide sequences cluster according to their variant (or other characteristics, such as clade) by applying an unsupervised cluster analysis to the SARS-CoV-2 genomes. To this end, we assess the similarity of all pairs of genomes using the Jaccard index and principal component analysis. We show that nucleotide sequences do indeed cluster according to certain characteristics, such as the strain. More importantly, we show that nucleotide sequences of the Omicron variant are outliers in clusters of sequences stemming from variants identified earlier in the pandemic. This finding suggests that outlier detection might be a useful tool for identifying emerging variants in real time as the pandemic progresses.
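
The pipeline described in the abstract (pairwise Jaccard similarity, principal component analysis, outlier detection) can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary mutation encoding, the choice to run PCA on the Jaccard similarity matrix, the z-score rule for flagging outliers, and all function names (`jaccard_matrix`, `principal_components`, `flag_outliers`) are hypothetical, and the data are synthetic rather than drawn from GISAID.

```python
import numpy as np


def jaccard_matrix(X):
    """Pairwise Jaccard similarity between the rows of a binary 0/1 matrix."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                      # shared mutations per pair
    sizes = X.sum(axis=1)                # mutations per genome
    union = sizes[:, None] + sizes[None, :] - inter
    safe_union = np.where(union > 0, union, 1.0)
    # Assumption: two genomes with no mutations at all count as identical.
    return np.where(union > 0, inter / safe_union, 1.0)


def principal_components(S, k=2):
    """Scores of the rows of S on its top-k principal components (via SVD)."""
    centered = S - S.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]


def flag_outliers(scores, threshold=3.0):
    """Flag rows whose distance from the median point has a large z-score.

    Assumption: a plain z-score rule stands in for whatever outlier
    criterion the authors actually use.
    """
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    std = dist.std()
    if std == 0:
        return np.zeros(len(dist), dtype=bool)
    return (dist - dist.mean()) / std > threshold


# Synthetic data: rows are genomes, columns are sites; 1 marks a mutation
# relative to a reference sequence (hypothetical encoding).
n_old, n_sites = 20, 40
X = np.zeros((n_old + 1, n_sites), dtype=int)
for i in range(n_old):
    X[i, :10] = 1        # "earlier variant": ten shared mutations ...
    X[i, i % 10] = 0     # ... with one of them missing per genome
X[n_old, :3] = 1         # "novel variant": three shared mutations ...
X[n_old, 20:] = 1        # ... plus twenty mutations not seen before

J = jaccard_matrix(X)
scores = principal_components(J, k=2)
flags = flag_outliers(scores)
print("Flagged genomes:", np.flatnonzero(flags))
```

In this toy setup the last genome shares few mutations with the rest, so its Jaccard similarities are low and it lies far from the main cluster in the space of the first two principal components. On real data the encoding, the number of components, and the threshold would all need to be chosen and validated.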


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program