Abstract Details

Activity Number: 29 - SPEED: An Ensemble of Advances in Genomics and Genetics
Type: Contributed
Date/Time: Sunday, July 29, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #330138
Title: Benford's Law Based Outliers Detection for Population Stratification in Genotype Data
Author(s): Yuan Yuan* and Nedret Billor and Asuman Seda Turkmen
Companies: Auburn University and Auburn University and The Ohio State University
Keywords: Benford's Law; GWAS; outliers; PCA; population structure; case-control studies

Population stratification remains a challenging problem in genome-wide association studies (GWAS). Samples of genomic data are often stratified and contaminated by outliers. Benford's law, also called the Newcomb-Benford law or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. Benford's law has been applied to fraud detection in different types of datasets (e.g., tax fraud, election surveys). When a dataset is free from error or fabrication, its first digits should follow the Benford distribution. When a dataset is artificially modified or contaminated by outliers, the digit distribution would not follow the Benford distribution exactly. This study proposes an outlier detection method for genotype data based on Benford's law. We test the accuracy of the new method by applying it to datasets with genuine or simulated outliers. We also compare the performance of Benford's law based outlier detection against existing approaches (e.g., PCA methods). We believe that the new approach will be a promising contribution that helps to detect population stratification more accurately.
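
For readers unfamiliar with the first-digit law mentioned above: the Benford distribution assigns leading digit d (1 through 9) the probability log10(1 + 1/d), so about 30.1% of values are expected to start with 1. The Python sketch below only illustrates that standard distribution together with a generic chi-square goodness-of-fit statistic. It is not the detection method proposed in this abstract, which is not described here, and the function names (benford_expected, first_digit, benford_chi_square) are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Return the Benford probability of each leading digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant (non-zero) digit of a non-zero number."""
    # repr() of a non-zero number always shows its first significant digit
    # before any exponent part, so scanning for the first 1-9 is enough.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford.

    Larger values indicate a larger departure from the Benford distribution.
    Zeros are skipped because they have no leading digit.
    """
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    stat = 0.0
    for d, p in benford_expected().items():
        expected = n * p
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Powers of 2 are a classic sequence whose leading digits track Benford.
    powers = [2 ** k for k in range(200)]
    # A hypothetical "contaminated" sample where every value starts with 1.
    constant = [1.0] * 200
    print("powers of 2:", benford_chi_square(powers))
    print("all ones:   ", benford_chi_square(constant))
```

Under the usual large-sample approximation, a statistic of this kind would be compared with a chi-square distribution with 8 degrees of freedom. How the authors actually score or flag outlying genotype samples is an open detail that this sketch does not attempt to reproduce.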

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program