Online Program

Friday, February 21
Fri, Feb 21, 5:15 PM - 6:30 PM
Regency EF
Poster Session 2 and Refreshments

Detecting Data Falsification in Surveys (304017)

*Dhafer Malouche, Yale University 

Keywords: Data, Survey, Quality, Falsification, Duplicates, R, Shiny, Application, Programming, Analysis, Science

Duplication is one of the most serious concerns in survey research, and it is one of the most frequently observed forms of falsification in surveys. Kuriakose and Robbins (2015) recently introduced a percent-match index called p-match. It is computed at the level of each observation: for a given observation, it records the highest percentage of questions on which that observation's answers match those of any other observation. High values of this index signal serious quality problems in the collected data. In this paper, we use this index to measure pairwise duplication. Since the index measures the similarity between a pair of observations, we also consider several other similarity indexes. We further extend the definition of p-match to the disjunctive (one-hot indicator) coding of the data. In addition, we use the Expectation-Maximization (EM) algorithm to detect the group of pairs of observations that may have a severe duplication problem. We implemented these new procedures and indexes in an interactive web interface built with the R package Shiny. The result is a web application that can be used from any web browser, without installing any statistical software and without requiring any statistical knowledge.
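The abstract gives no formulas, so what follows is only a minimal Python sketch of the ideas it names, under several assumptions. Percent match between two respondents is taken as the share of questions answered identically, skipping missing answers. The per-observation p-match is the maximum of that share over all other respondents. The "disjunctive version" is read as the same score computed on a one-hot indicator coding. The EM step is modelled, as a hypothetical choice, as a two-component Gaussian mixture fitted to the pairwise scores. All function names (`percent_match`, `max_pmatch`, `disjunctive`, `em_two_groups`) are illustrative and are not the author's Shiny application or any package API.

```python
import math
import statistics
from itertools import combinations


def percent_match(a, b):
    """Share of questions on which two respondents gave identical answers.

    Questions where either answer is missing (None) are skipped; this
    missing-data rule is an assumption of the sketch.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def max_pmatch(data):
    """For each respondent, the highest percent match with any other respondent."""
    best = [0.0] * len(data)
    for i, j in combinations(range(len(data)), 2):  # O(n^2) pairwise comparisons
        s = percent_match(data[i], data[j])
        best[i] = max(best[i], s)
        best[j] = max(best[j], s)
    return best


def disjunctive(data):
    """Complete disjunctive (one-hot) coding: one 0/1 column per (question, category)."""
    n_q = len(data[0])
    levels = [sorted({row[q] for row in data}, key=str) for q in range(n_q)]
    return [[int(row[q] == c) for q in range(n_q) for c in levels[q]] for row in data]


def normal_pdf(x, mu, var):
    # Density of a normal distribution with mean mu and variance var.
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_groups(scores, n_iter=200, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture to pairwise scores by EM.

    Returns, for each score, the posterior probability of belonging to the
    component with the higher mean (the hypothetical 'suspected duplicates' group).
    """
    mu = [min(scores), max(scores)]  # start the two means at the extremes
    var = [max(statistics.pvariance(scores), min_var)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to 0
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / len(scores)
            mu[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, scores)) / nk,
                min_var,
            )
    high = 0 if mu[0] > mu[1] else 1
    return [r[high] for r in resp]


# Toy data: respondent 1 is a near-copy of respondent 0 (7 of 8 answers equal).
survey = [
    [1, 2, 3, 1, 4, 2, 5, 1],
    [1, 2, 3, 1, 4, 2, 5, 2],
    [3, 1, 2, 5, 1, 4, 2, 3],
    [2, 5, 1, 3, 2, 1, 4, 4],
    [4, 3, 5, 2, 3, 5, 1, 5],
]
print(max_pmatch(survey))  # → [0.875, 0.875, 0.0, 0.0, 0.0]
print(max_pmatch(disjunctive(survey)))
pairs = list(combinations(range(len(survey)), 2))
post = em_two_groups([percent_match(survey[i], survey[j]) for i, j in pairs])
print([pair for pair, p in zip(pairs, post) if p > 0.5])  # → [(0, 1)]
```

One design consequence worth noting: in this sketch the indicator coding counts shared zeros as agreement, so whenever every question has at least two observed categories the disjunctive score for a pair is at least as high as its raw percent match, and any cut-off chosen for raw p-match would need recalibrating.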