Abstract Details

Activity Number: 531 - SPEED: Statistical Computing: Methods, Implementation, and Application, Part 2
Type: Contributed
Date/Time: Wednesday, July 31, 2019 : 11:35 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #307942
Title: Spatial DNA: Measuring Similarity of Geolocation Data Sets with Applications to Forensics
Author(s): Christopher Galbraith* and Padhraic Smyth
Companies: University of California, Irvine and University of California, Irvine
Keywords: spatial point processes; randomization; kernel density estimation; forensics; cybersecurity

Data sets consisting of geolocated events provide rich spatial characterizations of human behavior. Individuals tend to be self-consistent over time while generating such events, repeatedly visiting the same locations such as home, the office, or the gym. In this paper we develop an approach to quantify similarity between sets of spatial events, drawing inspiration from the forensic evaluation of DNA evidence. A randomization-based technique is applied in which locations are sampled from conditional distributions of spatial locations (constructed via mixtures of kernel density estimates with weights derived from discrete locations). Score functions based on the distance between groups of events are then computed and used to construct coincidental match probabilities. We illustrate the approach with a large geolocation data set collected from Twitter users. We compare the results with an alternative way of assessing similarity: computing the log-likelihood of one set of spatial events under a mixture-KDE built from another. Our experimental results indicate that the proposed method can accurately assess the similarity between sets of geolocations, with potential applications in forensic and cybersecurity settings.
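
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it is a simplified stand-in written under several assumptions. The function names, the isotropic Gaussian kernel, the fixed bandwidth, the mixture weight alpha, the mean nearest-neighbor distance used as the score, and the synthetic data are all hypothetical choices made for illustration. In particular, the sketch draws reference sets from a single population-level KDE rather than from the conditional mixture distributions described above.

```python
import numpy as np

def sample_kde(points, bandwidth, n, rng):
    """Draw n locations from a Gaussian KDE centered on `points` (shape (m, 2))."""
    idx = rng.integers(0, len(points), size=n)
    return points[idx] + rng.normal(scale=bandwidth, size=(n, points.shape[1]))

def mean_nn_distance(a, b):
    """Score: mean distance from each event in `a` to its nearest event in `b`."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def coincidental_match_probability(a, b, population, bandwidth=0.5, n_sim=500, seed=0):
    """Estimate how often a randomly simulated set scores at least as close to
    `a` as the observed set `b` does (smaller distance = more similar)."""
    rng = np.random.default_rng(seed)
    observed = mean_nn_distance(a, b)
    simulated = np.array([
        mean_nn_distance(a, sample_kde(population, bandwidth, len(b), rng))
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(simulated <= observed)) / (1 + n_sim)

def mixture_kde_loglik(events, individual, population, alpha=0.8, bandwidth=0.5):
    """Log-likelihood of `events` under
    alpha * KDE(individual) + (1 - alpha) * KDE(population)."""
    def kde_pdf(x, centers):
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        norm = (2 * np.pi * bandwidth ** 2) ** (x.shape[1] / 2)
        return np.exp(-sq / (2 * bandwidth ** 2)).mean(axis=1) / norm
    dens = alpha * kde_pdf(events, individual) + (1 - alpha) * kde_pdf(events, population)
    return np.log(dens).sum()

# Synthetic illustration (made-up data, not the Twitter data set).
rng = np.random.default_rng(42)
population = rng.uniform(0, 10, size=(400, 2))        # events from many users
home = np.array([2.0, 2.0])
a = home + rng.normal(scale=0.1, size=(20, 2))        # events of known origin
b_same = home + rng.normal(scale=0.1, size=(20, 2))   # same individual
b_other = np.array([8.0, 8.0]) + rng.normal(scale=0.1, size=(20, 2))  # someone else

cmp_same = coincidental_match_probability(a, b_same, population)
cmp_other = coincidental_match_probability(a, b_other, population)
ll_same = mixture_kde_loglik(b_same, a, population)
ll_other = mixture_kde_loglik(b_other, a, population)
```

In this toy setting a small coincidental match probability suggests that the two sets are closer than chance alone would explain, while the log-likelihood gives a second, model-based view of the same comparison.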

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program