Abstract:
|
Forensic analyses are often concerned with identifying the spatial source of biological residue. This procedure, known as geolocation, is conventionally guided by expert knowledge of the residue in question. Purely data-driven methods remain rare, a major impediment being the lack of a data source rich enough in diversity and scope to permit appropriate theoretical abstraction. With recent advances in high-throughput sequencing technologies, however, dust collected from nearly any object can be shown to harbor DNA fragments from thousands of bacterial and fungal species. This microbial community, or microbiome, may be informative about the source of the dust, but its high-dimensional, complex dependence structure makes it difficult to model with standard statistical tools. Here we show that training collections of deep neural network classifiers on random Voronoi partitions of a spatial domain yields remarkably accurate geolocation predictions. When applied to the microbiomes of over 1,300 dust samples collected across the U.S., more than half of the predictions produced by this model fall within 90 kilometers of their origin, a 60% reduction in error relative to competing geolocation methods.
|