Abstract:
A coreset is an effective summary of the original, partially redundant data, such that solving a problem over the coreset yields results similar to those obtained with the original dataset. Although the general notion of a coreset is clear, there are many different approaches to selecting a coreset, depending on the type of problem being considered. A recent approach to formalizing coreset construction is based on the notion of an “accurate coreset,” which is selected so that the solution of a statistical problem (central tendency measurement, linear model estimation, dimension reduction, loss minimization, etc.) over the accurate coreset is exactly the same as the solution obtained from the full set. Here, we propose such a coreset construction for spatial data. The approach is simple: it first reconstructs the original process using a low-dimensional Gaussian process and then applies the theory of accurate coresets to efficiently select a spatial coreset. We demonstrate the methodology on point (environmental) and areal (federal survey) data and discuss its similarity to other recent approaches for optimal reduction of spatial data.
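To make the notion of an "accurate coreset" concrete, the sketch below uses the classical Carathéodory reduction. This is one standard way to build exact coresets for problems that depend only on weighted sums. It is not the spatial construction proposed here: it omits the Gaussian-process reconstruction step entirely. The helper names `caratheodory` and `ls_coreset` are hypothetical. For the mean of n points in R^d, the reduction keeps at most d + 1 weighted points with exactly the same weighted mean. For least squares, applying it to the upper triangles of the outer products z_i z_i^T, where z_i = (x_i, y_i), preserves X^T X and X^T y, and therefore the fitted coefficients. "Exact" here means up to floating-point rounding.

```python
import numpy as np


def caratheodory(points, weights, tol=1e-12):
    """Reduce a weighted point set to at most d + 1 points (hypothetical helper).

    The returned nonnegative weights give the same total weight and the same
    weighted sum of points as the input (up to floating-point rounding).
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    d = points.shape[1]
    idx = np.flatnonzero(w > 0)  # indices of points still in the support
    while idx.size > d + 1:
        S = idx[: d + 2]  # any d + 2 points in R^d are affinely dependent
        # A is d x (d + 1), so it has a nonzero null vector: the last right
        # singular vector returned by a full SVD.
        A = (points[S[1:]] - points[S[0]]).T
        v_rest = np.linalg.svd(A)[2][-1]
        v = np.concatenate(([-v_rest.sum()], v_rest))
        # Now sum_j v_j * p_j = 0 and sum_j v_j = 0, so v has a positive entry.
        pos = np.flatnonzero(v > tol)
        ratios = w[S[pos]] / v[pos]
        k = pos[np.argmin(ratios)]
        # Shifting the weights along v keeps both sums unchanged; this step size
        # drives the weight of point S[k] to zero and keeps the others >= 0.
        w[S] = np.maximum(w[S] - ratios.min() * v, 0.0)
        w[S[k]] = 0.0
        idx = np.flatnonzero(w > 0)
    return idx, w[idx]


def ls_coreset(X, y):
    """Weighted rows with the same X^T X and X^T y (hypothetical helper)."""
    Z = np.column_stack([X, y])
    iu = np.triu_indices(Z.shape[1])
    # Row i becomes the upper triangle of z_i z_i^T; matching the sum of these
    # vectors matches Z^T Z, which contains both X^T X and X^T y.
    Q = np.einsum("ni,nj->nij", Z, Z)[:, iu[0], iu[1]]
    return caratheodory(Q, np.ones(len(Z)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Central tendency: at most 4 weighted points reproduce the mean in R^3.
    P = rng.normal(size=(200, 3))
    idx, w = caratheodory(P, np.full(200, 1 / 200))
    print(len(idx), np.allclose(w @ P[idx], P.mean(axis=0)))

    # Linear model: weighted least squares on the coreset gives the same fit.
    X = rng.normal(size=(300, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=300)
    idx, w = ls_coreset(X, y)
    sw = np.sqrt(w)
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_core = np.linalg.lstsq(sw[:, None] * X[idx], sw * y[idx], rcond=None)[0]
    print(len(idx), np.allclose(beta_full, beta_core))
```

The loop removes at least one point per iteration and each step costs one small SVD. This simple version is therefore roughly O(n d^3). Faster variants exist that process the points in batches. For the least-squares case with p covariates, the coreset has at most (p + 1)(p + 2)/2 + 1 rows.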