Abstract:
|
Gaussian processes are widely used models in fields such as statistics and machine learning. To achieve computational feasibility for large datasets, a popular approach is the Vecchia approximation, an ordered conditional approximation of the data vector that implies a sparse Cholesky factor of the precision matrix. For this purpose, the observations can be ordered using a maximum-minimum-distance algorithm, and the sparsity is determined by nearest-neighbor conditioning. Both ordering and conditioning are typically carried out based on the Euclidean distance between the corresponding inputs. Here, we propose instead to use a correlation-based distance metric, resulting in a general class of correlation-based Vecchia approximations. Our approach can greatly improve the approximation accuracy and is widely applicable, including to general covariance matrices obtained from non-Euclidean inputs. Moreover, it can be understood as the Euclidean-based approach applied on a deformed input space, and it can be carried out in quasilinear time in the size of the dataset. We demonstrate the advantages of our method in several simulation scenarios and in a real-data application.
|
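As an illustration of the two ingredients named in the abstract, maximum-minimum-distance ordering and nearest-neighbor conditioning, the following is a minimal brute-force sketch in Python. It rests on assumptions not stated above: the correlation-based distance is taken to be sqrt(1 - |rho_ij|), which is one natural choice, and the function names (`correlation_distance`, `maximin_ordering`, `conditioning_sets`) are hypothetical. The sketch works on a dense matrix in quadratic time and is not the quasilinear algorithm referred to in the abstract.

```python
import numpy as np


def correlation_distance(K):
    """Turn a covariance matrix K into distances d_ij = sqrt(1 - |rho_ij|).

    The specific form of the distance is an assumption made for this sketch.
    """
    sd = np.sqrt(np.diag(K))
    rho = K / np.outer(sd, sd)
    # Clip guards against tiny negative values caused by rounding.
    return np.sqrt(np.clip(1.0 - np.abs(rho), 0.0, None))


def maximin_ordering(D, first=0):
    """Greedy maximin ordering: each new point maximizes its minimum
    distance to the points already selected. Brute force, O(n^2)."""
    n = D.shape[0]
    order = [first]
    min_dist = D[first].copy()   # distance of every point to the selected set
    min_dist[first] = -np.inf    # mark selected points so they are never re-picked
    for _ in range(n - 1):
        nxt = int(np.argmax(min_dist))
        order.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
        min_dist[nxt] = -np.inf
    return np.array(order)


def conditioning_sets(D, order, m):
    """For each point in the ordering, return the (at most) m nearest
    points among those that appear earlier in the ordering."""
    sets = []
    for i, idx in enumerate(order):
        prev = order[:i]
        nearest = prev[np.argsort(D[idx, prev], kind="stable")[:m]]
        sets.append(nearest)
    return sets


# Toy example: five 1-D inputs with an exponential covariance (an assumption).
x = np.array([0.0, 1.0, 2.5, 3.0, 7.0])
K = np.exp(-np.abs(x[:, None] - x[None, :]))
D = correlation_distance(K)
order = maximin_ordering(D, first=0)
sets = conditioning_sets(D, order, m=2)
```

For an isotropic covariance that decreases monotonically with distance, as in this toy example, the correlation-based distance is a monotone function of Euclidean distance, so the ordering and conditioning sets coincide with the Euclidean-based ones. The two approaches differ when the covariance is anisotropic, nonstationary, or defined on non-Euclidean inputs, since only the covariance matrix `K` is needed here.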