Abstract:
Recent advances in experimental techniques and scientific instruments have enabled the collection of biological, biogeochemical, and imaging data of the ocean on a global scale. The Simons Collaborative Marine Atlas Project (Simons CMAP), a large-scale open-access marine database currently under development, hosts a multitude of such datasets, including remote-sensing satellite observations, large-scale integrated in-situ biogeochemical cruise measurements, amplicon sequencing data, and complex synthetic ocean simulation data. To give statisticians and data scientists easy access to these rich datasets, we have developed cmap4r, an R package for downloading, analyzing, and visualizing datasets from the Simons CMAP in a fast and structured manner. Integrated analysis of marine data is challenging for several reasons: outliers, missing entries, differing spatial and temporal resolutions, spatiotemporal dependencies, high dimensionality, and, for amplicon sequencing data, the absence of absolute species abundance measurements due to experimental limitations. These challenges present a unique opportunity both to develop novel statistical methods for marine data and to apply them.