Abstract:
|
This talk presents a novel method called support points for optimally subsampling big data in a way that does not depend on the statistical method later used to model the data. The method applies to many practical problems in statistics and engineering, particularly when the available data is plentiful and high-dimensional but expensive to process because of computation or storage costs. We also propose an extension, called Projected Support Points, for high-dimensional data, which ensures that the reduction remains good on low-dimensional projections of the data space.
|
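As a rough illustration of what it means for a subsample to represent the full data without reference to any model, the sketch below scores a candidate subsample by its energy distance to the full data set. In the published work on support points this is the criterion being minimized, but the abstract itself does not state it, so treat the choice of criterion here as an assumption. The function names `pairwise_mean_distance` and `energy_distance`, the Gaussian toy data, and the sample sizes are all hypothetical, and the code only evaluates a random subsample; it does not perform the optimization that the talk describes.

```python
import numpy as np


def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (one row of a, one row of b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(points, data):
    """Energy distance between the empirical distributions of two samples.

    Smaller values mean `points` mimics the distribution of `data` more closely.
    """
    cross = pairwise_mean_distance(points, data)
    within_points = pairwise_mean_distance(points, points)
    within_data = pairwise_mean_distance(data, data)
    return 2.0 * cross - within_points - within_data


# Hypothetical toy data: 500 points in 2 dimensions.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

# A plain random subsample of 20 points, used here only as a baseline.
subsample = data[rng.choice(len(data), size=20, replace=False)]

print("energy distance of random subsample:", energy_distance(subsample, data))
```

A subsampling method of the kind described above would search for the point set that makes this score as small as possible, rather than accepting whatever a random draw produces.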