Abstract:
Because of its sheer size, a huge data set may have to be, or may better be, analyzed through a selected subset of the data, called subdata. There are various methods for selecting subdata from big data, including sampling-based methods and methods that advocate the use of information-based criteria. Information-based criteria relate the problem of ``optimal'' subdata selection to the problem of optimal design of experiments. Although the two problems differ in significant ways, this connection makes tools from optimal design available for subdata selection. We introduce the basic ideas, demonstrate the success of information-based methods, and discuss some of their shortcomings.
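To make the design connection concrete, here is a minimal sketch under stated assumptions. It assumes a main-effects linear regression model with an intercept and uses D-optimality (maximize the determinant of the information matrix X^T X of the selected rows) as the information-based criterion. The selection rule shown is an IBOSS-style rule (for each covariate, keep the most extreme remaining values on both ends), swapped in purely for illustration and not necessarily the method examined in this paper. All function names are hypothetical, and the data are simulated standard normal covariates, not results from the paper.

```python
import random


def information_matrix(rows):
    """X^T X for a linear model with intercept; each row is a tuple of covariates."""
    p = len(rows[0]) + 1
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        x = (1.0,) + tuple(r)  # prepend the intercept column
        for i in range(p):
            for j in range(p):
                m[i][j] += x[i] * x[j]
    return m


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            return 0.0  # (numerically) singular
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def iboss_select(data, k):
    """Hypothetical IBOSS-style rule: for each covariate, take the r = k // (2p)
    smallest and r largest values among the rows not yet selected."""
    p = len(data[0])
    r = k // (2 * p)
    if r < 1 or 2 * r * p > len(data):
        raise ValueError("need 2p <= k and enough rows for k // (2p) picks per end")
    remaining = set(range(len(data)))
    chosen = []
    for j in range(p):
        order = sorted(remaining, key=lambda i: data[i][j])
        picks = order[:r] + order[-r:]
        chosen.extend(picks)
        remaining.difference_update(picks)
    return [data[i] for i in chosen]


def uniform_select(data, k, rng):
    """Sampling-based baseline: simple random sample of k rows without replacement."""
    return rng.sample(data, k)


rng = random.Random(0)
n, p, k = 20000, 2, 200
data = [tuple(rng.gauss(0.0, 1.0) for _ in range(p)) for _ in range(n)]

d_uniform = det(information_matrix(uniform_select(data, k, rng)))
d_iboss = det(information_matrix(iboss_select(data, k)))
print(f"det(X^T X), uniform subsample:   {d_uniform:.3e}")
print(f"det(X^T X), IBOSS-style subdata: {d_iboss:.3e}")
```

Under these assumptions the extreme-value rule typically yields a much larger determinant than a uniform random subsample of the same size, because points far from the center carry more information about the slopes. This is the kind of design-based gain the abstract refers to. It also hints at a shortcoming: the rule relies on the assumed model being correct and can be sensitive to outliers.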