Abstract:
|
Whether motivated by the size of datasets or by computational challenges, a vast literature addresses the use of only some of the data (subdata) for estimation or prediction. This raises the question of how subdata should be selected from the entire dataset (full data). One possibility is to select the subdata completely at random from the full data, but this is typically not the best method. The literature contains various suggestions for better alternatives. After introducing some of these alternatives, we discuss extensions, weaknesses, and new directions.
|