Abstract:
|
Extraordinary amounts of data are being produced in many branches of science as well as in people's daily activities. Such data sets are usually large in both the number of rows and the number of columns. Modeling such data with limited computational resources is a challenging problem. We propose an approach that selects an informative subset of the data based on optimal design theory, using LASSO regression to perform variable selection and estimation. Compared with existing methods such as balanced or weighted sampling, our approach avoids introducing sampling error and thus provides more accurate estimation and prediction, while also taking much less time.
|