Abstract:
|
Real-time analysis of large-scale streaming multivariate data often faces a trade-off between statistical estimation efficiency and computational efficiency. For multivariate data streams, this trade-off must be balanced carefully, especially for sparse and possibly under-determined regression problems, which require more computational effort. Data selection makes it possible to process large-scale streaming data in real time, so that the sparse model can be fitted and updated in seconds instead of hours. We study joint online, real-time, data-dependent sample selection and continuous variable selection for a multi-dimensional sparse regression problem on streaming data. We propose a class of online data selection methods that perform sampling and sparse estimation simultaneously to improve the computational efficiency of the online analysis. The online sparse model estimation uses coordinate descent algorithms for nonconvex penalized regression, and the real-time data selection adapts optimal design-based sequential online sampling. The performance of the sampling-assisted online sparse estimation method is assessed via simulation studies and real data examples.
|