Abstract:
|
With rapid advances in information technology, massive datasets are collected in all fields of science, such as engineering, biology, chemistry and social science. Useful information is extracted from these data through statistical learning or model fitting. As both sample size and dimensionality diverge, conventional methods may face computational challenges. Recently, an effective sampling scheme based on leverage scores has been proposed to select rows of a design matrix so that the resulting subsample can serve as a surrogate for the full data in linear regression. Although the estimator based on leverage sampling has been shown to approximate the full-data estimator well, leverage sampling selects only rows and therefore cannot reduce the dimensionality. Motivated by leverage sampling, we propose a weighted leverage variable screening method. The predictors selected by the proposed method can consistently include the true predictors, not only for linear models but also for more complicated general index models. Extensive empirical studies show that the weighted leverage screening method is both computationally efficient and accurate.
|