Abstract:
|
In this poster, we build on the framework of Chatterjee and Bose (2005) for the generalized bootstrap of estimators obtained by solving estimating equations. We consider the setting in which the sample size n is extremely large and the full-sample estimate θ̂_n is unavailable or prohibitively time-consuming to compute; typically, the dimension p is also very large. Our approach to this big data estimation problem is A-optimal subsampling: we seek the A-optimal sampling distribution on the data points and use it to draw a subsample of size r as a surrogate for the whole sample. We approximate θ̂_n by the subsampling generalized bootstrap estimate θ̂_r^*, which solves the corresponding estimating equations on the subsample. We show that the A-optimal weights are more effective at drawing important information than the generalized bootstrap weights suggested by Chatterjee and Bose and than the frequently used non-uniform sampling distribution based on leverage scores. This is demonstrated by simulations of a Cox proportional hazards regression model, in which A-optimal subsampling gives the minimum mean squared error of the estimate.