Gavin R. Corral
United States Department of Agriculture, National Agricultural Statistics Service
Andrew J. Dau
United States Department of Agriculture, National Agricultural Statistics Service
Jodie M. Sprague
United States Department of Agriculture, National Agricultural Statistics Service
76 – Sample Design
Identifying Out of Business Records on the NASS List Frame Using Boosted Regression Trees
The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) produces hundreds of publications annually. The research conducted at NASS relies on survey data collected from samples drawn from the NASS list frame. It is therefore imperative that the list frame be complete and up to date so that NASS can produce valid and accurate estimates for agriculture. To that end, NASS continually updates the list frame by adding new farms. Conversely, farms also go out of business, and these records must be removed for the list frame to stay current. In this paper, we examine the efficacy of boosted regression trees in identifying out-of-business records prior to data collection. Boosted regression trees outperformed both logistic regression and random forests, achieving the lowest misclassification rate and the highest R².
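The abstract does not specify the model configuration, so the following is only a minimal sketch of what "boosted regression trees" means for this task. It assumes least-squares gradient boosting with depth-1 regression trees (stumps) fit to a 0/1 out-of-business indicator and a 0.5 classification cutoff. The feature ("years since the record last reported activity"), the toy data, `n_trees`, `learning_rate`, and every function name are hypothetical placeholders, not NASS's variables or settings. The misclassification rate is computed as the share of records whose predicted status differs from the observed one.

```python
"""Minimal least-squares gradient boosting with regression stumps.

Sketch only: the feature, data, n_trees, and learning_rate are hypothetical
placeholders, not values from the NASS study. y is 1 for an out-of-business
record and 0 otherwise.
"""
from typing import List, Tuple

# A stump is (feature index, split threshold, left-leaf value, right-leaf value).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (initial score, learning rate, stumps)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Return the single split that minimizes the squared error of residuals r."""
    best_sse = float("inf")
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must leave records on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = sum(r) / len(r)
        best = (0, float("inf"), m, m)
    return best


def stump_predict(s: Stump, x: List[float]) -> float:
    j, t, left_value, right_value = s
    return left_value if x[j] <= t else right_value


def fit_boosted(X, y, n_trees=50, learning_rate=0.1) -> Model:
    """Fit each new stump to the residuals y - F(x) of the current ensemble F."""
    f0 = sum(y) / len(y)  # start from the overall out-of-business rate
    scores = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, scores)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        # Shrink each stump's contribution by the learning rate.
        scores = [fi + learning_rate * stump_predict(s, x) for fi, x in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_score(model: Model, x: List[float]) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)


def classify(model: Model, x: List[float], cutoff: float = 0.5) -> int:
    """Flag a record as out of business (1) when its score reaches the cutoff."""
    return 1 if predict_score(model, x) >= cutoff else 0


def misclassification_rate(y_true: List[int], y_pred: List[int]) -> float:
    """Share of records whose predicted status differs from the observed status."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical single feature: years since the record last reported activity.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_boosted(X, y)
    preds = [classify(model, x) for x in X]
    print(preds, misclassification_rate(y, preds))  # prints [0, 0, 0, 1, 1, 1] 0.0
```

Each stump is fit to the residuals left by the ensemble so far, and the learning rate shrinks its contribution; this sequential correction is what distinguishes boosting from a random forest, whose trees are grown independently on bootstrap samples and then averaged. A production model would use deeper trees, many features, and a held-out sample for computing the misclassification rate.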