Abstract Details

Activity Number: 76 - Sample Design
Type: Contributed
Date/Time: Sunday, July 30, 2017 : 4:00 PM to 5:50 PM
Sponsor: Survey Research Methods Section
Abstract #323947
Title: Identifying Out-of-Business Records on the NASS List Frame Using Boosted Regression Trees
Author(s): Gavin Corral* and Andrew Dau
Companies: USDA NASS and USDA NASS
Keywords: Survey; Boosted Tree; Machine Learning; List Frame
Abstract:

The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) produces hundreds of publications annually. NASS research is based on survey data collected from the operations on the NASS list frame. The list frame must therefore be complete and up to date for NASS to produce valid and accurate agricultural estimates. To that end, NASS continually updates the list frame by adding new farms. Conversely, farms also go out of business, and their records must be removed for the list frame to stay current. In this paper, we examine the efficacy of boosted regression trees for identifying out-of-business records prior to data collection. We found that boosted regression trees outperformed logistic regression and random forests, with the lowest misclassification rate and the highest R².
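As a rough illustration of the technique the abstract names, the sketch below fits least-squares boosted regression trees to a 0/1 out-of-business indicator and scores the fitted values with the two metrics the abstract reports. It is not the authors' model: the features (`years_since_last_report`, `acres_reported`), the toy records, the 0.5 classification cutoff, the learning rate, the number of rounds, and the use of depth-one trees (stumps) are all hypothetical choices. R² is assumed here to mean 1 − SSE/SST computed on the fitted scores.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-one regression tree: a single split on a single feature."""
    feature: int
    threshold: float
    left: float   # value returned when x[feature] <= threshold
    right: float  # value returned when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Pick the split that minimizes squared error against the residuals r."""
    n = len(X)
    mean_r = sum(r) / n
    best = Stump(0, float("inf"), mean_r, mean_r)  # fallback: no useful split
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, ml, mr), sse
    return best


@dataclass
class BoostedStumps:
    """Least-squares gradient boosting: each stump is fit to current residuals."""
    learning_rate: float = 0.1
    n_rounds: int = 50
    base: float = 0.0
    stumps: List[Stump] = field(default_factory=list)

    def fit(self, X: List[List[float]], y: List[int]) -> "BoostedStumps":
        self.base = sum(y) / len(y)  # start from the overall out-of-business share
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            resid = [yi - pi for yi, pi in zip(y, pred)]  # negative squared-error gradient
            stump = fit_stump(X, resid)
            self.stumps.append(stump)
            pred = [pi + self.learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def misclassification_rate(y: Sequence[int], scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of records whose thresholded score disagrees with the 0/1 label."""
    wrong = sum((s >= cutoff) != bool(yi) for yi, s in zip(y, scores))
    return wrong / len(y)


def r_squared(y: Sequence[int], scores: Sequence[float]) -> float:
    """1 - SSE/SST of the fitted scores against the 0/1 labels."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - si) ** 2 for yi, si in zip(y, scores))
    return 1.0 - sse / sst


# Hypothetical toy records: [years_since_last_report, acres_reported]
TOY_X = [[0, 120], [1, 80], [2, 200], [3, 15], [4, 0], [5, 10], [1, 300], [6, 5]]
TOY_Y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = out of business (illustrative labels only)

model = BoostedStumps().fit(TOY_X, TOY_Y)
fitted = [model.predict(x) for x in TOY_X]
print("misclassification rate:", misclassification_rate(TOY_Y, fitted))
print("R^2:", round(r_squared(TOY_Y, fitted), 3))
```

A real comparison like the one in the abstract would score each model (boosted trees, logistic regression, random forests) on held-out records rather than on the training data used here; the same two metric functions apply to any model that outputs a score between 0 and 1.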


Authors who are presenting talks have a * after their name.

Program: JSM 2017

Copyright © American Statistical Association