Abstract:
|
The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) distributes numerous surveys annually and conducts the Census of Agriculture every five years. The surveys are extensive and often very costly. In an effort to reduce data collection costs, NASS is currently using multiple machine learning techniques, including response propensity modeling (RPM), to estimate the record-level probability of response to a survey. These propensity scores allow the records to be ordered from those most likely to respond to those least likely to respond. All records with a propensity score below a predetermined cutoff are flagged as highly unlikely to respond and become candidates for removal from the sample. In this study, the efficacy of removing some or all of the highly-unlikely-to-respond records is examined. In addition, an importance measure that incorporates the relative size of the operation, rarity, and state-level impact is used to identify potential bias.
|
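The ordering and flagging step described in the abstract can be sketched as follows. This is a minimal illustration, not the NASS implementation: the `Record` type, the `flag_unlikely` function, the cutoff of 0.10, and the propensity values are all hypothetical and chosen only to show the mechanics. In practice the scores would come from a fitted response propensity model.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    record_id: str     # hypothetical identifier for a sampled operation
    propensity: float  # estimated probability of response, in [0, 1]


def flag_unlikely(records: List[Record], cutoff: float) -> Tuple[List[Record], List[Record]]:
    """Order records from most to least likely to respond, and
    return the ordered list together with those scoring below the cutoff."""
    ordered = sorted(records, key=lambda r: r.propensity, reverse=True)
    flagged = [r for r in ordered if r.propensity < cutoff]
    return ordered, flagged


# Made-up scores for illustration only; not NASS data.
sample = [
    Record("A", 0.82),
    Record("B", 0.07),
    Record("C", 0.41),
    Record("D", 0.03),
]

# The cutoff value is an assumption; the abstract says only that it is predetermined.
ordered, flagged = flag_unlikely(sample, cutoff=0.10)
```

Here `flagged` holds the candidates for removal. Whether to remove some or all of them is the question the study examines, so the sketch stops at flagging.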