Abstract:
|
The National Rivers and Streams Assessment (NRSA) is a probability-based survey conducted by the US Environmental Protection Agency (EPA). It provides information on the ecological condition of rivers and streams in the conterminous United States and on the extent to which they support healthy biological condition. An important problem is predicting stream condition at new, unsampled locations. Using random forests (Breiman, 2001), we develop a model to predict the probability that a stream is in good (or, conversely, poor) biological condition. The model is fit to categorical response data: 1365 NRSA survey sites, each designated as being in good or poor condition according to an aquatic health index. The predictor data consist of 212 landscape features from the EPA's Stream-Catchment Dataset (Hill et al., 2015). We evaluate the out-of-bag performance of the random forest classifier using classification rates, the area under the curve, and graphical summaries. The random forest model performs remarkably well according to these metrics. We also address issues of variable selection and model stability.
|