Estimating the number of fish that return to spawn using capture-recapture methods.
Questions about accuracy and precision can only be answered by examining the sampling distribution of the point estimator. In real life only a single sample is drawn from the population (e.g., DFO does only a single capture-recapture experiment), but to understand the performance of an estimator, statisticians must look at how the estimator performs when the experiment is repeated many times. This is often done using statistical theory (which is beyond the scope of this course), but the same idea can be explored with a simulation that mimics the real-life experiment.
Get the sampling bowl, paddles, and beads, and follow these directions for the experiment.
Notice that the estimates vary over the different trials. Why? Carefully review what happened. The population size (the number of green beads initially in the bowl) was fixed and the same in all trials. The number of fish tagged (the number of white beads) was also fixed, as was the number of carcasses examined (the number of beads on the paddle). The only random quantity in the experiment was the number of tagged fish spotted in the carcass sample (the number of white beads found on the sampling paddle). This number varies because the paddle selects only a sample from the population of carcasses: even though the proportion of all carcasses with tags is fixed, the sample proportion varies from sample to sample. Finally, because the escapement estimate is computed from this random quantity (it appears in the denominator of the estimator), the estimated escapement also varies from sample to sample.
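The bead experiment can also be mimicked in code. The sketch below is a minimal Python version of one trial, assuming the estimator has the usual Petersen form, estimate = (number tagged times number of carcasses examined) / (number of tagged carcasses found), which matches the description of the random count sitting in the denominator. The function name `one_trial` and the settings in the usage line (500 tagged beads, a 100-bead paddle) are hypothetical; only the population of 4,000 comes from the text.

```python
import random
from typing import Optional


def one_trial(population_size: int, tagged: int, paddle_size: int,
              rng: random.Random) -> Optional[float]:
    """Simulate one capture-recapture trial and return a Petersen-type estimate.

    Assumption: the bowl holds `population_size` beads, `tagged` of which are
    white (tagged); the paddle draws `paddle_size` beads without replacement.
    Returns None when no tagged beads are drawn (the estimate is undefined).
    """
    # Build the bowl: True marks a tagged (white) bead.
    bowl = [True] * tagged + [False] * (population_size - tagged)
    # The paddle scoops a simple random sample without replacement.
    paddle = rng.sample(bowl, paddle_size)
    recaptured = sum(paddle)  # the only random quantity in the experiment
    if recaptured == 0:
        return None
    # Assumed Petersen form: tagged * examined / tagged-found.
    return tagged * paddle_size / recaptured


# Hypothetical settings: 500 tagged beads and a 100-bead paddle.
rng = random.Random(1)
print(one_trial(4000, 500, 100, rng))
```

Calling `one_trial` repeatedly with the same settings changes only `recaptured`, which is exactly the source of variation described above.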
It is important to step back and look carefully at what you did. The sampling procedure was repeated with everything else held fixed. In real-life situations you would not repeat an experiment many times; what you are doing here is examining the theoretical performance of the estimator to see what happens in the long run.
Draw a dot-plot or a histogram of the estimates from your 30 trials. This graph represents part of the sampling distribution of the estimator and can be used to examine the performance of the estimator. [The true sampling distribution would require us to look at all possible samples rather than just 30 trials.]
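The 30 repetitions and a rough dot-plot can also be sketched in Python, for example to see what many more trials look like. This is a minimal sketch that assumes the Petersen form of the estimate (tagged beads times paddle size, divided by tagged beads found); the values of 500 tagged beads and a 100-bead paddle are hypothetical, and only the 4,000-bead population comes from the text. The helper names `simulate_estimates` and `text_dot_plot` are illustrative.

```python
import random
from collections import Counter

# Hypothetical settings: only the population size (4,000) comes from the text;
# the number tagged and the paddle size are illustrative assumptions.
POPULATION, TAGGED, PADDLE, TRIALS = 4000, 500, 100, 30


def simulate_estimates(trials, population, tagged, paddle, seed=None):
    """Repeat the bead experiment and return the Petersen-type estimates."""
    rng = random.Random(seed)
    bowl = [True] * tagged + [False] * (population - tagged)  # True = tagged bead
    estimates = []
    for _ in range(trials):
        found = sum(rng.sample(bowl, paddle))  # tagged beads on the paddle
        if found > 0:  # the estimate is undefined if no tags are found
            estimates.append(tagged * paddle / found)
    return estimates


def text_dot_plot(values, bin_width=250):
    """Print one row per bin with one '*' for each estimate falling in it."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    for start in sorted(counts):
        print(f"{start:>6} to {start + bin_width:<6} {'*' * counts[start]}")


estimates = simulate_estimates(TRIALS, POPULATION, TAGGED, PADDLE, seed=2024)
text_dot_plot(estimates)
```

Raising `TRIALS` to a few thousand gives a closer picture of the full sampling distribution than 30 trials can.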
The term accuracy refers to the long-run average performance of an estimator over all possible samples, while the term precision refers to the variation of the estimator over all possible samples. The dot-plot or histogram drawn above shows both features: where it is centered relative to the true value reflects accuracy, and how widely it is spread reflects precision.
Because accuracy refers to long-run average performance, compute the sample mean of the thirty estimates. The bowl actually held 4,000 beads, so what can you say about the accuracy of the estimator? That is, does the estimator appear to be 'unbiased' for the true population value?
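The accuracy check amounts to one average and one subtraction. A minimal sketch, assuming your thirty estimates are stored in a Python list; the function name `accuracy_summary` is hypothetical:

```python
from statistics import mean


def accuracy_summary(estimates, true_value=4000):
    """Return the average estimate and its difference from the true value.

    The difference (average minus truth) is the empirical estimate of bias.
    """
    avg = mean(estimates)
    return avg, avg - true_value
```

Call it as `avg, bias = accuracy_summary(my_estimates)` with your own thirty values. With only thirty trials, a small nonzero difference is expected even for an unbiased estimator, so judge the difference against the spread of the estimates rather than against zero exactly.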
Similarly, because precision refers to the variability over all possible samples, we need to assess this variability. Compute the standard deviation of the thirty estimates. The technical term for the standard deviation of an estimator is the standard error (s.e.). The empirical rule indicates that, for a roughly bell-shaped distribution, about 95% of observations lie within two standard deviations of the mean. Because the standard error is simply the standard deviation of the estimates over repeated experiments, we expect the empirical rule to hold here as well. Compute the average estimate ±2 standard errors using the standard error computed above. Does this interval contain about 95% of your estimates?
Of course, DFO would like some confidence that its estimate is close to the true number of spawning fish, in this case 4,000. If the estimator is (nearly) unbiased, the empirical rule again tells us that roughly 95% of the estimates will lie within ±2 standard errors of the true value. Examine your data sheet and find what fraction of your estimates are within ±2 standard errors of 4,000.
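Both ±2 standard error checks can be computed together. The sketch below assumes the estimates are in a Python list and uses `statistics.stdev`, which divides by n - 1, as the empirical standard error; the function name `precision_check` and its dictionary keys are hypothetical.

```python
from statistics import mean, stdev


def precision_check(estimates, true_value=4000):
    """Empirical standard error plus the two +/-2 s.e. coverage checks."""
    n = len(estimates)
    avg = mean(estimates)
    se = stdev(estimates)  # sample SD of the estimates = empirical s.e.
    low, high = avg - 2 * se, avg + 2 * se
    # Fraction of estimates inside average +/- 2 s.e.
    near_avg = sum(low <= x <= high for x in estimates) / n
    # Fraction of estimates within 2 s.e. of the true population size.
    near_truth = sum(abs(x - true_value) <= 2 * se for x in estimates) / n
    return {"se": se, "interval": (low, high),
            "frac_within_interval": near_avg,
            "frac_within_2se_of_truth": near_truth}
```

`stdev` needs at least two estimates. The first fraction answers the question about the interval around the average estimate; the second answers the question about estimates within ±2 standard errors of 4,000. If the estimator is biased, the second fraction will tend to fall below the first.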
In general, estimators that are unbiased and have small standard errors are preferred. If an estimator is unbiased, the average value of the estimator (when averaged over repeated experiments) equals the true population value. If an estimator has a small standard error, its estimates cluster tightly around their long-run average; when the estimator is also unbiased, one can be fairly certain that any single estimate will be close to the true value in the population.
The next module will examine what characteristics of the sampling procedure control these aspects.