Glenys Bishop
University of Adelaide, Australia
Journal of Statistics Education v.6, n.2 (1998)
Copyright (c) 1998 by Glenys Bishop, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Excel; Natural resources science; Proportions; Sampling distribution; Spatial statistics.
This paper outlines one of a series of tutorials developed as part of an introductory statistics course for Agricultural and Natural Resource Sciences students. Here we compare two methods of sampling from an aerial photograph to obtain an estimate of the proportion of a particular type of vegetation. One method, transect sampling, is traditionally used by field ecologists, while the other is simple random sampling in a plane. Preparation details and possible extensions to the tutorial are described.
1 Faced with teaching a compulsory introductory statistics subject to two large groups of Agricultural and Natural Resource Sciences students, we searched for motivating examples to be used in both lectures and tutorials. This is the first of several papers describing the tutorials we have developed. Our target audience consisted of first year Agricultural and Natural Resource Sciences students, but some of the examples are more widely applicable.
2 Students participate in eleven tutorials throughout the semester. In one tutorial we use the fruitfly data of Hanley and Shapiro (1994) to teach hypothesis testing, while in another we use the conditional probability examples of Rossman and Short (1995). These examples have been well received by the students.
3 In this and subsequent papers, we shall describe some of the tutorials that we have developed. The tutorials are conducted in computer laboratories, but access to a computer is not essential for the exercise discussed here.
4 We use Microsoft Excel for calculations, graphics, and statistical functions. However, any statistical package with a random number generator, or just a table of random digits, will suffice for this exercise.
5 The primary goal of this tutorial is to enable students to compare transect sampling, traditionally used by ecologists, with simple random sampling. We also want to prepare students for the idea of the sampling distribution of a proportion by recording the values obtained by all the students.
6 A more detailed objective of this tutorial is to teach students how to select a sample -- in this case, a sample of points from an aerial photograph. To do this, they must understand the concept of randomness and how to use either a random number generator or a table of random digits. In this tutorial we use random number generation from the Excel analysis tools.
7 Rangeland managers and field ecologists often need to estimate the area covered by a particular vegetation type within a region. Although remote sensing by satellite imaging has made this task easier, it is often necessary to employ simpler methods. An aerial photograph provided by courtesy of the Resource Information Group, Department of Environment and Natural Resources, South Australia, illustrates the types of information that may be sought.
8 The photograph in Figure 1 shows the Sandy Creek Conservation Park and surrounding areas about 50 kilometres north of Adelaide. The conservation park is in the bottom left quadrant of the picture. The surrounding area includes roads, a river, cleared and uncleared farmland, plantations, dams, and a few houses. The whole region is fairly flat.
Figure 1. Aerial Photograph of the Sandy Creek Conservation Park.
9 When non-moving objects are to be counted, transect sampling involves choosing a line or series of lines along which the counts are to take place. The transects may be chosen randomly, or the first may be chosen randomly and subsequent transects placed systematically. They may be parallel, perpendicular, or at some other angle suitable for the situation.
10 An alternative method is point sampling. Imagine a grid placed over the area such that the grid lines are as close together as is practical. For instance, they may be a metre or half metre apart in a conservation park. Coordinate pairs are randomly generated, and the points represented by those pairs are examined for the presence or absence of the objects of interest.
11 To assist others in running this tutorial we have prepared details about the materials required and also guidelines for tutors. They should be read in conjunction with the student notes shown in the Appendix at the end of this paper.
12 Students should be able to distinguish undisturbed bushland easily in the photograph. We have experimented with photocopies and found that the photograph setting on a photocopying machine produces a clear copy of a photograph, whereas copies of a photocopy are not clear enough to be useful. In each tutorial class, we have two or three original photographs, laminated for protection, so that students can view the fine details.
13 So that students can find randomly generated points, we have glued a measurement scale to each edge of the photograph before copying it. We obtained these scales by photocopying a ruler marked in millimetres onto white paper.
14 In the week before the tutorial, students should be advised to familiarise themselves with the problem as described in the first part of the student notes (see the Appendix). They should also be told to bring a ruler and a highlighter pen to the class.
15 The tutorial is designed to last for 50 minutes. It is important to keep things moving, because the main aim of the exercise can be met only when all students have collected two different samples and calculated estimates.
16 The tutorial can be divided into three parts: definition, execution, and conclusions. First, divide the students into discussion groups of about four to define the regions they will regard as undisturbed bushland. Allow five minutes for this discussion, and then hold a five-minute forum to establish a class definition of undisturbed bushland. This definition must be operational; that is, students must be able to decide whether any point on the photograph is undisturbed bushland or not.
17 Two points should emerge from the forum. To compare sampling methods, the same definition of bushland must be used for both methods. Furthermore, if students' estimates are to be directly comparable, they must all be using the same definition.
18 The tutorial now moves into the execution stage. Ensure that students clearly highlight the undisturbed bushland areas on their photocopies of the photograph. (In the past, we have found that the means of all students' estimates for the two methods differ substantially. This is probably because many students classify isolated clumps of trees as bushland in simple random sampling, but not in transect sampling.) Warn students that highlighting and sample selection should take no more than 20 minutes.
19 The last stage of the tutorial involves discussion about what can be learnt from the data and reaching conclusions. Collect each student's estimates on the board so that everyone can see them. The estimates should be arranged in two columns: transect and simple random sample. Get students to discuss the precision and usefulness of the two methods. Discussion points can include
20 If more time is available there are several issues that can be developed from this tutorial. The most obvious is the sample size effect for the simple random sample. We have used 20 points because that number was thought to be achievable in the allocated time. As a short cut, you could ask students to use the first 10 points to obtain an estimate, and then all 20 points.
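The short cut described above can be sketched in a few lines of Python. This is only an illustration: the list of 0/1 indicators is made-up data, not results from any class, and the function name proportion is our own.

```python
def proportion(indicators):
    """Return the sample proportion of 1s in a list of 0/1 indicators."""
    return sum(indicators) / len(indicators)

# Made-up results for 20 sampled points, in the order they were drawn:
# 1 = point lies in undisturbed bushland, 0 = it does not.
bush = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0,
        0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

p_first_10 = proportion(bush[:10])  # estimate from the first 10 points only
p_all_20 = proportion(bush)         # estimate from all 20 points

print(f"First 10 points: {p_first_10:.2f}")
print(f"All 20 points:   {p_all_20:.2f}")
```

Collecting both estimates from every student would let the class compare the spread of the 10-point estimates with that of the 20-point estimates.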
21 More advanced students could be asked to examine other sampling schemes such as systematic sampling. In this method, r parallel transects at equal intervals are examined; the first transect is selected randomly in the range 1 to (180/r). For this example, if ten transects are to be used, the first is chosen by generating a random number between 1 and 18 (180/10). Each subsequent transect is placed 18 mm below the previous one.
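The systematic scheme just described might be sketched as follows. The function name and its parameters are hypothetical; the 180 mm side length and the choice of ten transects are taken from the example above, and the sketch assumes that r divides the side length exactly.

```python
import random

def systematic_transects(r=10, side_length_mm=180, seed=None):
    """Return the side-scale positions (in mm) of r equally spaced transects.

    The first position is drawn at random from 1 to side_length_mm / r;
    each later transect lies one interval below the previous one.
    Assumes r divides side_length_mm exactly.
    """
    interval = side_length_mm // r
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # inclusive at both ends
    return [start + k * interval for k in range(r)]

positions = systematic_transects(r=10, side_length_mm=180, seed=42)
print(positions)
```

Passing a seed makes the selection reproducible, which may be convenient if a tutor wants to check a student's transects.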
22 Points can also be chosen systematically instead of randomly. They can be selected at regular or irregular intervals along parallel transects. The aim is to give representative coverage of the area, while avoiding following features of the landscape such as streams, fencelines, and ridges. Methods that simulate what an ecologist might do on foot in a park, when no aerial photograph exists, could be discussed. Buckland, Anderson, Burnham, and Laake (1993) discuss some of the practicalities of these methods when introducing principles of distance sampling.
23 Transect sampling is also used in microscopic work. For example, physiologists count certain cell types, and engineers examine grain structure in metals. Commonly, a grid available on the microscope is used to take systematic samples. Photographs taken under the microscope could be used in place of the aerial photograph or as an extension to illustrate other uses of transect sampling.
24 In subsequent lectures, when introducing the sampling distribution of a proportion, we have found it useful to refer to the variety of estimates obtained by students. The number of points, from a simple random sample of 20, that are in undisturbed bushland may be used as an example of a binomial variable. However, because points on the transect are not independent of one another, the number of points out of 400 in undisturbed bushland is not a binomial variable.
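A small simulation can illustrate the binomial behaviour of the simple random sample counts. The sketch below is not part of the tutorial itself: the class size of 30 and the true proportion of 0.3 are invented for illustration, and each point is treated as an independent draw, which is reasonable for simple random sampling but not for points along a transect.

```python
import random
from collections import Counter

def simulate_class(n_students=30, n_points=20, true_p=0.3, seed=0):
    """Simulate the bushland count obtained by each student.

    Each student examines n_points independent random points, and each
    point falls in bushland with probability true_p (an assumed value).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_students):
        count = sum(rng.random() < true_p for _ in range(n_points))
        counts.append(count)
    return counts

counts = simulate_class()
tally = Counter(counts)

# Text histogram of the simulated counts, one row per observed count.
for count in sorted(tally):
    print(f"{count:2d} of 20 (estimate {count / 20:.2f}): {'*' * tally[count]}")
```

The resulting tally resembles the column of simple random sample estimates that students see on the board.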
25 As an assignment question, we ask the students to calculate a 95% confidence interval for the proportion, p, of the whole area that is in undisturbed bushland using their own simple random sample estimate. We also ask them to explain why the formula for the 95% confidence interval is inappropriate for the transect method. Most of them can see that the points are not independent.
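The paper does not state which formula the students use. A minimal sketch, assuming the usual large-sample normal approximation (estimate plus or minus 1.96 standard errors), is given below; the function name and the count of 6 points in bushland are made up for illustration.

```python
import math

def proportion_ci_95(successes, n):
    """Approximate 95% confidence interval for a proportion.

    Uses the normal approximation p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n),
    which assumes the n points are independent. The limits are clipped to [0, 1].
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.96 * standard_error
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Made-up example: 6 of a student's 20 random points lie in bushland.
lower, upper = proportion_ci_95(6, 20)
print(f"Estimate 0.30, approximate 95% CI ({lower:.3f}, {upper:.3f})")
```

With only 20 points the interval is wide, and the approximation itself is rough for so small a sample. The independence assumption in the docstring is exactly what fails for the transect method.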
This work was conducted with the aid of a University of Adelaide Teaching Development Grant. I wish to thank two anonymous referees for their very helpful suggestions.
A common problem in rangeland management is the estimation of areas covered by a particular vegetation type in a region. In this workshop, we are going to compare two different rangeland surveying techniques, both of which involve sampling.
This method is suitable for flat, low-lying scrub with clear delineations among vegetation types. Walk in a straight line from a random starting point and either count paces or use a tape measure to find the proportion of the whole traverse that intersects the vegetation type of interest.
Since we have a good view of the region, we may decide to choose a representative or best path to take for our estimate, but this leaves us open to the possibility of biasing our estimate towards a preconceived idea of the area. On the other hand, choosing a line at random is the best method only when the vegetation type of interest occurs in random clumps, clearly a rare property in practice.
A common compromise is to take two lines, one at right angles to the other. In this way we are likely to pick up any clumping.
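The transect estimate from two lines can be computed by pooling the lengths measured along both. The sketch below is illustrative only: the transect lengths and bushland segments are made-up measurements in millimetres, and the function assumes that the segments recorded on a line do not overlap.

```python
def transect_proportion(transects):
    """Pooled proportion of transect length lying in the vegetation of interest.

    transects is a list of (length_mm, segments) pairs, where segments is a
    list of (start_mm, end_mm) stretches of that line that cross the
    vegetation. Segments on a line are assumed not to overlap.
    """
    total_length = sum(length for length, _ in transects)
    covered = sum(end - start
                  for _, segments in transects
                  for start, end in segments)
    return covered / total_length

# Made-up measurements for two lines at right angles to each other.
horizontal = (200, [(12, 40), (95, 130)])  # 28 mm + 35 mm in bushland
vertical = (200, [(60, 97)])               # 37 mm in bushland

estimate = transect_proportion([horizontal, vertical])
print(f"Transect estimate: {estimate:.2f}")
```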
Given the dimensions of the region, randomly choose a subset of point coordinates to sample. Walk to each point and decide whether it lies in the vegetation type of interest. Find the number of points in the vegetation as a proportion of all points examined. In each case, this proportion is an estimate of the proportion of the total area of the region covered. We can do something similar in the laboratory by using an aerial photograph.
You have been given a photograph (scale 1:16000) of the Sandy Creek Conservation Park and surrounds near Gawler taken in March 1979. The aim is to determine how much of the whole region is "undisturbed bushland" as this has implications for the park fauna.
Imagine a grid of one-millimetre squares drawn on the photograph. Each of the intersections is a point either lying in undisturbed bushland or not. Since this grid is very fine, the proportion of the points which lie in undisturbed bushland will be close to the corresponding proportion of the area. We will sample 20 points at random by selecting random coordinates from the top and side scales 20 times.
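If Excel is not available, the 20 random coordinate pairs could be generated with a short Python script such as the one below. The scale lengths are assumptions: the 180 mm side scale is inferred from the systematic sampling example in the paper, and the 250 mm top scale is purely hypothetical and should be replaced by the length of the scale on your own copy.

```python
import random

def sample_points(n_points=20, top_max_mm=250, side_max_mm=180, seed=None):
    """Return n_points random (top, side) coordinate pairs in whole millimetres.

    top_max_mm and side_max_mm are assumed scale lengths; adjust them to
    match the scales glued to the photograph.
    """
    rng = random.Random(seed)
    return [(rng.randint(0, top_max_mm), rng.randint(0, side_max_mm))
            for _ in range(n_points)]

points = sample_points(seed=1)
for top, side in points:
    print(f"Top = {top:3d} mm, Side = {side:3d} mm")
```

Points are drawn independently, so the same pair can occasionally appear twice; with a grid this fine, that is unlikely to matter.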
Suppose your first coordinate pair is Top = 25 mm, Side = 150 mm, and the point lies in farmland. Then your table would look like this:
Top   Side   Bush
 25    150      0
Your tutor will write the class results for the two methods on the board. Get into groups and perform the following tasks.
Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. (1993), Distance Sampling: Estimating Abundance of Biological Populations, London: Chapman & Hall.
Hanley, J. A., and Shapiro, S. H. (1994), "Sexual Activity and the Lifespan of Male Fruitflies: A Dataset That Gets Attention," Journal of Statistics Education [Online], 2(1). (http://jse.amstat.org/v2n1/datasets.hanley.html)
Rossman, A. J., and Short, T. H. (1995), "Conditional Probability and Educational Reform: Are They Compatible?," Journal of Statistics Education [Online], 3(2). (http://jse.amstat.org/v3n2/rossman.html)
Glenys Bishop
Statistics Department
University of Adelaide
Australia 5005