I. Sampling From an Aerial Photograph

Glenys Bishop

University of Adelaide, Australia

Journal of Statistics Education v.6, n.2 (1998)

Copyright (c) 1998 by Glenys Bishop, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

**Key Words**: Excel; Natural resources science; Proportions;
Sampling distribution; Spatial statistics.

This paper outlines one of a series of tutorials developed as part of an introductory statistics course for Agricultural and Natural Resource Sciences students. Here we compare two methods of sampling from an aerial photograph to obtain an estimate of the proportion of a particular type of vegetation. One method, transect sampling, is traditionally used by field ecologists, while the other is simple random sampling in a plane. Preparation details and possible extensions to the tutorial are described.

1 Faced with teaching a compulsory introductory statistics subject to two large groups of Agricultural and Natural Resource Sciences students, we searched for motivating examples to be used in both lectures and tutorials. This is the first of several papers describing the tutorials we have developed. Our target audience consisted of first year Agricultural and Natural Resource Sciences students, but some of the examples are more widely applicable.

2 Students participate in eleven tutorials throughout the semester. In one tutorial we use the fruitfly data of Hanley and Shapiro (1994) to teach hypothesis testing, while in another we use the conditional probability examples of Rossman and Short (1995). These examples have been well received by the students.

3 In this and subsequent papers, we shall describe some of the tutorials that we have developed. The tutorials are conducted in computer laboratories, but access to a computer is not essential for the exercise discussed here.

4 We use Microsoft Excel for calculations, graphics, and statistical functions. However, any statistical package with a random number generator, or just a table of random digits, will suffice for this exercise.

5 The primary goal of this tutorial is to enable students to compare transect sampling, traditionally used by ecologists, with simple random sampling. We also want to prepare students for the idea of the sampling distribution of a proportion by recording the values obtained by all the students.

6 A more detailed objective of this tutorial is to teach students how to select a sample -- in this case, a sample of points from an aerial photograph. To do this, they must understand the concept of randomness and how to use either a random number generator or a table of random digits. In this tutorial we use random number generation from the Excel analysis tools.

7 Rangeland managers and field ecologists often need to estimate the area covered by a particular vegetation type within a region. Although remote sensing by satellite imaging has made this task easier, it is often necessary to employ simpler methods. An aerial photograph provided by courtesy of the Resource Information Group, Department of Environment and Natural Resources, South Australia, illustrates the types of information that may be sought.

8 The photograph in Figure 1 shows the Sandy Creek Conservation Park and surrounding areas about 50 kilometres north of Adelaide. The conservation park is in the bottom left quadrant of the picture. The surrounding area includes roads, a river, cleared and uncleared farmland, plantations, dams, and a few houses. The whole region is fairly flat.

Figure 1. Aerial Photograph of the Sandy Creek Conservation Park.

9 Transect sampling when non-moving objects are to be counted involves choosing a line or series of lines along which the counts are to take place. The transects may be chosen randomly, or the first may be chosen randomly and subsequent transects systematically. They may be parallel, perpendicular, or at some other angle suitable for the situation.

10 An alternative method is point sampling. Imagine a grid placed over the area such that the grid lines are as close together as is practical. For instance, they may be a metre or half metre apart in a conservation park. Coordinate pairs are randomly generated, and the points represented by those pairs are examined for the presence or absence of the objects of interest.

11 To assist others in running this tutorial we have prepared details about the materials required and also guidelines for tutors. They should be read in conjunction with the student notes shown in the Appendix at the end of this paper.

12 Students should be able to easily distinguish undisturbed bushland in the photograph. We have experimented with photocopies and found that using the photographic button on a photocopying machine gives a clear copy of a photograph, whereas copies made from a photocopy are not clear enough to be useful. In each tutorial class, we have two or three original photographs, laminated for protection, so that students can view the fine details.

13 So that students can find randomly generated points, we have glued a measurement scale to each edge of the photograph before copying it. We obtained these scales by photocopying a ruler marked in millimetres onto white paper.

14 In the week before the tutorial, students should be advised to familiarise themselves with the problem as described in the first part of the student notes (see the Appendix). They should also be told to bring a ruler and a highlighter pen to the class.

15 The tutorial is designed to last for 50 minutes. It is important to keep things moving, because the main aim of the exercise can be met only when all students have collected two different samples and calculated estimates.

16 The tutorial can be divided into three parts: definition, execution, and conclusions. First, divide the students into discussion groups of about four to define the regions they will regard as undisturbed bushland. Allow five minutes for this discussion, and then hold a five-minute forum to establish a class definition of undisturbed bushland. This definition must be operational; that is, students must be able to decide whether any point on the photograph is undisturbed bushland or not.

17 Two points should emerge from the forum. To compare sampling methods, the same definition of bushland must be used for both methods. Furthermore, if students' estimates are to be directly comparable, they must all be using the same definition.

18 The tutorial now moves into the execution stage. Ensure that students clearly highlight the undisturbed bushland areas on their photocopies of the photograph. (In the past, we have found that the means of all students' estimates for the two methods differ substantially. This is probably because many students classify isolated clumps of trees as bushland in simple random sampling, but not in transect sampling.) Warn students that highlighting and sample selection should take no more than 20 minutes.

19 The last stage of the tutorial involves discussion about what can be learnt from the data and reaching conclusions. Collect each student's estimates on the board so that everyone can see them. The estimates should be arranged in two columns: transect and simple random sample. Get students to discuss the precision and usefulness of the two methods. Discussion points can include

- A comparison of five number summaries for the two
methods to see which estimate is less variable or more
precise,
- An examination of the fairness of the above comparison,
taking into consideration the amount of effort required
to obtain each estimate in class and how this might
change in the field,
- The usefulness of each method in terms of precision --
that is, the range of class estimates should be small
enough to give us a reasonable idea of the proportion
of native bushland, and
- Consideration of improving precision by increasing the
number of points or transects sampled.
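
The comparison of five number summaries can also be done outside Excel. The following is a minimal Python sketch, not part of the original tutorial. The function name `five_number_summary` and the two lists of class estimates are invented placeholders for illustration only. Quartile conventions differ between textbooks and packages; this sketch assumes the "inclusive" interpolation method of the standard `statistics` module.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of estimates."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions give slightly
    # different Q1 and Q3 for small classes.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Placeholder class results (proportions), invented for illustration only.
transect = [0.22, 0.31, 0.18, 0.27, 0.35, 0.25, 0.29]
random_points = [0.20, 0.40, 0.15, 0.30, 0.45, 0.25, 0.35]

for name, estimates in [("transect", transect),
                        ("simple random sample", random_points)]:
    low, q1, med, q3, high = five_number_summary(estimates)
    print(f"{name}: min={low} Q1={q1} median={med} Q3={q3} max={high} "
          f"range={high - low:.2f} IQR={q3 - q1:.2f}")
```

A smaller range or interquartile range for one column suggests that method gave less variable estimates in that class.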

20 If more time is available, there are several issues that can be developed from this tutorial. The most obvious is the sample size effect for the simple random sample. We have used 20 points because that number was thought to be achievable in the allocated time. As a shortcut, you could ask students to use the first 10 points to obtain an estimate, and then all 20 points.
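
The shortcut of estimating first from 10 points and then from all 20 can be sketched as follows. This is an illustrative Python sketch only: the `bush` list stands in for one student's Bush column and its values are invented, and the helper name `proportion` is hypothetical.

```python
def proportion(indicators):
    """Proportion of sampled points recorded as bushland (1 = bush, 0 = not)."""
    return sum(indicators) / len(indicators)

# Placeholder Bush column for 20 sampled points, invented for illustration.
bush = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

estimate_10 = proportion(bush[:10])  # estimate from the first 10 points only
estimate_20 = proportion(bush)       # estimate from all 20 points

print(f"first 10 points: {estimate_10:.2f}")
print(f"all 20 points:   {estimate_20:.2f}")
```

Collecting both estimates from every student gives two columns whose spread can be compared to show the effect of doubling the sample size.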

21 More advanced students could be asked to examine other
sampling schemes such as systematic sampling. In this
method, *r* parallel transects at equal intervals are
examined; the first transect is selected randomly in the
range 1 to (180/*r*). For this example, if ten transects are
to be used, the first is chosen by generating a random
number between 1 and 18 (180/10). Subsequent transects are
18 mm below the previous one.
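
The positions of the systematic transects described above can be generated with a short Python sketch. The function name `systematic_transects` and its parameters are hypothetical, and the sketch assumes that the side length (180 mm) is exactly divisible by the number of transects.

```python
import random

def systematic_transects(side_length_mm=180, r=10, rng=random):
    """Positions (mm from the top) of r equally spaced parallel transects.

    The first transect is chosen at random in the range 1 to
    side_length_mm / r; each later transect lies one interval below
    the previous one.
    """
    spacing = side_length_mm // r       # 18 mm when r = 10
    first = rng.randint(1, spacing)     # random start between 1 and 18
    return [first + k * spacing for k in range(r)]

print(systematic_transects())
```

Because only the first position is random, the whole set of transects is determined by a single random number.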

22 Points can also be chosen systematically instead of randomly. They can be selected at regular or irregular intervals along parallel transects. The aim is to give representative coverage of the area, while avoiding following features of the landscape such as streams, fencelines, and ridges. Methods that simulate what an ecologist might do on foot in a park, when no aerial photograph exists, could be discussed. Buckland, Anderson, Burnham, and Laake (1993) discuss some of the practicalities of these methods when introducing principles of distance sampling.

23 Transect sampling is also used in microscopic work. For example, physiologists count certain cell types, and engineers examine grain structure in metals. Commonly, a grid available on the microscope is used to take systematic samples. Photographs taken under the microscope could be used in place of the aerial photograph or as an extension to illustrate other uses of transect sampling.

24 In subsequent lectures, when introducing the sampling distribution of a proportion, we have found it useful to refer to the variety of estimates obtained by students. The number of points, from a simple random sample of 20, that are in undisturbed bushland may be used as an example of a binomial variable. However, because points on the transect are not independent of one another, the number of points out of 400 in undisturbed bushland is not a binomial variable.
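
For reference, the binomial model for the simple random sample can be written out as below. This is a sketch under the usual assumptions that the 20 points are selected independently and that each has the same probability *p* of lying in undisturbed bushland. The symbols *X*, *n*, and the estimator notation are introduced here for illustration.

```latex
X \sim \mathrm{Binomial}(n, p), \qquad n = 20,
\qquad
\hat{p} = \frac{X}{n},
\qquad
\mathrm{E}(\hat{p}) = p,
\qquad
\mathrm{Var}(\hat{p}) = \frac{p(1 - p)}{n}.
```

For example, if *p* were 0.3 (a hypothetical value), the standard deviation of the estimate from 20 points would be about 0.10, which helps explain why the class estimates vary so much.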

25 As an assignment question, we ask the students to calculate
a 95% confidence interval for the proportion, *p*, of the
whole area that is in undisturbed bushland using their own
simple random sample estimate. We also ask them to explain
why the formula for the 95% confidence interval is
inappropriate for the transect method. Most of them can see
that the points are not independent.
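
The paper does not state which confidence interval formula is used. The sketch below assumes the usual normal-approximation interval taught in introductory courses, that is, the estimate plus or minus 1.96 standard errors. The function name `wald_interval` is hypothetical, and with only 20 points the approximation is rough.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion.

    Assumes independent points (simple random sampling), so it is not
    appropriate for points measured along a transect.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    lower = max(0.0, p_hat - z * standard_error)
    upper = min(1.0, p_hat + z * standard_error)
    return lower, upper

# Hypothetical result: 6 of a student's 20 points fall in bushland.
lower, upper = wald_interval(6, 20)
print(f"estimate 0.30, approximate 95% CI ({lower:.2f}, {upper:.2f})")
```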

This work was conducted with the aid of a University of Adelaide Teaching Development Grant. I wish to thank two anonymous referees for their very helpful suggestions.

- Some ways of comparing different sampling methods,
- That estimates of the same thing, based on different
samples, will vary, and
- How to select a random sample using random numbers.

A common problem in rangeland management is the estimation of areas covered by a particular vegetation type in a region. In this workshop, we are going to compare two different rangeland surveying techniques, both of which involve sampling.

This method is suitable for flat, low-lying scrub with clear delineations among vegetation types. Walk in a straight line from a random starting point and either count paces or use a tape measure to find the proportion of the whole traverse that intersects the vegetation type of interest.

Since we have a good view of the region, we may decide to
choose a *representative* or *best* path to take for our
estimate, but this leaves us open to the possibility of
biasing our estimate towards a preconceived idea of the
area. On the other hand, choosing a line at random is only
the best method when the vegetation type of interest occurs
in random clumps, clearly a rare property in practice.

A common compromise is to take two lines, one at right angles to the other. In this way we are likely to pick up any clumping.

Given the dimensions of the region, randomly choose a subset of point coordinates to sample. Walk to each point and decide whether it lies in the vegetation type of interest. Find the number of points in the vegetation as a proportion of all points examined. In each case, this proportion is an estimate of the proportion of the total area of the region covered. We can do something similar in the laboratory by using an aerial photograph.

You have been given a photograph (scale 1:16000) of the Sandy Creek Conservation Park and surrounds near Gawler taken in March 1979. The aim is to determine how much of the whole region is "undisturbed bushland" as this has implications for the park fauna.

- In discussion groups, decide which of the following you
will regard as undisturbed bushland. The original
photograph is with the tutor for clearer inspection.
    - River banks
    - Trees lining roads
    - Paddocks with substantial tree cover
    - Plantations

- Still in groups, discuss reasons why it is important to
define the undisturbed bushland regions before you
start.
- Share your ideas with the whole class.
- Once the class has settled on a definition, use a
highlighter pen to mark boundaries around all areas of
undisturbed bushland. The highlighter should be at the
perimeter of the bushland but not in the bushland.

1. The left-hand side of the map is 180 mm long. To
choose a starting point, select a random number between
0 and 180.
2. Write down the selected random starting point (in
millimetres) for the left-hand side.
3. Draw a horizontal line across the picture starting at
this point. Write down the length of the line lying in
undisturbed bushland -- that is, within the boundaries
you drew previously.
4. Repeat Steps 1 through 3 for a random starting point
along the top of the picture. (N.B. This time you
want a starting point between 0 and 220, and the line
you draw is vertical.)
5. Add the two lengths of undisturbed bushland together.
6. The length of the horizontal line is 220 mm, and the
length of the vertical line is 180 mm. Use your answer
from Step 5 to estimate the proportion of the total
area in the photograph that is undisturbed bushland.
7. Write your answer from Step 6 on the tutor's data
sheet.
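
The arithmetic in the steps above can be sketched in Python as follows. This is an illustration only: the function names `random_start` and `transect_estimate` are hypothetical, the measured bushland lengths are invented, and starting points are assumed to be whole millimetres.

```python
import random

HORIZONTAL_MM = 220  # length of the horizontal transect
VERTICAL_MM = 180    # length of the vertical transect

def random_start(edge_length_mm, rng=random):
    """Random whole-millimetre starting point along one edge."""
    return rng.randint(0, edge_length_mm)

def transect_estimate(bush_on_horizontal_mm, bush_on_vertical_mm):
    """Total bushland length divided by total transect length (400 mm)."""
    total_bush = bush_on_horizontal_mm + bush_on_vertical_mm
    return total_bush / (HORIZONTAL_MM + VERTICAL_MM)

side_start = random_start(VERTICAL_MM)    # where the horizontal line begins
top_start = random_start(HORIZONTAL_MM)   # where the vertical line begins
print(f"horizontal line at {side_start} mm, vertical line at {top_start} mm")

# Hypothetical measurements: 60 mm and 40 mm of the two lines lie in bushland.
print(transect_estimate(60, 40))  # 0.25
```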

Imagine a grid of one-millimetre squares drawn on the photograph. Each of the intersections is a point either lying in undisturbed bushland or not. Since this grid is very fine, the proportion of the points which lie in undisturbed bushland will be close to the corresponding proportion of the area. We will sample 20 points at random by selecting random coordinates from the top and side scales 20 times.

1. Set up a table for your sampled points under the
headings Top, Side, and Bush.
2. Generate 20 top coordinates by selecting 20 random
numbers between 0 and 220. Enter these numbers in the
Top column of your table.
3. Next to these, in the Side column of your table, enter
20 random numbers in the range 0 to 180.
4. Find the point on the photograph given by the Top and
Side coordinates in the first row of your table. If
the point is in undisturbed bushland, as marked by your
highlighted boundaries, enter a 1 under the Bush
heading. Enter a 0 otherwise.

   Suppose your first coordinate pair is Top = 25 mm, Side = 150 mm, and the point lies in farmland. Then your table would look like this:

   | Top | Side | Bush |
   |-----|------|------|
   | 25  | 150  | 0    |

5. Repeat Step 4 for all 20 points.
6. Count the number of 1's in the Bush column and divide
the count by 20 to estimate the proportion of
undisturbed bushland in the region.
7. Write your answer on the tutor's data sheet.
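
For classes without Excel, the coordinate table in the steps above could be generated with a short Python sketch like the one below. The function name `sample_points` is hypothetical, and coordinates are assumed to be whole millimetres. The Bush column still has to be filled in by eye from the highlighted photograph.

```python
import random

def sample_points(n=20, top_max=220, side_max=180, rng=random):
    """Return n random (Top, Side) coordinate pairs in millimetres."""
    return [(rng.randint(0, top_max), rng.randint(0, side_max))
            for _ in range(n)]

points = sample_points()

# Print the table with an empty Bush column to complete by hand.
print(f"{'Top':>5} {'Side':>5} {'Bush':>5}")
for top, side in points:
    print(f"{top:>5} {side:>5} {'':>5}")
```

Once each row has a 1 or 0 in the Bush column, the estimate is the number of 1's divided by 20.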

Your tutor will write the class results for the two methods on the board. Get into groups and perform the following tasks.

- Find the five number summary of estimates for Method 1
in your class.
- Find the five number summary of estimates for Method 2
in your class.
- Use the five number summaries to decide which sampling
method is more precise (i.e., less variable). Give
reasons. Consider ways of improving the precision.
- Decide whether either method is precise enough to be
useful.
- Discuss whether it is reasonable to compare a proportion estimated from a random sample of 20 points with one estimated from two lines with a total of 400 points. Give reasons. (Hint: You could consider the amount of effort required to collect each of these samples in the laboratory and in the field.)

Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. (1993), Distance Sampling: Estimating Abundance of Biological Populations, London: Chapman & Hall.

Hanley, J. A., and Shapiro, S. H. (1994), "Sexual Activity and the Lifespan of Male Fruitflies: A Dataset That Gets Attention," Journal of Statistics Education [Online], 2(1). (http://ww2.amstat.org/publications/jse/v2n1/datasets.hanley.html)

Rossman, A. J., and Short, T. H. (1995), "Conditional Probability and Educational Reform: Are They Compatible?," Journal of Statistics Education [Online], 3(2). (http://ww2.amstat.org/publications/jse/v3n2/rossman.html)

Glenys Bishop

Statistics Department

University of Adelaide

Australia 5005
