Teaching Bits: A Resource for Teachers of Statistics

Journal of Statistics Education v.4, n.3 (1996)

Joan B. Garfield
Department of Educational Psychology
University of Minnesota
332 Burton Hall
Minneapolis, MN 55455
612-625-0337

jbg@maroon.tc.umn.edu

William P. Peterson
Department of Mathematics and Computer Science
Middlebury College
Middlebury, VT 05753-6145
802-443-5417

wpeterson@middlebury.edu

J. Laurie Snell
Department of Mathematics and Computing
Dartmouth College
Hanover, NH 03755-1890
603-646-2951

jlsnell@dartmouth.edu

This column features "bits" of information sampled from a variety of sources that may be of interest to teachers of statistics. Joan abstracts information from the literature on teaching and learning statistics, while Bill and Laurie summarize articles from the news and other media that may be used with students to provoke discussions or serve as a basis for classroom activities or student projects. We realize that due to limitations in the literature we have access to and time to review, we may overlook some potential articles for this column, and therefore encourage you to send us your reviews and suggestions for abstracts.

From the Literature on Teaching and Learning Statistics

International Handbook of Mathematics Education

ed. Alan Bishop (1996). Dordrecht, The Netherlands: Kluwer Academic Publishers.

This 1200-page handbook is the fourth in a series of International Handbooks of Education. It is divided into four major sections, one of which is Curriculum, Goals, Contents, and Resources, edited by Jeremy Kilpatrick. Two of the chapters in this section are "Probability" and "Data Handling." Each chapter is very current and comprehensive and offers a long reference list. The chapter on probability, written by M. Borovcnik and R. Peard, examines issues related to probabilistic thinking. Problems associated with the understanding of probability are addressed. Current approaches to positioning probability within the curriculum of data analysis and statistical inference are analyzed. Cultural factors in the development and treatment of the subject are also addressed. The authors include strategies to improve teaching of probability. The chapter on data handling, written by J. M. Shaughnessy, J. Garfield, and Brian Greer, describes the historical roots of the current data handling (or data analysis) emphasis in teaching statistics, points out some of the national reform efforts that have catalyzed an interest in data handling, and discusses various data handling curricula. Special attention is given to the use of technology in teaching data handling, to the importance of professional development for teachers of data handling, and to some issues for research in the teaching and learning of data handling.

"Understanding Repeated Simple Choices"

by Iddo Gal and Jonathan Baron (1996). Thinking and Reasoning, 2(1), 61-98.

This article reports the results of a study that examined college and high school students' reasoning regarding random experiments with dice and balls in an urn. Students were asked to bet on two events that had different probabilities and to generate or evaluate a strategy for betting on repetitions of the experiment. Large numbers of both high school and college students demonstrated misunderstandings of the probabilities involved in the experiments. Although some students seemed to understand the concept of independence, they failed to use it when generating or evaluating betting strategies. The authors conclude that the teaching of probabilistic reasoning should include opportunities for students to engage in concrete or software-assisted activities that will lead them to confront their misconceptions.

"Data Driven Mathematics: A Curriculum Strand for High School Mathematics"

by Gail Burrill (1996). Mathematics Teacher, 89(6), 460-465, 540.

Three activities from the new Data Driven Mathematics Curriculum Project are introduced. One involves estimating ages of a sample of famous people and determining an appropriate statistical method to use in determining the quality of a set of estimates. Other activities connect statistics to topics in geometry and algebra.

"Experiments from Psychology and Neurology"

by William Hadley (1996). Mathematics Teacher, 89(7), 562-569.

Two experiments are introduced as ways to collect and analyze data during a class. One involves the Stroop Test, where students are given lists of words (actually names of colors) written in different colored ink, and are asked to read the lists of words out loud. Lengths of time needed to read different lists are recorded and analyzed. The second activity involves constructing a chain of people holding hands. Records are kept for how long it takes to pass a hand squeeze down chains consisting of different numbers of people. The analysis of data for both activities involves constructing linear equations to make predictions.

The November issue of Mathematics Teacher (Volume 89, Number 8) has four articles that will interest statistics educators.

"Coin Tossing"

by M. Haruta, M. Flaherty, J. McGivney, and R. McGivney, pp. 642-645.

This article describes a class activity based on solving the following problem:

To raise funds for your class, someone suggests the following game: The cafeteria floor consists of 9" x 9" tiles. Players toss a circular disc onto the floor. If the disc comes to rest on the edge of any tile, the player loses $1. Otherwise, the player wins $1. Your job is to determine the size of the disc needed so that the probability of the player's winning is .45.

Questions are provided to stimulate class discussion of the mathematical ideas involved, and a simulated solution is offered.
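
One way to set up the computation, offered as a rough sketch rather than the article's own solution: assume the disc's center lands uniformly at random within a tile, and that the player wins only when the disc lies entirely inside one tile. A disc of diameter d then wins when its center falls in the inner (9 - d) x (9 - d) square, so P(win) = ((9 - d)/9)^2. The function names below are illustrative.

```python
import math
import random

TILE = 9.0  # tile side length in inches, from the problem statement


def win_probability(diameter, tile=TILE):
    """P(disc lies entirely inside one tile), with the center uniform on the tile."""
    if diameter >= tile:
        return 0.0
    return ((tile - diameter) / tile) ** 2


def diameter_for(p_win, tile=TILE):
    """Invert win_probability: the diameter that gives the target P(win)."""
    return tile * (1.0 - math.sqrt(p_win))


def simulate(diameter, trials=200_000, tile=TILE, seed=1):
    """Monte Carlo estimate of P(win) from randomly placed disc centers."""
    rng = random.Random(seed)
    r = diameter / 2.0
    wins = 0
    for _ in range(trials):
        x, y = rng.uniform(0, tile), rng.uniform(0, tile)
        # The disc misses every edge only if its center is at least r from each side.
        if r < x < tile - r and r < y < tile - r:
            wins += 1
    return wins / trials


d = diameter_for(0.45)
print(f"diameter = {d:.3f} in, exact P(win) = {win_probability(d):.3f}, "
      f"simulated P(win) = {simulate(d):.3f}")
```

Under these assumptions the required diameter is a little under 3 inches. A different model of how the disc lands would change the answer.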

"Assessing Students' Ability to Analyze Data: Reaching Beyond Computation"

by F. Curcio and A. Artzt, pp. 668-673.

This article suggests four strategies for teachers to use in designing assessments involving data analysis. An argument is presented for using rich data-related assessment tasks, which should improve students' ability to interpret, analyze, and extrapolate from graphs. Examples are given of items that can be used to help assess higher-order thinking about data. The authors suggest six questions to use in deciding whether a problem has potential as a data-related assessment task.

"A Bird in the Hand"

by C. Richbart and L. Richbart, pp. 674-677.

An activity is described to help students explore the biases that affect their choices when they make decisions involving uncertainty. Building on research by Kahneman and Tversky, the authors encourage teachers to recognize that theoretical situations and real-world applications may result in different choices. They feel our goal as teachers should be to help students make informed decisions by making them aware of psychological factors that affect their choices.

"Visual Representations of Mean and Standard Deviation"

by C. Embse and A. Engebretsen, pp. 688-692.

Using nutritional information for a sample of candy bars as a data set, the authors describe an activity to help students visualize the mean and standard deviation using a graphing calculator.

Teaching Statistics

A regular component of the Teaching Bits Department is a list of articles from Teaching Statistics, an international journal based in England. Brief summaries of the articles are included. In addition to these articles, Teaching Statistics features several regular departments that may be of interest, including Computing Corner, Curriculum Matters, Data Bank, Historical Perspective, Practical Activities, Problem Page, Project Parade, Research Report, Book Reviews, and News and Notes.

The Circulation Manager of Teaching Statistics is Peter Holmes, ph@maths.nott.ac.uk, RSS Centre for Statistical Education, University of Nottingham, Nottingham NG7 2RD, England.

Teaching Statistics, Autumn 1996
Volume 18, Number 3

"Raising Statistical Awareness" by Sharleen Forbes

This article describes the organisation and results of the 1990 New Zealand Children's Census. This census, which preceded a National Census, provided a powerful tool for raising statistical awareness nationwide.

"Bivariate Data: Lessons from Students' Coursework" by Roger Porkess

This article examines some of the difficulties frequently encountered by students when analysing bivariate data and suggests how they might be overcome.

"An Autumnal Investigation" by Mary Rouncefield

This article describes two investigations which arose out of children observing the natural phenomena around them, asking questions about those phenomena and devising their own hypotheses to test.

In addition to the articles listed above, this issue of Teaching Statistics also contains the regular columns Classroom Notes, Computing Corner, Practical Activities, Net Benefits: Data for Statistics Teaching, Historical Perspective, Standard Errors, Apparatus Reviews, and a review of the video series "Statistics: Decisions Through Data."

Topics for Discussion from Current Newspapers and Journals

"Incarceration is a Bargain"

by Steve H. Hanke. The Wall Street Journal, 23 September 1996, A20.

The United States incarceration rate nearly tripled between 1973 and 1994, yet the number of reported violent crimes per capita approximately doubled, and the rate of reported property crime rose 30%. Some observers have interpreted this as evidence that incarceration is not working. But economist Steven D. Levitt, writing in the Quarterly Journal of Economics (May 1996), argues that without the increase in incarceration, violent crime would have been approximately 70% higher, and property crime almost 50% higher. The real problem, according to Levitt, is that not enough criminals are locked up. A graph provided in the present article indicates that increasing the prison population reduces all major categories of violent and non-violent crime. The author calculates from these data that on average about 15 crimes per year are eliminated for each additional prisoner.

Having thus noted that incarceration works to prevent crimes, the author then turns to the question of whether it is cost-effective. He quotes results from Levitt estimating that the average annual cost of incarcerating a criminal is $30,000, while the annual amount of damage the average criminal would do on the outside is $53,000. This represents a net gain to society of $23,000 a year from locking a criminal up.

The reporting here raises a host of questions. What is the difference between increasing the "prison population" and increasing the "incarceration rate"? How did Mr. Levitt go about estimating the number of potential crimes that will be avoided, or their monetary values? What hidden variables might weaken the case for causal links?

"Class Warfare: Dueling Professors Have Milwaukee Dazed Over School Vouchers"

by Bob Davis. The Wall Street Journal, 11 October 1996, A1.

Since 1990, several thousand low-income Milwaukee families have received state-funded vouchers to allow them to take their children out of public schools and enroll them in private schools. The program has been watched closely as a model program designed to give poor children some of the advantages of children of wealthier families.

John Witte of the University of Wisconsin was selected by the state to track the progress of the program. In a series of annual reports, he compared the progress of the voucher students to a control group chosen from the general Milwaukee school population. He found that voucher students did not advance faster than the control group, despite the fact that the parents of the children felt that the private school atmosphere was much better for their children.

Harvard political scientist Paul Peterson was critical of comparing the progress of the voucher students to randomly chosen Milwaukee students. He carried out his own study by taking advantage of the fact that the four private schools, faced with more applicants than they had seats, had used a lottery to decide whom to accept. Peterson compared the performance of those accepted and those not accepted and found that, while the accepted students performed no better in the first year, they scored significantly better on standardized tests after three years.

Of course, the issue has become highly political. In fact, it came up in this fall's presidential debates. Bob Dole supported the voucher plan, promising a $3 billion-a-year federal program to pay for scholarships to send low- and middle-income children to private schools. Bill Clinton, while not opposing local voucher experiments, said that the "highly ambiguous" results in Milwaukee did not justify a federal voucher program.

Peterson and Witte have engaged in an extended and acrimonious debate over the statistical issues involved. Peterson points out that the lottery had the effect of creating randomized "treatment" and "control" groups, and insists that this gives the best basis for comparison. Witte contends that the methodology of controlled medical experiments is inappropriate for modeling educational achievement. You can find the data, the studies, and Peterson and Witte's critiques of each other's work on the web page of the American Federation of Teachers. (http://www.aft.org/pr/gp_page.htm)

TWA Flight 800

Officials investigating last summer's crash of TWA 800 still have not ruled out a mechanical failure, a bomb explosion on-board, or a missile attack as potential causes. The continuing mystery has given rise to popular speculation that the plane was hit by a meteor. The following sequence of letters to the editor demonstrates the subtleties in reasoning about independent trials, coincidences, and comparisons of rates involving rare events.

"TWA Flight 800 Crash, Don't Discount Meteor"

Letter to the editor by Charles Hailey and David Helfand. The New York Times, 19 September 1996, A26.

The writers refer to an earlier article about the TWA Flight 800 crash in which it is reported that "more than once, senior crash investigators have tried to end the speculation by ranking the possibility of friendly fire at about the same level as that a meteorite destroyed the jet." They feel that this must be based on a misconception of the probability that a meteorite would destroy a jet and write:

The odds of a meteor striking TWA Flight 800 or any other single airline flight are indeed small. However, the relevant calculation is not the likelihood of any particular aircraft being hit, but the probability that one commercial airliner over the last 30 years of high-volume air travel would be struck by an incoming meteor with sufficient energy to cripple the plane or cause an explosion.
Approximately 3,000 meteors a day with the requisite mass strike Earth. There are 50,000 commercial airline takeoffs a day worldwide. Adopting an average flight time of two hours, this translates to more than 3,500 planes in the air; these cover approximately two-billionths of Earth's surface.
Multiplying this by the number of meteors per day and the length of the era of modern air travel leads to a 1-in-10 chance that a commercial flight would have been knocked from the sky by meteoric impact.
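
The letter's arithmetic can be retraced roughly as follows. The inputs are the figures quoted above; the Poisson step and the variable names are our assumptions, since the letter gives only the inputs and the final 1-in-10 figure.

```python
import math

METEORS_PER_DAY = 3000   # meteors of sufficient mass striking Earth daily (from the letter)
TAKEOFFS_PER_DAY = 50_000  # commercial takeoffs per day worldwide (from the letter)
FLIGHT_HOURS = 2         # average flight time adopted in the letter
AREA_FRACTION = 2e-9     # share of Earth's surface covered by planes aloft (from the letter)
YEARS = 30               # era of high-volume air travel (from the letter)

# Average number of planes in the air at any moment.
planes_aloft = TAKEOFFS_PER_DAY * FLIGHT_HOURS / 24

# Expected number of meteor-airliner strikes over the whole period.
expected_hits = METEORS_PER_DAY * AREA_FRACTION * 365 * YEARS

# Assumption (ours): strikes are rare, independent events, so the count is
# approximately Poisson and P(at least one) = 1 - exp(-expected).
p_at_least_one = 1 - math.exp(-expected_hits)

print(f"planes aloft: about {planes_aloft:.0f}")
print(f"expected strikes in {YEARS} years: {expected_hits:.4f}")
print(f"P(at least one strike): {p_at_least_one:.3f}")
```

With these inputs the expected number of strikes is about 0.066, so the chance of at least one strike comes out near 6%, the same order of magnitude as the letter's 1-in-10 figure. The result is a probability about some flight over 30 years, not about one particular flight on one particular day.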

"Meteor and Plane Crash"

Letter to the editor by Guy Maxtone-Graham. The New York Times, 24 September 1996, A24.

Maxtone-Graham writes:

As any statistician can tell you, the outcome of past, random events has no bearing on future, unrelated random events. Toss a coin 10 times and the odds of getting heads or tails on the 11th toss are still 50-50.
Likewise, calculations based on the number of flights worldwide, the number of takeoffs per day and the number of years that commercial flights have thrived have no bearing on the question of whether a rock from outer space happened to enter the atmosphere to hit one particular airliner on July 17. The odds of such a freak accident downing a specific flight remain small, and the professors' conclusion that "the meteor impact theory deserves more considered attention" is difficult to support.

"Meteors and Numbers that Count"

Letter to the editor by Bill Grassman. The New York Times, 28 September 1996, Sec. 1, p. 22.

Attempts to prove or disprove the probability that TWA Flight 800 was the victim of a meteor recall the tale of the business executive who, concerned that he might be on a plane with a bomb, commissioned a study to determine the odds of that happening.
When the calculations of flights per day, when and where the bombings had occurred and the normal flying patterns of the executive disclosed that the odds of his being on a plane with a bomb were 1 in 13 million, he asked for the probability of his being on a plane with two bombs. On learning that this increased the odds to 1 in 42 billion, he always carried a bomb with him. Statistics!

"Misreading the Gender Gap"

by Carol Tavris. The New York Times, 17 September 1996, A23.

Ms. Tavris is the author of the recent book "The Mismeasure of Woman." In this article, she lists several examples of supposed gender gaps which disappear when a relevant variable is controlled for. Two quick examples:

Women are more likely to believe in horoscopes and psychics. (Not if you control for the number of math and science classes a person has had. What appears to be a gender gap is a science gap.)
Men are more likely than women to express anger directly and abusively. (Not if you control for the status of the individuals involved. What appears to be a gender gap is a power gap.)

During this fall's presidential campaign, there was much media commentary on the gender gap. A New York Times/CBS News poll just before the time of this article showed women preferring Clinton to Dole by 61% to 33%. Conservatives explain the gap by saying that women tend to be more sentimental, more risk-averse and less competitive than men; liberals claim that women are more compassionate and less aggressive than men, and thus attracted to the party that will help the weakest members of society.

Tavris rejects both of these explanations, pointing out that neither explains why women who voted for Nixon and Reagan have abandoned Dole. As for sentimentality or compassion, she bluntly states that affluent women have not historically shown much sympathy for women in poverty. She suggests instead that the gender gap in the political situation is largely an experience gap. A woman whose husband left her may have been saved by welfare. More women than men are taking care of aging, infirm parents. More single mothers than single fathers are taking care of children on their own. Tavris concludes: "For women to perceive the Democrats more responsive than the Republicans to these concerns is neither sentimental nor irrational. It stems from self-interest."

"The EPA's Houdini Act"

by Steven J. Milloy. The Wall Street Journal, 8 August 1996, A10.

In this op-ed piece, Milloy claims that the Environmental Protection Agency (EPA) is "about to escape from the shackles of good science" by abandoning the requirement of statistical significance in epidemiological studies used to designate environmental factors (electromagnetic fields, dioxin, second-hand smoke) as cancer risks.

Milloy's case is not entirely clear here (there is further discussion on his home page, http://www.junkscience.com, under "What's Hot"). Indeed, because he complained about the EPA switching from the more traditional 95% confidence to 90% in the case of second-hand smoke, a reader in a letter to the editor of The Wall Street Journal took Milloy to task for not realizing that the level of significance could reasonably vary from situation to situation. It seems that Milloy is convinced that the EPA's 1986 regulations explicitly REQUIRED statistical significance, while their new proposals implicitly do not. The EPA disagrees with this interpretation. Sorting out the differences makes an interesting topic for discussion.

According to the 1986 guidelines, three criteria must be met before a causal association can be inferred between exposure and cancer in humans:

There is no identified bias that could explain the association.
The possibility of confounding has been considered and ruled out as explaining the association.
The association is unlikely to be due to chance.

The 1996 guidelines propose that:

A causal interpretation is enhanced for studies to the extent that they meet the criteria described below. None of the criteria is conclusive by itself, and the only criterion that is essential is the temporal relationship.

Temporal Relationship. The development of cancers require certain latency periods, and while latency periods vary, existence of such periods is generally acknowledged. Thus, the disease has to occur within a biologically reasonable time after initial exposure. This feature must be present if causality is to be considered.
Consistency. Associations occur in several independent studies of a similar exposure in different populations, or associations occur consistently for different subgroups in the same study. This feature usually constitutes strong evidence for a causal interpretation when the same bias or confounding is not also duplicated across studies.
Magnitude of the Association. A causal relationship is more credible when the risk estimate is large and precise (narrow confidence intervals).
Biological Gradient. The risk ratio (i.e., the ratio of the risk of disease or death among the exposed to the risk of the unexposed) increases with increasing exposure or dose. A strong dose response relationship across several categories of exposure, latency, and duration is supportive for causality given that confounding is unlikely to be correlated with exposure. The absence of a dose response relationship, however, is not by itself evidence against a causal relationship.
Specificity of the Association. The likelihood of a causal interpretation is increased if an exposure produces a specific effect (one or more tumor types also found in other studies) or if a given effect has a unique exposure.
Biological Plausibility. The association makes sense in terms of biological knowledge. Information is considered from animal toxicology, toxicokinetics, structure-activity relationship analysis, and short-term studies of the agent's influence on events in the carcinogenic process considered.
Coherence. The cause-and-effect interpretation is in logical agreement with what is known about the natural history and biology of the disease, i.e., the entire body of knowledge about the agent.

"It Never Rains But it Pours for the Meteorology Office Men"

by Roger Highfield. Daily Telegraph, 29 August 1996, p. 3.

Mr. Highfield asserts that even if the Meteorology Office forecasts a downpour, you should not bother to take your umbrella. This despite the fact that the Meteorology Office claims that short-range forecasts are now more than 80% accurate. Matthews says that: "The accuracy figures are misleading because they're heavily biased by success in predicting the absence of rain. The real probability of rain during a one-hour walk, following a Meteorology Office forecast (of rain), is only 30%."

Highfield's discussion is based on an article by Robert A. J. Matthews in the journal Nature (29 August 1996, p. 766). Using current data from the weather service, Matthews carries out a decision analysis by assigning costs to the possible scenarios: taking or not taking an umbrella, rain forecast or rain not forecast, it rains or does not rain. He shows that, unless you attach an exceptionally heavy loss to getting wet, the optimal strategy is to simply ignore the weather prediction and not take an umbrella on your walk.

The Nature article presents the following 2 x 2 table for forecast and actual weather over 1000 one-hour walks in London.

                    Rain  No Rain  Total
Forecast:  rain      66     156     222
Forecast:  no rain   14     764     778
Total                80     920     1000
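
The counts in the table are enough to check both the office's accuracy claim and the 30% figure quoted by Matthews. A short sketch (the variable names are ours):

```python
# Counts from the 2 x 2 table: forecast versus actual weather on 1000 one-hour walks.
rain_rain, rain_dry = 66, 156   # forecast: rain    (it rained, it stayed dry)
dry_rain, dry_dry = 14, 764     # forecast: no rain (it rained, it stayed dry)
total = rain_rain + rain_dry + dry_rain + dry_dry

# Share of walks on which the forecast matched the weather.
accuracy = (rain_rain + dry_dry) / total

# Baseline for comparison: a forecaster who always says "no rain".
always_dry_accuracy = (rain_dry + dry_dry) / total

# Chance of rain on a walk, given what was forecast.
p_rain_given_rain_forecast = rain_rain / (rain_rain + rain_dry)
p_rain_given_dry_forecast = dry_rain / (dry_rain + dry_dry)

# Overall chance of rain on a walk, ignoring the forecast.
base_rate = (rain_rain + dry_rain) / total

print(f"forecast accuracy:           {accuracy:.1%}")
print(f"'always no rain' accuracy:   {always_dry_accuracy:.1%}")
print(f"P(rain | rain forecast):     {p_rain_given_rain_forecast:.1%}")
print(f"P(rain | no-rain forecast):  {p_rain_given_dry_forecast:.1%}")
print(f"P(rain) overall:             {base_rate:.1%}")
```

The forecasts are right on 83% of the walks, consistent with the "more than 80%" claim, yet always predicting no rain would be right on 92% of them because rain falls on only 8% of walks. Rain follows a rain forecast on 66 of 222 walks, or about 30%.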

In responding to Matthews' conclusion, the Meteorology Office chose its words carefully: "He (Matthews) may well be right if you are looking at a showery situation. We would maintain that, if we forecast a day of sunshine and showers, and showers occur, the forecast is correct. But we can't forecast whether rain falls on the high street or not."

"Why We Centered"

Letter to the editor by Donald M. Stewart. The Washington Post, 14 September 1996, A24.

Mr. Stewart is the president of The College Board. Given here verbatim is his explanation for the decision to "recenter" the SATs. His statement provides a nice context for discussing measurement validity and reliability.

The Scholastic Assessment Test (SAT) is good at detecting changes in students' academic preparation for college, but that is not why students take it or why colleges use the scores ["Are Test Scores Improving?", editorial, August 31]. The test's major value is its ability to predict the success of individual students in the first year or two of college. Its primary assets are its predictive validity and reliability, which help colleges be objective and fair as they sort through various, more subjective admissions criteria.
We decided to recenter the SAT score scale because our first obligation is to score and scale the SAT so that it will most fairly and accurately predict students' prospects in college. Recentering does this by distributing scores to reflect the composition of the million-plus college-bound seniors who take the SAT today, not the 10,654 who took it in 1941 -- mostly men (62 percent) and many from independent schools (41 percent). Yet some would index today's students' scores to that small and unrepresentative group of students who took the SAT prior to World War II. In 1996, 1,084,725 students took the test; 53 percent were women, 30 percent minorities and 83 percent from public high schools.
Anyone concerned about score trends should know that all trends remain clear after recentering because concordance tables distributed to schools and colleges make it easy to translate old scores into recentered scores for individuals and groups and to track average scores over time.

On the College Board web page (http://www.collegeboard.org/sat/html/admissions/stat000.html) one can find the following table, reproduced from a study done by the Educational Testing Service (ETS) to evaluate the effect of recentering on the validity of the SAT in predicting the freshman grade point average. ("Effects of Scale Choice on Predictive Validity" by R. Morgan, ETS, 1994.)

This table gives the correlation of the SAT exams and High School (HS) grade-point averages with college freshman grade-point averages. The correlations are the average correlations for 75 colleges and universities using the original scale (O) and then the recentered scale (R).

                   Total     Male     Female
                 (O)  (R)  (O)  (R)  (O)  (R)
SAT Verbal       .42  .43  .40  .40  .45  .46
SAT Math         .46  .46  .44  .44  .48  .49
SAT Total        .50  .51  .49  .49  .53  .54
HS GPA           .48  .48  .47  .47  .49  .49
SAT Plus HS GPA  .59  .59  .57  .58  .61  .62
SAT Increment    .10  .11  .11  .11  .12  .13

Here is the concordance table that Stewart described for translating old scores into recentered scores.

Old       New
     Verbal  Math
800    800    800
790    800    800
780    800    800
770    800    790
760    800    770
750    800    760
740    800    740
730    800    730
720    790    720
710    780    700
700    760    690
690    750    680
680    740    670
670    730    660
660    720    650
650    710    650
640    700    640
630    690    630
620    680    620
610    670    610
600    670    600
590    660    600
580    650    590
570    640    580
560    630    570
550    620    560
540    610    560
530    600    550
520    600    540
510    590    530
500    580    520
490    570    520
480    560    510
470    550    500
460    540    490
450    530    480
440    520    480
430    510    470
420    500    460
410    490    450
400    480    440
390    470    430
380    460    430
370    450    420
360    440    410
350    430    400
340    420    390
330    410    380
320    400    370
310    390    350
300    380    340
290    370    330
280    360    310
270    350    300
260    340    280
250    330    260
240    310    240
230    300    220
220    290    200
210    270    200
200    230    200
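
For classroom use, the concordance table can be treated as a simple lookup. The sketch below copies only five rows from the table above; the dictionary layout and the function name recenter are our own illustrative choices, not anything published by the College Board.

```python
# Excerpt of the concordance table: old score -> (recentered verbal, recentered math).
# Only a few rows are copied here; a full version would include every row of the table.
CONCORDANCE = {
    700: (760, 690),
    600: (670, 600),
    500: (580, 520),
    400: (480, 440),
    300: (380, 340),
}


def recenter(old_verbal, old_math, table=CONCORDANCE):
    """Translate a pair of old-scale scores to the recentered scale."""
    new_verbal = table[old_verbal][0]
    new_math = table[old_math][1]
    return new_verbal, new_math


for old in sorted(CONCORDANCE, reverse=True):
    new_verbal, new_math = recenter(old, old)
    print(f"old {old}: verbal +{new_verbal - old}, math +{new_math - old}")
```

Printing the differences shows that recentering shifts verbal scores considerably more than math scores in the middle of the scale.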
