Journal of Statistics Education v.4, n.3 (1996)
Joan B. Garfield
Department of Educational Psychology
University of Minnesota
332 Burton Hall
Minneapolis, MN 55455
William P. Peterson
Department of Mathematics and Computer Science
Middlebury College
Middlebury, VT 05753-6145
J. Laurie Snell
Department of Mathematics and Computing
Dartmouth College
Hanover, NH 03755-1890
This column features "bits" of information sampled from a variety of sources that may be of interest to teachers of statistics. Joan abstracts information from the literature on teaching and learning statistics, while Bill and Laurie summarize articles from the news and other media that may be used with students to provoke discussions or serve as a basis for classroom activities or student projects. We realize that due to limitations in the literature we have access to and time to review, we may overlook some potential articles for this column, and therefore encourage you to send us your reviews and suggestions for abstracts.
International Handbook of Mathematics Education, ed. Alan Bishop (1996). Dordrecht, The Netherlands: Kluwer Academic Publishers.
This 1200-page handbook is the fourth in a series of International Handbooks of Education. It is divided into four major sections, one of which is Curriculum, Goals, Contents, and Resources, edited by Jeremy Kilpatrick. Two of the chapters in this section are "Probability" and "Data Handling." Each chapter is very current and comprehensive and offers a long reference list. The chapter on probability, written by M. Borovcnik and R. Peard, examines issues related to probabilistic thinking. Problems associated with the understanding of probability are addressed. Current approaches to positioning probability within the curriculum of data analysis and statistical inference are analyzed. Cultural factors in the development and treatment of the subject are also addressed. The authors include strategies to improve teaching of probability. The chapter on data handling, written by J. M. Shaughnessy, J. Garfield, and Brian Greer, describes the historical roots of the current data handling (or data analysis) emphasis in teaching statistics, points out some of the national reform efforts that have catalyzed an interest in data handling, and discusses various data handling curricula. Special attention is given to the use of technology in teaching data handling, to the importance of professional development for teachers of data handling, and to some issues for research in the teaching and learning of data handling.
by Iddo Gal and Jonathan Baron (1996). Thinking and Reasoning, 2(1), 61-98.
This article reports the results of a study that examined college and high school students' reasoning regarding random experiments with dice and balls in an urn. Students were asked to bet on two events that had different probabilities and to generate or evaluate a strategy for betting on repetitions of the experiment. Large numbers of both high school and college students demonstrated misunderstandings of the probabilities involved in the experiments. Although some students seemed to understand the concept of independence, they failed to use it when generating or evaluating betting strategies. The authors conclude that the teaching of probabilistic reasoning should include opportunities for students to engage in concrete or software-assisted activities that will lead them to confront their misconceptions.
by Gail Burrill (1996). Mathematics Teacher, 89(6), 460-465, 540.
Three activities from the new Data-Driven Mathematics curriculum project are introduced. One involves estimating the ages of a sample of famous people and choosing an appropriate statistical method for judging the quality of a set of estimates. Other activities connect statistics to topics in geometry and algebra.
by William Hadley (1996). Mathematics Teacher, 89(7), 562-569.
Two experiments are introduced as ways to collect and analyze data during a class. One involves the Stroop Test, where students are given lists of words (actually names of colors) written in different colored ink, and are asked to read the lists of words out loud. Lengths of time needed to read different lists are recorded and analyzed. The second activity involves constructing a chain of people holding hands. Records are kept for how long it takes to pass a hand squeeze down chains consisting of different numbers of people. The analysis of data for both activities involves constructing linear equations to make predictions.
The November issue of Mathematics Teacher (Volume 89, Number 8) has four articles that will interest statistics educators.
by M. Haruta, M. Flaherty, J. McGivney, and R. McGivney, pp. 642-645.
This article describes a class activity based on solving the following problem:
To raise funds for your class, someone suggests the following game: The cafeteria floor consists of 9" x 9" tiles. Players toss a circular disc onto the floor. If the disc comes to rest on the edge of any tile, the player loses $1. Otherwise, the player wins $1. Your job is to determine the size of the disc needed so that the probability of the player's winning is .45.

Questions are provided to stimulate class discussion of the mathematical ideas involved, and a simulated solution is offered.
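A rough sketch of the kind of simulation the article describes follows (the sketch is ours, not the authors'; it assumes the disc's center lands uniformly at random over a tile, which reduces the problem to a single 9" x 9" square):

import random
import math

TILE = 9.0  # tile side length, in inches

def win_probability(diameter, trials=100_000):
    # The player wins when the disc misses every tile edge, i.e., when its
    # center lands at least one radius away from each side of its tile.
    r = diameter / 2.0
    wins = 0
    for _ in range(trials):
        x = random.uniform(0, TILE)
        y = random.uniform(0, TILE)
        if r <= x <= TILE - r and r <= y <= TILE - r:
            wins += 1
    return wins / trials

# Exact reasoning: P(win) = ((9 - d) / 9)^2, so P(win) = .45 gives
# d = 9 * (1 - sqrt(.45)), roughly 2.96 inches.
d = TILE * (1 - math.sqrt(0.45))
print(f"diameter = {d:.2f} in., simulated P(win) = {win_probability(d):.3f}")

Comparing the simulated probability with the closed-form value makes a natural follow-up question for the class.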
by F. Curcio and A. Artzt, pp. 668-673.
This article suggests four strategies for teachers to use in designing assessments involving data analysis. An argument is presented for using rich data-related assessment tasks, which should improve students' ability to interpret, analyze, and extrapolate from graphs. Examples are given of items that can be used to help assess higher-order thinking about data. The authors suggest six questions to use in deciding whether a problem has potential as a data-related assessment task.
by C. Richbart and L. Richbart, pp. 674-677.
An activity is described to help students explore the biases that affect their choices when they make decisions involving uncertainty. Building on research by Kahneman and Tversky, the authors encourage teachers to recognize that theoretical situations and real-world applications may result in different choices. They feel our goal as teachers should be to help students make informed decisions by making them aware of psychological factors that affect their choices.
by C. Embse and A. Engebretsen, pp. 688-692.
Using nutritional information for a sample of candy bars as a data set, the authors describe an activity to help students visualize the mean and standard deviation using a graphing calculator.
A regular component of the Teaching Bits Department is a list of articles from Teaching Statistics, an international journal based in England. Brief summaries of the articles are included. In addition to these articles, Teaching Statistics features several regular departments that may be of interest, including Computing Corner, Curriculum Matters, Data Bank, Historical Perspective, Practical Activities, Problem Page, Project Parade, Research Report, Book Reviews, and News and Notes.
The Circulation Manager of Teaching Statistics is Peter Holmes, RSS Centre for Statistical Education, University of Nottingham, Nottingham NG7 2RD, England.
"Raising Statistical Awareness" by Sharleen Forbes
This article describes the organisation and results of the 1990 New Zealand Children's Census. This census, which preceded a National Census, provided a powerful tool for raising statistical awareness nationwide.
"Bivariate Data: Lessons from Students' Coursework" by Roger Porkess
This article examines some of the difficulties frequently encountered by students when analysing bivariate data and suggests how they might be overcome.
"An Autumnal Investigation" by Mary Rouncefield
This article describes two investigations which arose out of children observing the natural phenomena around them, asking questions about those phenomena and devising their own hypotheses to test.
In addition to the articles listed above, this issue of Teaching Statistics also contains the regular columns Classroom Notes, Computing Corner, Practical Activities, Net Benefits: Data for Statistics Teaching, Historical Perspective, Standard Errors, Apparatus Reviews, and a review of the video series "Statistics: Decisions Through Data."
by Steve H. Hanke. The Wall Street Journal, 23 September 1996, A20.
The United States incarceration rate nearly tripled between 1973 and 1994, yet the number of reported violent crimes per capita approximately doubled, and the rate of reported property crime rose 30%. Some observers have interpreted this as evidence that incarceration is not working. But economist Steven D. Levitt, writing in the Quarterly Journal of Economics (May 1996), argues that without the increase in incarceration, violent crime would have been approximately 70% higher, and property crime almost 50% higher. The real problem, according to Levitt, is that not enough criminals are locked up. A graph provided in the present article indicates that increasing the prison population reduces all major categories of violent and non-violent crime. The author calculates from these data that on average about 15 crimes per year are eliminated for each additional prisoner.
Having thus noted that incarceration works to prevent crimes, the author turns to the question of whether it is cost-effective. He quotes Levitt's estimates that keeping a prisoner incarcerated costs about $30,000 a year, while the damage the average criminal would do on the outside amounts to about $53,000 a year, for a net gain to society of roughly $23,000 per year from locking a criminal up.
The reporting here raises a host of questions. What is the difference between increasing the "prison population" and increasing the "incarceration rate"? How did Mr. Levitt go about estimating the number of potential crimes that will be avoided, or their monetary values? What hidden variables might weaken the case for causal links?
by Bob Davis. The Wall Street Journal, 11 October 1996, A1.
Since 1990, several thousand low-income Milwaukee families have received state-funded vouchers that allow them to take their children out of public schools and enroll them in private schools. The program has been watched closely as a model designed to give poor children some of the advantages enjoyed by children of wealthier families.
John Witte of the University of Wisconsin was selected by the state to track the progress of the program. In a series of annual reports, he compared the progress of the voucher students to a control group chosen from the general Milwaukee school population. He found that voucher students did not advance faster than the control group, despite the fact that the parents of the children felt that the private school atmosphere was much better for their children.
Harvard political scientist Paul Peterson was critical of comparing the progress of the voucher students to randomly chosen Milwaukee students. He carried out his own study by taking advantage of the fact that the four private schools, faced with more applicants than seats, had used a lottery to decide whom to accept. Peterson compared the performance of those accepted with that of those not accepted and found that, although the accepted students did no better in the first year, they scored significantly better on standardized tests after three years.
Of course, the issue has become highly political. In fact, it came up in this fall's presidential debates. Bob Dole supported the voucher plan, promising a $3 billion-a-year federal program to pay for scholarships to send low- and middle-income children to private schools. Bill Clinton, while not opposing local voucher experiments, said that the "highly ambiguous" results in Milwaukee did not justify a federal voucher program.
Peterson and Witte have engaged in an extended and acrimonious debate over the statistical issues involved. Peterson points out that the lottery had the effect of creating randomized "treatment" and "control" groups, and insists that this gives the best basis for comparison. Witte contends that the methodology of controlled medical experiments is inappropriate for modeling educational achievement. You can find the data, the studies, and Peterson and Witte's critiques of each other's work on the web page of the American Federation of Teachers. (http://www.aft.org/pr/gp_page.htm)
Officials investigating last summer's crash of TWA Flight 800 still have not ruled out a mechanical failure, an on-board bomb, or a missile attack as potential causes. The continuing mystery has given rise to popular speculation that the plane was hit by a meteor. The following sequence of letters to the editor illustrates the subtleties of reasoning about independent trials, coincidences, and comparisons of rates involving rare events.
Letter to the editor by Charles Hailey and David Helfand. The New York Times, 19 September 1996, A26.
The writers refer to an earlier article about the TWA Flight 800 crash in which it is reported that "more than once, senior crash investigators have tried to end the speculation by ranking the possibility of friendly fire at about the same level as that a meteorite destroyed the jet." They feel that this must be based on a misconception of the probability that a meteorite would destroy a jet and write:
The odds of a meteor striking TWA Flight 800 or any other single airline flight are indeed small. However, the relevant calculation is not the likelihood of any particular aircraft being hit, but the probability that one commercial airliner over the last 30 years of high-volume air travel would be struck by an incoming meteor with sufficient energy to cripple the plane or cause an explosion.
Approximately 3,000 meteors a day with the requisite mass strike Earth. There are 50,000 commercial airline takeoffs a day worldwide. Adopting an average flight time of two hours, this translates to more than 3,500 planes in the air; these cover approximately two-billionths of Earth's surface.
Multiplying this by the number of meteors per day and the length of the era of modern air travel leads to a 1-in-10 chance that a commercial flight would have been knocked from the sky by meteoric impact.
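Readers may enjoy checking how the quoted figures combine; the following back-of-envelope sketch is ours, not the letter writers' (in particular, the Poisson step at the end is just one way to turn an expected count into "a chance that some flight would be hit"):

import math

# Figures quoted in the Hailey-Helfand letter
meteors_per_day  = 3_000      # meteors of the requisite mass striking Earth daily
takeoffs_per_day = 50_000     # commercial takeoffs worldwide per day
flight_hours     = 2          # average flight time, in hours
fraction_covered = 2e-9       # planes cover about two-billionths of Earth's surface
years            = 30         # era of high-volume air travel

planes_aloft = takeoffs_per_day * flight_hours / 24   # roughly 4,000 planes in the air
expected_hits = meteors_per_day * fraction_covered * 365 * years
prob_at_least_one = 1 - math.exp(-expected_hits)

print(f"planes aloft at any moment: about {planes_aloft:.0f}")
print(f"expected airliner-meteor hits over {years} years: {expected_hits:.3f}")
print(f"chance of at least one hit: about {prob_at_least_one:.2f}")

Whether these inputs really produce the letter's 1-in-10 figure, or something closer to 1-in-15, is itself a nice point for class discussion.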
Letter to the editor by Guy Maxtone-Graham. The New York Times, 24 September 1996, A24.
As any statistician can tell you, the outcome of past, random events has no bearing on future, unrelated random events. Toss a coin 10 times and the odds of getting heads or tails on the 11th toss are still 50-50.
Likewise, calculations based on the number of flights worldwide, the number of takeoffs per day and the number of years that commercial flights have thrived have no bearing on the question of whether a rock from outer space happened to enter the atmosphere to hit one particular airliner on July 17. The odds of such a freak accident downing a specific flight remain small, and the professors' conclusion that "the meteor impact theory deserves more considered attention" is difficult to support.
Letter to the editor by Bill Grassman. The New York Times, 28 September 1996, Sec. 1, p. 22.
Attempts to prove or disprove the probability that TWA Flight 800 was the victim of a meteor recall the tale of the business executive who, concerned that he might be on a plane with a bomb, commissioned a study to determine the odds of that happening.
When the calculations of flights per day, when and where the bombings had occurred and the normal flying patterns of the executive disclosed that the odds of his being on a plane with a bomb were 1 in 13 million, he asked for the probability of his being on a plane with two bombs. On learning that this increased the odds to 1 in 42 billion, he always carried a bomb with him. Statistics!
by Carol Tavris. The New York Times, 17 September 1996, A23.
Ms. Tavris is the author of the recent book "The Mismeasure of Woman." In this article, she lists several examples of supposed gender gaps which disappear when a relevant variable is controlled for. Two quick examples:
During this fall's presidential campaign, there was much media commentary on the gender gap. A New York Times/CBS News poll just before the time of this article showed women preferring Clinton to Dole by 61% to 33%. Conservatives explain the gap by saying that women tend to be more sentimental, more risk-averse and less competitive than men; liberals claim that women are more compassionate and less aggressive than men, and thus attracted to the party that will help the weakest members of society.
Tavris rejects both of these explanations, pointing out that neither explains why women who voted for Nixon and Reagan have abandoned Dole. As for sentimentality or compassion, she bluntly states that affluent women have not historically shown much sympathy for women in poverty. She suggests instead that the gender gap in the political situation is largely an experience gap. A woman whose husband left her may have been saved by welfare. More women than men are taking care of aging, infirm parents. More single mothers than single fathers are taking care of children on their own. Tavris concludes: "For women to perceive the Democrats as more responsive than the Republicans to these concerns is neither sentimental nor irrational. It stems from self-interest."
by Steven J. Milloy. The Wall Street Journal, 8 August 1996, A10.
In this op-ed piece, Milloy claims that the Environmental Protection Agency (EPA) is "about to escape from the shackles of good science" by abandoning the requirement of statistical significance in epidemiological studies used to designate environmental factors (electromagnetic fields, dioxin, second-hand smoke) as cancer risks.
Milloy's case is not entirely clear here (there is further discussion on his home page, http://www.junkscience.com, under "What's Hot"). Indeed, because he complained about the EPA switching from the more traditional 95% confidence level to a 90% level in the case of second-hand smoke, a reader, in a letter to the editor of The Wall Street Journal, took Milloy to task for not realizing that the level of significance could reasonably vary from situation to situation. It seems that Milloy is convinced that the EPA's 1986 regulations explicitly REQUIRED statistical significance, while the new proposals implicitly do not. The EPA disagrees with this interpretation. Sorting out the differences makes an interesting topic for discussion.
According to the 1986 guidelines, three criteria must be met before a causal association can be inferred between exposure and cancer in humans:
The 1996 guidelines propose that:
A causal interpretation is enhanced for studies to the extent that they meet the criteria described below. None of the criteria is conclusive by itself, and the only criterion that is essential is the temporal relationship.
by Roger Highfield. Daily Telegraph, 29 August 1996, p. 3.
Mr. Highfield asserts that even if the Meteorology Office forecasts a downpour, you should not bother to take your umbrella, even though the Meteorology Office claims that its short-range forecasts are now more than 80% accurate. He quotes Robert Matthews: "The accuracy figures are misleading because they're heavily biased by success in predicting the absence of rain. The real probability of rain during a one-hour walk, following a Meteorology Office forecast (of rain), is only 30%."
Highfield's discussion is based on an article by Robert A. J. Matthews in the journal Nature (29 August 1996, p. 766). Using current data from the weather service, Matthews carries out a decision analysis by assigning costs to the possible scenarios: taking or not taking an umbrella, rain forecast or rain not forecast, it rains or does not rain. He shows that, unless you attach an exceptionally heavy loss to getting wet, the optimal strategy is to simply ignore the weather prediction and not take an umbrella on your walk.
The Nature article presents the following 2 x 2 table of forecast versus actual weather over 1000 one-hour walks in London.
                     Rain   No Rain   Total
Forecast: rain         66       156     222
Forecast: no rain      14       764     778
Total                  80       920    1000
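The 30% figure quoted above is simply the conditional probability of rain given a forecast of rain, read off the first row of the table. A minimal sketch of the decision analysis follows; the two cost values are hypothetical stand-ins chosen for illustration, not Matthews' own loss assignments:

# Counts from the 2 x 2 table (per 1000 one-hour walks)
p_rain_given_forecast = 66 / (66 + 156)   # about 0.30, as quoted above

# Illustrative losses (hypothetical values, not taken from the Nature article)
cost_carry = 1.0   # nuisance of carrying an umbrella all day
cost_wet   = 2.5   # loss from being caught in the rain without one

# Expected cost of each action on a day when rain is forecast
expected_cost_carry  = cost_carry
expected_cost_ignore = p_rain_given_forecast * cost_wet

print(f"P(rain | rain forecast) = {p_rain_given_forecast:.2f}")
print(f"expected cost if you carry the umbrella: {expected_cost_carry:.2f}")
print(f"expected cost if you ignore the forecast: {expected_cost_ignore:.2f}")

With these numbers the cheaper action is to leave the umbrella at home; only when getting wet is judged far more costly than carrying the umbrella does the forecast become worth acting on, which is the qualitative point of Matthews' analysis.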
In responding to Matthews' conclusion, the Meteorology Office chose its words carefully: "He (Matthews) may well be right if you are looking at a showery situation. We would maintain that, if we forecast a day of sunshine and showers, and showers occur, the forecast is correct. But we can't forecast whether rain falls on the high street or not."
Letter to the editor by Donald M. Stewart. The Washington Post, 14 September 1996, A24.
Mr. Stewart is the president of The College Board. Given here verbatim is his explanation for the decision to "recenter" the SATs. His statement provides a nice context for discussing measurement validity and reliability.
The Scholastic Assessment Test (SAT) is good at detecting changes in students' academic preparation for college, but that is not why students take it or why colleges use the scores ["Are Test Scores Improving?", editorial, August 31]. The test's major value is its ability to predict the success of individual students in the first year or two of college. Its primary assets are its predictive validity and reliability, which help colleges be objective and fair as they sort through various, more subjective admissions criteria.
We decided to recenter the SAT score scale because our first obligation is to score and scale the SAT so that it will most fairly and accurately predict students' prospects in college. Recentering does this by distributing scores to reflect the composition of the million-plus college-bound seniors who take the SAT today, not the 10,654 who took it in 1941 -- mostly men (62 percent) and many from independent schools (41 percent). Yet some would index today's students' scores to that small and unrepresentative group of students who took the SAT prior to World War II. In 1996, 1,084,725 students took the test; 53 percent were women, 30 percent minorities and 83 percent from public high schools.
Anyone concerned about score trends should know that all trends remain clear after recentering because concordance tables distributed to schools and colleges make it easy to translate old scores into recentered scores for individuals and groups and to track average scores over time.
On the College Board web page (http://www.collegeboard.org/sat/html/admissions/stat000.html) one can find the following table, reproduced from a study done by the Educational Testing Service (ETS) to evaluate the effect of recentering on the validity of the SAT in predicting the freshman grade point average. ("Effects of Scale Choice on Predictive Validity" by R. Morgan, ETS, 1994.)
This table gives the correlation of the SAT exams and High School (HS) grade-point averages with college freshman grade-point averages. The correlations are the average correlations for 75 colleges and universities using the original scale (O) and then the recentered scale (R).
                    Total         Male         Female
                  (O)   (R)    (O)   (R)    (O)   (R)
SAT Verbal        .42   .43    .40   .40    .45   .46
SAT Math          .46   .46    .44   .44    .48   .49
SAT Total         .50   .51    .49   .49    .53   .54
HS GPA            .48   .48    .47   .47    .49   .49
SAT Plus HS GPA   .59   .59    .57   .58    .61   .62
SAT Increment     .10   .11    .11   .11    .12   .13
Here is the concordance table that Stewart described for translating old scores into recentered scores.
Old  New Verbal  New Math      Old  New Verbal  New Math
800     800        800         490     570        520
790     800        800         480     560        510
780     800        800         470     550        500
770     800        790         460     540        490
760     800        770         450     530        480
750     800        760         440     520        480
740     800        740         430     510        470
730     800        730         420     500        460
720     790        720         410     490        450
710     780        700         400     480        440
700     760        690         390     470        430
690     750        680         380     460        430
680     740        670         370     450        420
670     730        660         360     440        410
660     720        650         350     430        400
650     710        650         340     420        390
640     700        640         330     410        380
630     690        630         320     400        370
620     680        620         310     390        350
610     670        610         300     380        340
600     670        600         290     370        330
590     660        600         280     360        310
580     650        590         270     350        300
570     640        580         260     340        280
560     630        570         250     330        260
550     620        560         240     310        240
540     610        560         230     300        220
530     600        550         220     290        200
520     600        540         210     270        200
510     590        530         200     230        200
500     580        520