Teaching Bits: A Resource for Teachers of Statistics

Topics for Discussion from Current Newspapers and Journals

William P. Peterson
Middlebury College

Journal of Statistics Education Volume 10, Number 1 (2002)

Copyright © 2002 by the American Statistical Association, all rights reserved.
This text may be freely shared among individuals, but it may not be republished in any medium without express written consent.


"Errors Mostly Tied to Ballots, not Machines"

by Dennis Cauchon, USA Today, November 7, 2001, p. 6A.

"Ballots Cast by Blacks and Older Voters were Tossed in Far Greater Numbers"

by Ford Fessenden, The New York Times, November 12, 2001, p. A17.

"Study of Disputed Florida Ballots Finds Justices Did Not Cast the Deciding Vote"

by Ford Fessenden and John M. Broder, The New York Times, November 12, 2001, p. A1.

A total of 175,010 ballots were disqualified during the 2000 election debacle in Florida. These include "overvotes," where more than one presidential choice was indicated, and "undervotes," where no vote was registered by the voting machines (in the latter category are the infamous punch card ballots with "dimpled" and "hanging" chads). Beginning last December, a consortium of eight news organizations, which included USA Today and the New York Times, undertook a painstaking examination of all the disqualified ballots. That task is now complete.

According to USA Today, the study found that ballot design played a larger role in disqualifying votes than did the voting machines themselves. Especially troublesome was the two-column format used in Palm Beach County and elsewhere, which gave rise to overvotes when people thought they needed to indicate choices for both president and vice president. While voting rights advocates have been calling for overhaul or replacement of older voting machines, the study suggests that this may not be the answer. The article states that old-fashioned punch card and modern optical scanning machines had "similar rates of error in head-to-head comparisons."

In addition to differences in ballot design and voting equipment, the study considered demographic variables such as educational level, income, age and race in order to identify other factors that may have contributed to votes being disqualified. The votes of individuals are confidential, of course. However, because the new study was conducted at the precinct level -- as compared with earlier analyses by county -- more detailed demographic inferences were possible.

Precincts with larger proportions of elderly voters had comparatively larger undervote rates, presumably because older voters had more trouble pushing hard enough on punch card ballots to remove the chads. Experts had more difficulty explaining the trends concerning race. Even after controlling for age and education, blacks were three to four times as likely as whites to have their ballots disqualified. Political Science Professor Philip Klinkner of Hamilton College is quoted as saying that this "raises the issue about whether there's some way that the voting system is set up that discriminates against blacks." A surprising response was put forward by John Lott of the American Enterprise Institute, who argued that black Republicans actually suffered the highest rate of disqualification. Lott outlined his argument in an op-ed piece for the L.A. Times ("GOP was the real victim in Fla. vote," by John R. Lott and James K. Glassman, Los Angeles Times, November 12, 2001, Part 2, p. 11). A pdf version of Lott's research paper is available from the Manhattan Institute Web site.
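
For classroom purposes, the kind of precinct-level analysis described above can be illustrated with an ecological regression, in which each precinct's disqualification rate is regressed on its demographic composition. The Python sketch below uses entirely made-up precinct data and assumed effect sizes (it is not the consortium's data or method), but it shows the mechanics of the calculation.

# Ecological regression on synthetic precinct data (illustration only).
# The demographic effects below are assumptions, not estimates from the
# consortium's files.
import numpy as np

rng = np.random.default_rng(2000)
n_precincts = 500

frac_black = rng.uniform(0.0, 0.9, n_precincts)     # hypothetical racial composition
frac_elderly = rng.uniform(0.05, 0.5, n_precincts)  # hypothetical share of older voters
ballots = rng.integers(300, 3000, n_precincts)      # ballots cast in each precinct

# Assumed "true" disqualification rates: a baseline plus demographic effects.
true_rate = 0.01 + 0.06 * frac_black + 0.03 * frac_elderly
disqualified = rng.binomial(ballots, true_rate)
disq_rate = disqualified / ballots

# Ordinary least squares of precinct disqualification rate on composition.
X = np.column_stack([np.ones(n_precincts), frac_black, frac_elderly])
coef, *_ = np.linalg.lstsq(X, disq_rate, rcond=None)
print("intercept, frac_black, frac_elderly coefficients:", np.round(coef, 4))

The usual caveat about ecological inference applies: an association between a precinct's composition and its disqualification rate does not, by itself, establish what happened to individual voters.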

The second New York Times article observes that the new study still does not show definitively which candidate got the most votes. One reassuring finding is that the controversial Supreme Court ruling did not determine the outcome of the election: if the Court had allowed the manual recount of 43,000 votes initiated by Gore to continue, it appears that Bush still would have prevailed. The new study considered an even larger set of disqualified votes; in the end, it found that the tally could go either way, depending on the standards used to accept the disqualified ballots. The Times maintains a Web page devoted to the Florida vote. An interactive feature there allows users to select from a number of available criteria for accepting ballots and view the results.

More enterprising readers may wish to download the full data set from USA Today.


"Identifying a Gulf War Illness"

by Sheryl Gay Stolberg, The New York Times, December 11, 2001, p. A1.

After years of denying any link between illness and service in the Gulf War, the Defense and Veterans Affairs Departments said in a joint statement that they would immediately offer disability and survivor benefits to affected patients and families. This announcement came in response to a study conducted at a veterans hospital in Durham, NC. It was found that veterans of the Gulf War were twice as likely as other soldiers to suffer the fatal neurological disorder known as Lou Gehrig's disease.

The study identified 700,000 soldiers sent to the Gulf from August 1990 to July 1991, and compared this group with 1.8 million other soldiers who were not in the Gulf during this time period. It turned out that 40 of the Gulf War veterans and 67 of the other soldiers had developed Lou Gehrig's disease. The Times states that:

The authors did not offer theories on why Gulf War veterans would be at increased risk. Nor did they say what the odds were that the finding occurred by chance... The study has been submitted for publication in an academic journal.
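
The question of chance raised here lends itself to a classroom calculation. The sketch below computes the crude disease rate in each group, their ratio, and a two-sided p-value from a two-proportion z-test, using only the counts quoted above; it makes no adjustment for age or length of follow-up, so it is a rough check rather than a reconstruction of the study's analysis. Note that the crude rate ratio comes out near 1.5, so the "twice as likely" figure evidently rests on the study's own adjusted analysis rather than on these raw counts.

# Crude rate ratio and two-proportion z-test for the counts quoted above:
# 40 cases among 700,000 Gulf veterans versus 67 among 1,800,000 other
# soldiers. No adjustment for age or follow-up time is attempted here.
from math import sqrt, erfc

cases_gulf, n_gulf = 40, 700_000
cases_other, n_other = 67, 1_800_000

rate_gulf = cases_gulf / n_gulf
rate_other = cases_other / n_other
rate_ratio = rate_gulf / rate_other

# Pooled rate under the null hypothesis of no difference between groups.
pooled = (cases_gulf + cases_other) / (n_gulf + n_other)
se = sqrt(pooled * (1 - pooled) * (1 / n_gulf + 1 / n_other))
z = (rate_gulf - rate_other) / se
p_two_sided = erfc(abs(z) / sqrt(2))   # equals 2 * P(Z > |z|) for standard normal Z

print(f"rate ratio = {rate_ratio:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")

The two-sided p-value works out to roughly 0.03, small enough to take seriously, but the point made in the editorial quoted below, about whether cases were found equally well in both groups, matters far more than any formal test.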

An editorial in the New York Times (December 16, 2001, Section 4, p. 12) raised the following issue:

The chief question that will need to be addressed by peer reviewers is whether the study was able to ferret out cases of Lou Gehrig's disease with equal success in both groups, namely those who served in the Gulf and those who did not. With all the uproar in recent years about illnesses found among Gulf War veterans, it seems likely that the researchers identified everybody with Lou Gehrig's disease in that group. The issue will be whether the study, through advertising and contacting patients' groups and doctors, was able to find virtually all the cases in the other group as well. If they missed some, the difference between the two groups might disappear, and service in the Gulf would not be associated with risk of the disease.


"Examined Life: What Stanley H. Kaplan Taught Us About the SAT"

by Malcolm Gladwell, The New Yorker, December 17, 2001, pp. 86-92.

Test prep guru Stanley Kaplan recently published his autobiography Test Pilot: How I Broke Testing Barriers for Millions of Students and Caused a Sonic Boom in the Business of Education (Simon & Schuster, 2001). Gladwell's article presents an entertaining account of Kaplan's youth and the humble beginnings of his now-famous organization. It also describes the methods Kaplan developed for "beating" standardized tests, and summarizes the reasons that led to the University of California's recent proposal to discontinue using SAT scores in admission decisions. According to Gladwell, the book's subtitle "actually understates his [Kaplan's] importance. Stanley Kaplan changed the rules of the game."

Kaplan's success was built on refuting the assertions of the ETS and the College Board that the SAT cannot be coached. These assertions were of course critical to the interpretation of SAT scores as objective measures of "innate" aptitude which could be used by admissions officials to balance the information provided by high school grade point averages. But Kaplan insisted that students could improve their scores if they understood the structure of the test and the philosophy of its designers. To illustrate, the New Yorker presents a number of examples, from both reading comprehension and mathematics, where the correct answer can be determined without reading the passage or doing any real mathematics.

Gladwell summarizes the University of California findings regarding the use of SAT I scores, SAT II scores, and high school grade-point average (GPA) to predict first-year college grades. Among these, SAT II scores are the best predictor, explaining 16 percent of the variance in first-year college grades. GPA was second, at 15.4 percent, and SAT I was last, at 13.3 percent. Taken together, SAT II scores and GPA explain 22.2 percent of the variance; adding SAT I scores explains only an additional 0.1 percent.
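
These figures are statements about R-squared from a multiple regression, and the 0.1 percent attributed to the SAT I is an incremental R-squared: the gain in explained variance when SAT I scores are added to a model that already contains SAT II scores and GPA. The sketch below illustrates the computation on synthetic data; the correlations are invented to mimic the qualitative pattern, not taken from the UC report.

# Incremental R-squared on synthetic data. The correlations below are
# invented to mimic the qualitative pattern (SAT I largely redundant given
# SAT II and GPA); they are not taken from the UC report.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

sat2 = rng.normal(size=n)
gpa = 0.7 * sat2 + 0.7 * rng.normal(size=n)    # GPA correlated with SAT II
sat1 = 0.8 * sat2 + 0.6 * rng.normal(size=n)   # SAT I highly correlated with SAT II
college = 0.4 * sat2 + 0.3 * gpa + 0.05 * sat1 + rng.normal(size=n)

def r_squared(predictors, y):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([sat2, gpa], college)
r2_full = r_squared([sat2, gpa, sat1], college)
print(f"R^2 with SAT II and GPA:      {r2_base:.3f}")
print(f"R^2 adding SAT I:             {r2_full:.3f}")
print(f"incremental R^2 from SAT I:   {r2_full - r2_base:.4f}")

With highly correlated predictors, the variable added last can contribute almost nothing even if it predicts reasonably well on its own; that is one way to read the 0.1 percent figure quoted above.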

You can find the full report "UC and the SAT" at the University of California Web site. In addition to studying the predictive power of the tests, the report also considers differences in test performance between underrepresented groups and other students, and discusses the potential impact on these groups of removing the SAT I from admissions decisions.


"Bamboozled by Statistics"

by Chris Giles, Financial Times, December 19, 2001, p. 17.

Darrell Huff's classic How to Lie with Statistics is still in print after four decades. Its continued relevance, says Giles, is a sad commentary on how little progress we have made in applying statistics to issues facing business and government. The subtitle of his article poses the following question: "Methods of gathering data have improved, so why are we still misled by numbers?"

In business, Giles says that investors were only too eager to believe the astronomical valuations that analysts were giving to technology stocks. This allowed the analysts to get away with what he calls "statistical crimes."

Regarding public policy, he focuses on the following example. US studies have shown that men with low income and little education are twice as likely to suffer from strokes and lung disease and have a higher risk of dying early. But before deciding to redistribute income or increase health care spending, the government needs to know if being poor leads to worse access to health care, or if poor health leads to poverty. Giles feels that this causal puzzle can be resolved by careful analysis. He writes:

Speaking at a recent conference to promote the development, understanding and application of modern statistical methods, Nobel prize winner Daniel McFadden said that by analyzing people over time he could show that, in general, the link between low socio-economic status and death was not causal. Historic factors caused premature death among the poor, not falling down the socio-economic table.

McFadden's remarks are based on joint work with Hurd, Merrill, and Ribeiro, to appear in the Journal of Econometrics. An online version of their paper "Healthy, Wealthy, and Wise" is available in pdf format.

Giles laments the fact that most public policy debate is conducted without the benefit of such high quality research. He notes that the public tends to reject all statistical analyses, instead of learning to think critically about data and to distinguish good analyses from bad. Perhaps this is a result of seeing statistics routinely abused by advocates on both sides of every issue. But Giles also criticizes the statistical profession for failing to explain new developments in practical terms, leaving the public unaware that any progress is being made on important issues.


"Heads I Win, Tails You Lose"

by Roger Boyes, Tom Baldwin, and Nigel Hawkes, The Times (London), January 4, 2002.

Are the new euro coins fair? The diameter and weight of all one-euro coins are the same, but each country has its own symbol on one side. To check the Belgian euro, students in a statistics class spun the coin on a table 250 times and got 140 heads (specifically, King Albert's). The authors of the present article conducted what they describe as "an unscientific but mind-numbingly thorough test" using the German euro. Out of 100 spins, 54 came up heads; among 100 tosses, 60 were heads. Statistics students may want to check if any of this gives convincing evidence that the coins are unfair.
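
Here is one way to take up that invitation: test the null hypothesis that heads comes up half the time. The sketch below uses the normal approximation to the binomial; an exact binomial test would be a natural refinement for students with software at hand.

# Normal-approximation test of the null hypothesis P(heads) = 0.5 for each
# experiment reported in the article.
from math import sqrt, erfc

def one_proportion_test(heads, n, p0=0.5):
    """Return the z statistic and two-sided p-value (normal approximation)."""
    z = (heads - n * p0) / sqrt(n * p0 * (1 - p0))
    return z, erfc(abs(z) / sqrt(2))

experiments = [("Belgian euro, 250 spins", 140, 250),
               ("German euro, 100 spins", 54, 100),
               ("German euro, 100 tosses", 60, 100)]

for label, heads, n in experiments:
    z, p = one_proportion_test(heads, n)
    print(f"{label}: {heads}/{n} heads, z = {z:.2f}, two-sided p = {p:.3f}")

With these figures, the 250-spin result sits near the conventional significance boundary (p of roughly 0.06) while the 100-toss result just crosses it, which makes for a good discussion of what should count as convincing evidence and of why spins and tosses ought to be analyzed separately.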

A popular activity for statistics classes is to spin US pennies and observe that heads comes up less often than tails. In his book A Mathematician Reads the Newspaper (Basic Books, 1995), John Allen Paulos states that spinning coins results in only about 30 percent heads. Another activity, described in Activity-Based Statistics (Springer-Verlag, 1996), involves standing pennies on edge on a table and then banging the table so all the pennies fall over. This will often result in 90 percent or more heads. These results reflect asymmetries in the design of the coin.

On the other hand, tossing a coin yields heads about half the time. The article explains that this is true even for asymmetric coins, since too many other variables affect the outcome of a toss. There are a number of famous historical accounts of coin tossing, each more mind-numbingly thorough than the experiment described above. The French naturalist Count Buffon (1707-1788), of Buffon's needle fame, tossed a coin 4040 times, with heads coming up on 2048 tosses, or 50.693 percent of the time. Statistician Karl Pearson tossed a coin 24,000 times, obtaining 12,012 heads, or 50.05 percent. While imprisoned by the Germans during World War II, South African mathematician John Kerrich tossed a coin 10,000 times, with heads coming up 5067 times, or 50.67 percent of the time. Kerrich's data appear in Statistics by Freedman, Pisani and Purves (3rd edition, W. W. Norton, 1997).


"Judge Rules that Fingerprints Don't Prove a Match"

by Andy Newman, The New York Times, January 11, 2002, p. A14.

We've all seen television courtroom dramas in which the fingerprint expert states that the defendant's fingerprints match prints found at the crime scene. But how reliable is such testimony?

Not reliable enough, according to a recent ruling by federal judge Louis Pollak of Philadelphia, who found that fingerprinting does not meet the standards for scientific evidence set by the US Supreme Court in the early 1990s. The Court said that expert witnesses cannot state opinions based on a scientific technique unless that technique has been tested and has a known error rate. Critics of fingerprint evidence have argued that it has never been subjected to such testing, while some fingerprint experts have maintained the unrealistic position that the error rate is in fact zero. Under Judge Pollak's new ruling, fingerprint experts can discuss "points of similarity" between a defendant's prints and those found at a crime scene, but cannot testify that the prints match.

Although Judge Pollak's ruling has immediate force only in his court, it is sure to encourage further challenges to fingerprint evidence. Prosecutors worry that they will face similar challenges to other identification techniques, such as hair analysis, handwriting analysis and ballistics matching. On the other hand, DNA fingerprinting has so far been found to meet the Supreme Court standard, because the analysis gives the "statistical probability of a match".


"That Old Black Magic"

by Alexander Wolff, Sports Illustrated, January 21, 2002, pp. 50-61.

This cover story is about ... well, the cover of Sports Illustrated! For years, superstitious readers have believed that athletes are jinxed by appearing on the magazine's cover: shortly after being so honored, an athlete's performance mysteriously declines. The article traces this phenomenon to the magazine's very first issue in August 1954. The cover featured Eddie Mathews of baseball's Milwaukee Braves. The team's nine-game winning streak ended the day after the magazine came out. A week later Mathews was hit by a pitch and knocked out of the lineup for seven games. The article presents a parade of covers from the decades since, with grim tales of the fates that followed. Basketball legend Larry Bird was reportedly thrice jinxed: first as a college player at Indiana State, then as a Boston Celtics player, and finally as coach of the Indiana Pacers. You can view the full "jinx timeline" and other material related to the article at the magazine's Web site.

Searching for statistical confirmation of the jinx, the editors reviewed all 2456 covers from the magazine's history. In 913 of these cases, or 37.2%, the featured athletes suffered "measurable and fairly immediate" negative consequences (for example, in football, immediate meant the next week's game; in baseball, a slump starting within the next two weeks was counted).

Various attempts to debunk the jinx theory are presented. The five athletes who have most frequently appeared on the cover are Michael Jordan, Muhammad Ali, Kareem Abdul-Jabbar, Magic Johnson and Jack Nicklaus. Needless to say, this is not a cast of hard-luck cases. For others, sports psychologists point out that the "jinx" might be attributable to the changed expectations athletes face as a result of being featured. Athletes may be distracted by the extra attention, or even change their own approach to the game, with predictably counterproductive effects.

To their credit, the editors recognized that their 37.2% figure needed to be compared to something. To find out what, they contacted Professor Gary Smith of Pomona College. Smith explained the need to establish baseline statistics for the athletes, by considering their career performances as well as their frequency of injury. But that sounded like a lot of work for the magazine to take on. In any event, the writers seem to enjoy the mystique of the jinx. Toward the end, the article quotes astronomer Carl Sagan, who had the following to say about matters ranging from the miracles at Lourdes to the phenomenon of streak shooting: "What's the harm of a little mystification? It sure beats boring statistical analyses." Hmmmm.
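
Smith's point can be made concrete with a quick calculation. Whether 37.2 percent looks like a jinx depends entirely on the baseline rate at which comparable athletes slump or get hurt in any given stretch, and that baseline is the hard part to establish. The sketch below simply tries a few hypothetical baselines; none of them comes from actual career or injury data.

# How the 37.2 percent "jinx" rate might be compared with a baseline slump
# rate. The baselines tried here are made up; estimating the real one from
# career performance and injury records is the work Smith describes.
from math import sqrt, erfc

jinxed, covers = 913, 2456
observed_rate = jinxed / covers   # about 0.372

for baseline in (0.30, 0.35, 0.37, 0.40):
    z = (observed_rate - baseline) / sqrt(baseline * (1 - baseline) / covers)
    p = erfc(abs(z) / sqrt(2))
    print(f"baseline {baseline:.2f}: z = {z:+.2f}, two-sided p = {p:.3f}")

Depending on the assumed baseline, the same 37.2 percent looks like strong evidence for the jinx, like nothing at all, or even like evidence of a protective effect, which is exactly why establishing the baseline is the crucial step.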


"Mammogram Studies Leave Future Unclear"

by Judy Foreman, The Boston Globe, January 29, 2002, p. D1.

Years of debate have failed to produce consensus on the value of mammograms -- especially for women under 50. A letter published last fall in the British medical journal The Lancet (Olsen, O. and Gøtzsche, P. (2001). "Cochrane review on screening for breast cancer with mammography," Lancet, 358, 1340-1342) questioned the data on which current screening policies are based. The letter argued that there is no reliable evidence that mammography reduces breast cancer mortality.

That conclusion was based on a review of seven major studies on mammography. In the authors' judgment, the five studies that showed mammograms to be beneficial contained flaws that made them unreliable. The other two studies were reasonably carried out, but these did not show much benefit for screening. The Globe article summarizes the kinds of problems that were identified. Some of the studies were not properly blinded; the doctors who assigned cause of death may have known which subjects were in the screening group. Elsewhere, there were concerns that pre-existing cancers may have been handled differently for women in the screening and control groups.

The National Cancer Institute (NCI) has stated that it will maintain its current recommendations. In particular, they continue to believe that women in their 40s should have mammograms every one to two years. While acknowledging the need for careful review of existing data, the NCI emphasizes the need for further research on early detection of breast cancer, noting that several studies are now underway. You can find more details online in the NCI press release.

The Globe article echoes the concern that we need to move beyond the same old data that have already been debated for many years. Fran Visco of the Washington-based National Breast Cancer Coalition is quoted as saying "The evidence behind screening mammography is poor, certainly for women under 50. For too long, we feel, mammography has taken up too much space in the world of breast cancer." A statement recently released by the coalition observes that: "In any age group, mortality reduction associated with mammography is less than 50 percent. Although it may be difficult to accept, it is vital that women know the truth."

Still another point of view was expressed in a February 2 letter to The Lancet (Miettinen et al. (2002), "Mammographic screening: no reliable supporting evidence?" Lancet, 359, 404-406). The authors argue that the Malmö study, one of the two accepted as valid in the earlier letter, actually found a substantial reduction in breast cancer mortality after a 6-year delay. They say that such a delay is to be expected, since the presumed benefits of mammography arise from early detection.


William P. Peterson
Department of Mathematics and Computer Science
Middlebury College
Middlebury, VT 05753-6145
USA

wpeterson@middlebury.edu

