Journal of Statistics Education v.3, n.1 (1995)

Joan B. Garfield

General College

University of Minnesota

140 Appleby Hall

128 Pleasant St. S.E.

Minneapolis, MN 55455

612-625-0337
*jbg@vx.cis.umn.edu*

J. Laurie Snell

Department of Mathematics and Computing

Dartmouth College

Hanover, NH 03755-1890

603-646-2951
*jlsnell@dartmouth.edu*

This column features "bits" of information sampled from a variety of sources that may be of interest to teachers of statistics. Joan abstracts information from the literature on teaching and learning statistics, while Laurie summarizes articles from the news and other media that may be used with students to provoke discussions or serve as a basis for classroom activities or student projects. We realize that due to limitations in the literature we have access to and time to review, we may overlook some potential articles for this column, and therefore encourage you to send us your reviews and suggestions for abstracts.

The first meeting of the IASE, a division of the International Statistical Institute, was held in Perugia, Italy, in 1993. This book contains all invited and contributed papers presented at that meeting. Chapters are organized into sections corresponding to sessions of the meeting. These sections include: Statistical Education at School Level (with papers focusing on educational programs in Europe and the USA); Teaching Probability and Statistics at University Level (including papers on issues involved in teaching engineering students, providing off-campus statistical experiences for undergraduates, teaching students in schools of business, and teaching biostatistics to medical students); Computers, Video and Other Tools in the Teaching of Probability and Statistics (with papers on the uses of electronic communication, video, and curricular software to teach probability and statistics); Education Programmes and Training in Statistics (with papers on the role of consultants, training government statistical staff in developing countries, and teaching medical professionals); and Issues in the Teaching of Probability and Statistics (including papers on intuitive strategies for teaching, use of simulations, goals of statistics education, and student views of effective learning). In addition, there is a paper on the IASE and problems of statistics education in developing countries, a report on Updating Teaching Methods in Probability and Statistics (a session for Italian school teachers), and a set of abstracts for posters presented in a poster session.

The April 1995 issue of the International Statistical Review features a collection of papers on statistics education assembled by guest editor David Moore. The following four papers appear in this special issue.

This paper provides a historical perspective on developments in statistics education related to activities of the International Statistical Institute (ISI) over the last 50 years. The early work of the ISI Education Committee is described, beginning with its formation in 1948 and continuing with a variety of activities relating to teaching and training teachers, publishing materials, and sponsoring of conferences on statistics education. Vere-Jones identifies some underlying factors contributing to the rapid growth of interest in statistics education during the past 20 years, describes the establishment of the IASE (in 1991), and outlines challenges this organization faces now and in the future. One of these issues (and perhaps the most difficult one) deals with the training of statisticians in developing countries.

A review of research in psychology, statistics education, and mathematics education is used to identify difficulties that students have in learning and understanding statistical concepts. Principles of learning statistics are then formulated from this review, and implications for improved teaching are suggested.

Ideas of Total Quality Management (TQM), a management philosophy originally used to improve the quality of a business, are related to the need for improving the quality of higher education. The authors describe the application of methods of continuous quality improvement (CQI) to higher education in general. Although suggestions are offered for college administrators and faculty in all disciplines, the methods suggested are particularly relevant for individual teachers of statistics (e.g., improving the quality of instruction and student learning by using student feedback, including team activities in class, and by exploring ways to improve as teachers). An example is provided of how to help students develop an understanding of the idea of quality by doing a data collection activity on their study habits.

This paper also addresses methods of TQM and CQI, this time applying them to the context of improving the quality of a large, multisection, introductory statistics course. The author identifies four important (and related) processes involved in offering such courses: curriculum development, the teaching process, the learning process, and the assessment process. Goals for each process are outlined. A description is provided of the author's experience analyzing these processes, organizing quality meetings, developing guidelines and procedures for the course, and using feedback from students to improve the quality of this statistics course. The paper also addresses the problems of "academic antipathy" to the language of TQM, lack of institutional support, and the challenge of building a team feeling among students and teaching staff.

This is an activity written for high school mathematics teachers that demonstrates the idea of quality control using a hands-on data gathering activity. The example described involves analyzing the variability among baseballs by taking measurements of weight, circumference, and resiliency. Directions are provided for having students work in teams, and guidelines for assessing group projects are offered. The author also suggests similar activities using other products.

When should conditional probability be introduced in the high school mathematics curriculum? Watson argues that ideas of conditional probability can be introduced earlier in the curriculum (as early as eighth grade) by providing students with examples of conditional statements (such as those appearing in the media) and problem contexts involving independence and two-way tables. Conditional statements are viewed as important in developing the logic of conditional probabilities. Advertisements with statements such as "We will give you 50% off the joining fee if you join Dockside Fitness before May 24" appear to be a natural way to introduce the topic and the "if ... then" logic of a conditional statement. Examples of conditional probabilities arising in sporting statistics are also described (e.g., finding a baseball player's probability of making a hit in a road game). Ideas are offered for helping students examine these ideas in different contexts, gather and use data to formulate conditional probability statements, and develop the use of language and interpretation of conditional situations in contexts outside of mathematics.

Drawing on perspectives from several disciplines, this collection of papers presents a broad-ranging view of subjective probability to stimulate a reconceptualization of the basic issues related to this topic. There are four major sections of the book. "Background" provides an overview of the philosophical and statistical foundations of subjective probability (e.g., Probability, Uncertainty, and the Practice of Statistics). "Studies in the Psychological Laboratory" describes the theory and research in cognitive and developmental psychology (e.g., Ambiguous Probabilities and the Paradoxes of Expected Utility). "Accuracy of Probability Judgments" focuses on theories and models that allow assessment of the quality of probability estimates (e.g., Subjective Probability Accuracy Analysis), and "Real World Studies" reviews judgments of subjective probability in decision-making contexts (e.g., The Rationality of Gambling: Gamblers' Conceptions of Probability, Chance and Luck).

Should we be teaching our students how to test hypotheses using tests of significance? The following three papers address the controversy relating to the use and instruction of statistical significance testing.

Cohen's paper reviews problems with the "ritual of null hypothesis significance testing--mechanical dichotomous decisions around a sacred .05 criterion" and misinterpretations that result from this type of testing (e.g., the belief that p-values are the probability that the null hypothesis is false). Instead, he suggests using techniques of exploratory data analysis and graphical methods and placing an emphasis on estimating effect sizes using confidence intervals. Replicating a study is the author's preferred procedure for generalizing results.

This paper analyzes the logic involved in rejecting a null hypothesis and presents a critique of the flawed logical structure of statistical significance tests. The authors explain why this type of testing perseveres (despite frequent criticisms of this approach), describing "profound psychological reasons leading scholars to believe that they cope with the question of chance and minimize their uncertainty via producing a significant result." Research is described that reveals misconceptions held by students as well as researchers, including the belief that a significant result means the null hypothesis is improbable; reasons for this misconception are explored. The authors conclude by offering some alternative methods for presenting and analyzing data.

Taking on the issue of what it means to "accept a null hypothesis," Frick believes that in some situations this is indeed the correct thing to do. Criteria are offered for accepting the null hypothesis (which is distinguished from the decision of "failing to reject" the null hypothesis). The author concludes that the null hypothesis should sometimes be accepted (when the methodology he presents is followed), and that the rules of psychology (or science in general) should be changed to allow the null hypothesis to be accepted.

A regular component of the Teaching Bits Department is a list of articles from Teaching Statistics, an international journal based in England. Brief summaries of the articles are included. In addition to these articles, Teaching Statistics features several regular departments that may be of interest, including Computing Corner, Curriculum Matters, Data Bank, Historical Perspective, Practical Activities, Problem Page, Project Parade, Research Report, Book Reviews, and News and Notes.

The Circulation Manager of Teaching Statistics is Peter Holmes, p.holmes@sheffield.ac.uk, Center for Statistical Education, University of Sheffield, Sheffield S3 7RH, UK.

Summary: This article examines the role of graphical work in data handling in the early years of schooling. A major concern is that the learning of graphing could be reduced to algorithmic learning. An examination of many of the activities suggested for teachers reveals the danger of "algorithmic graphing." In addressing the question of what aspects of graphing need specific emphasis, the author describes three components needed to comprehend graphing: the nature of data, alternative representations, and prediction.

Summary: Properties of the random variable representing the number of identical and independent Bernoulli trials necessary to obtain k consecutive successes are investigated. The results are of interest to students in a first course in probability or mathematical statistics.

Summary: Data are presented here for the hundred largest cities in the world. They form part of a case study to teach students about exploratory data analysis but are of added interest in providing a focus on poverty and underdevelopment in the Third World and the contrast between this and the wealth of the First World.

Summary: A comparison of the levels of difficulty experienced by students in the use of Minitab through its menu interface and its command language showed no advantage in using the menus. A sample of 55 first year college students used Version 8.2 of PC Minitab on 286 IBM-compatible computers. One group of these students was taught to use the commands, and a different group was taught to use the menus (without using a mouse; instead using a combination of keys to access the menus). An attitude questionnaire was used to assess student reactions to the different Minitab formats.

Other features in the Spring issue include brief articles in the following departments: Practical Activities, Software Review, Standard Errors, Data Bank, and Research Report.

Max Frankel is former executive editor of the New York Times and now writes a column on communication for the New York Times Magazine. On this occasion, Frankel discusses how poorly newspapers communicate information involving numbers. His examples are taken from the New York Times, though he assures us that the Times is better at dealing with numbers than most newspapers.

Frankel starts with a series of examples with missing denominators; for example, "Clinton has reduced the Federal payroll by 98,000." The reader is not told the total number of people on the Federal payroll, making it difficult to assess the significance of the cut. Frankel comments that "America needs baseball back, if only because that is the only way it learns to handle rates (batting averages), probabilities (who expects to live long enough to see 61 homers again?), and context (how come the team with the lowest wages had the best record before the strike?)."

Frankel gives examples to demonstrate his claim that newspapers sometimes handle numbers in a sloppy way or give a false impression of the precision of the numbers provided. One example is a report in the New York Times of a study that estimated the economic costs of depression in America to be 14.7 billion dollars. Part of this total cost was the loss of earnings of the 18,400 suicides per year, which Frankel describes as "magically rendered as 7.5 billion ($407,608.69 each?)." The original article, in the Journal of American Psychiatry, provided a detailed justification for this figure. It seems to me that the problem stems from having to summarize a lot of information in a short space. Frankel's simpler examples, such as a report that Russian miners demanded a 150% pay increase (in one paragraph) and wanted their salaries increased by two and a half times (in the next), are more convincing.
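The arithmetic behind two of these examples can be checked directly. The sketch below, using only the figures quoted above, divides the 7.5 billion dollars by the 18,400 suicides and then works out the two possible readings of the miners' demand. The function names are ours, introduced for illustration, and we make no claim about which reading the original news report intended.

```python
# Figures quoted in the column.
suicide_cost_total = 7.5e9    # dollars of lost earnings attributed to suicides
suicides_per_year = 18_400

# The "missing denominator" made explicit: cost per suicide.
cost_per_suicide = suicide_cost_total / suicides_per_year
print(f"Cost per suicide: ${cost_per_suicide:,.2f}")


def multiplier_from_percent_increase(percent):
    """Factor by which a quantity is multiplied after a given percent increase."""
    return 1 + percent / 100


def percent_increase_from_multiplier(multiplier):
    """Percent increase implied by multiplying a quantity by `multiplier`."""
    return (multiplier - 1) * 100


# A 150% increase multiplies the old salary by 2.5.
demand_multiplier = multiplier_from_percent_increase(150)

# Reading 1: "increased by two and a half times" means the new salary is
# 2.5 times the old one, which agrees with a 150% increase.
reading_1_percent = percent_increase_from_multiplier(2.5)

# Reading 2: 2.5 times the old salary is ADDED to it, so the new salary is
# 3.5 times the old one, a 250% increase.
reading_2_percent = percent_increase_from_multiplier(3.5)

print(demand_multiplier, reading_1_percent, reading_2_percent)
```

Under the first reading the two paragraphs of the miners' story agree, and under the second they do not, which is one way to see why the wording is troublesome for readers.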

Frankel discusses several examples from the forthcoming book by John Paulos, "A Mathematician Reads the Newspaper." He admires a conditional probability problem involving a diagnostic test where a positive test implies only a 1-in-11 chance of a patient's actually being sick in a situation where the test can be "rightly called 99 percent accurate." He asks "How many newspapers reporting such a study could correctly instruct their readers in the meaning of 99 percent accurate?"
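A short calculation shows how a "99 percent accurate" test can leave only about a 1-in-11 chance of being sick after a positive result. The sketch below applies Bayes' rule. The prevalence of 0.1% is our assumption, chosen because it reproduces the 1-in-11 figure; the column does not state the prevalence, and we also assume that "99 percent accurate" means both 99% sensitivity and 99% specificity.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(sick | positive test), computed with Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Assumed inputs (not given in the column): 1 person in 1000 is sick,
# and the test is right 99% of the time for both sick and healthy people.
ppv = positive_predictive_value(prevalence=0.001,
                                sensitivity=0.99,
                                specificity=0.99)

print(f"P(sick | positive) = {ppv:.4f}, about 1 in {1 / ppv:.1f}")
```

With these assumed inputs, the healthy people who test positive outnumber the sick people who test positive by roughly ten to one, which is why the positive result is so weak.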

The next example shows Frankel's own difficulty in being clear about numbers. He writes: "How many of us had the wit to question, as Paulos does, the judgment that proportionately more blacks (95 percent) voted for David Dinkins because he is black than whites (75 percent) voted for Mayor Giuliani because he is white?" Noting that 80% of New York's blacks normally vote Democratic while only 50% of whites normally vote Republican, Paulos says it could also be argued that only 15% of blacks voted racially versus 25% of whites.

Frankel should have just said that 95% of the blacks voted for Dinkins and 75% of the whites voted for Giuliani and then gone on to give Paulos' argument that it is still possible that fewer blacks voted racially than whites.
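Paulos' argument amounts to comparing each group's vote with its usual party vote rather than with zero. A minimal sketch of that subtraction, using the percentages quoted above (the function name is ours, for illustration only):

```python
def excess_over_usual(observed_percent, usual_party_percent):
    """Percentage points by which the observed vote exceeds the usual party vote."""
    return observed_percent - usual_party_percent


# Blacks: 95% voted for Dinkins, against a usual 80% Democratic vote.
black_excess = excess_over_usual(95, 80)

# Whites: 75% voted for Giuliani, against a usual 50% Republican vote.
white_excess = excess_over_usual(75, 50)

print(black_excess, white_excess)
```

Treating the excess as the share who "voted racially" is itself only one possible interpretation, which is why Paulos says it "could also be argued."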

Gould gives his critique of the book The Bell Curve by Richard J. Herrnstein and Charles Murray. There have been many reviews and commentaries on this book, but this is one of the very few that discuss statistical issues related to the book.

Herrnstein and Murray assume that there is such a thing as "general intelligence" represented by a single number and that I.Q. tests permit us to rank people by general intelligence. An I.Q. test is typically made up of a number of different tests -- verbal, mathematical, etc. The I.Q. score is the sum of the scores on these individual tests. The idea of general intelligence came from the observation that the individual parts of an I.Q. test are typically positively correlated. Factor analysis provides a mathematical technique to represent the total score as a linear combination of a small number of factors. General intelligence is then represented by the factor accounting for the largest amount of the variation in the scores when the test is given to a group of individuals. This factor is called the "principal component."

Gould claims that these calculations do not demonstrate the existence of a single number representing general intelligence. He asserts that most serious workers in the field of intelligence feel, as he does, that if intelligence can be quantified, it will have to be represented by a set of numbers corresponding to different aspects of intelligence. This would make the comparison of groups by intelligence a much more difficult task.

Critics of The Bell Curve have asserted that a major thesis of the book is that I.Q. is largely inherited and, as a result, there is very little that can be done by changing the environment to reduce the 15-point gap in average I.Q. scores between blacks and whites. While the authors do not actually say this, Gould suggests that their treatment of the topic is designed to give readers that impression.

The Bell Curve contains a large number of graphs (resulting from log-normal regressions) devoted to showing that I.Q. is the best predictor of job performance, income level, welfare status, etc. Gould points out that most of the correlations involved are small. In addition, he criticizes the authors for giving only the regression lines and no scatter plots or other indication of variation. (An appendix to The Bell Curve does include more complete regression information.) In Gould's opinion, the conclusions of The Bell Curve are based on some very basic premises that are simply not true.

Estrogen is taken by millions of American women to treat symptoms of menopause or to prevent osteoporosis. Previous studies have indicated that estrogen can cut the risk of heart disease but increases the risk of breast cancer.

A study of nearly 100,000 women, reported at the American Heart Association meeting in San Antonio, found that the women who had estrogen therapy had a 30% lower risk of dying from all causes than those who had never used estrogen. This result was statistically significant. They also had a 48% lower risk of dying from heart disease, which was nearly statistically significant. The corresponding results for women under 75 who had estrogen therapy for at least ten years were both highly significant.

Jane A. Cauley, who presented the results, pointed out that women who choose estrogen replacement may be more health conscious or more willing to comply with a doctor's advice than other women.

The article remarks that this is a case where the entire set of risk factors has to be taken into account. The fact that heart disease kills about five times as many women as breast cancer suggests that the advantages of estrogen therapy would outweigh the disadvantages. On the other hand, women with a history of breast cancer who exercise regularly and eat a low-fat diet might not choose estrogen therapy because they are at lower risk for heart disease than for breast cancer.

The Office of Management and Budget (OMB) provides the racial and ethnic categories used by federal agencies. These categories are used, for example, in the census, in civil rights enforcement, and in demographic studies. The current categories, established in 1977, are American Indian or Alaskan Native, Asian or Pacific Islander, Black, White, and Hispanic. The OMB is reviewing these categories; a wide range of changes have been recommended to them, including

- Change "Black" to "African American" and "American Indian or Alaskan Native" to "Native American";
- Include "Native Hawaiians" as a separate category or as part of "Native American" rather than as part of the "Asian or Pacific Islander" category; and
- Add a "Multiracial" category to the list of racial designations.

In addition to these specific suggestions, more general suggestions have been made, including eliminating racial and ethnic categories altogether, since they appear to have no real genetic significance.

These two articles discuss the many issues facing the OMB. Needless to say, a revision of racial categories raises a number of interesting statistical issues such as, "Should race be self-reported?" and "Would important information be lost by introducing a multiracial category?"

Anderson and Fienberg make a number of recommendations which include:

- Do away with the labels "race" and "ethnicity" and substitute something like "identified population groups."
- Allow people to identify with more than one group.

Regarding this last recommendation, they remark, "We can always construct statistical rules for taking multiple responses and producing aggregate information on the categories."

These two companion articles are based on Bailar's extensive experience assisting the media in reporting science, particularly in the field of medicine.

In the first article, Bailar stresses that science writers are professionals just as scientists are and that they "have their own interests, professional standards, technical language, time schedule and so forth." Bailar gives his advice, as well as that of an experienced science writer, as to how to provide scientific information to a reporter and, through the reporter, to the public. Much of this advice is very relevant to our own classroom attempts to convey current scientific knowledge.

In his second article, Bailar makes six suggestions for ways the news media can help statisticians do a better job:

- Reporters should stress that the outcome of one study is usually only one part of a general research program and should rarely be interpreted in isolation.
- The media should not give the impression that dead controversies are still alive. For example, they should not allow tobacco companies to suggest that the issue of smoking's danger to health is still being debated.
- When a controversy is not settled, such as the possible dangers of electromagnetic fields, the media should bring out arguments on both sides. A recent article ("Explaining EMF: Science Writers Did It Better" by Sharon M. Friedman, ScienceWriters, Winter 1994-95, 7-8) reviewed a large number of news articles on the possible dangers of electromagnetic fields (EMF) and found that the media, as a whole, did not follow Bailar's advice. The author, however, found three excellent in-depth articles by science writers who accurately explained all of the issues in a balanced way.
- News media should be more skeptical when the pronouncements of individuals or organizations, including the government, tend to serve some ulterior purpose. An example would be the National Cancer Institute announcing dramatic new progress in curing cancer.
- News media should be skeptical about claims that are not backed by extensive data. Bailar mentions the flurry of news suggesting a connection between cellular telephone use and brain cancer.
- The news media should continue to educate the public in the ways that science progresses -- that is, the big picture.

This article discusses the "envelope paradox": You are asked to choose at random one of two envelopes containing money, one of which contains twice as much money as the other. In the envelope chosen you find x dollars. You reason that the other envelope is equally likely to have either 2x or x/2, giving an expected value of 1.25x. This suggests that you should switch to the other envelope, which seems absurd.

The authors point out that to calculate the expected value to see if you should switch, you need to know the prior probability that specific amounts of money were put in the envelopes. Then you should switch if and only if the probability that you have the better envelope is less than 2/3. They prove that this condition must be satisfied for at least one x, and give an example of a prior distribution for which it is always satisfied. In this case you should, indeed, always switch!
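The 2/3 condition follows from a one-line calculation: if p is the probability that you hold the better envelope given that you see x, the other envelope is worth p(x/2) + (1 - p)(2x) on average, and this exceeds x exactly when p < 2/3. The sketch below checks the condition for one prior under which it always holds. The prior is our choice for illustration (the envelopes hold 2^n and 2^(n+1) dollars with probability (1/3)(2/3)^n) and is not necessarily the example the authors give.

```python
from fractions import Fraction


def pair_prior(n):
    """Assumed prior: P(envelopes hold 2**n and 2**(n+1)) = (1/3) * (2/3)**n."""
    return Fraction(1, 3) * Fraction(2, 3) ** n


def prob_holding_larger(n):
    """P(you hold the larger envelope | you observe x = 2**n dollars)."""
    if n == 0:
        return Fraction(0)  # 1 dollar can only be the smaller amount
    # The 1/2 chance of picking either envelope cancels in the ratio.
    as_larger = pair_prior(n - 1)   # pair (2**(n-1), 2**n): x is the larger
    as_smaller = pair_prior(n)      # pair (2**n, 2**(n+1)): x is the smaller
    return as_larger / (as_larger + as_smaller)


def expected_other(n):
    """Expected contents of the other envelope, given x = 2**n dollars."""
    x = Fraction(2) ** n
    p = prob_holding_larger(n)
    return p * x / 2 + (1 - p) * 2 * x


for n in range(5):
    p = prob_holding_larger(n)
    print(f"x = {2 ** n}: P(larger) = {p}, E(other) = {expected_other(n)}, "
          f"switch = {p < Fraction(2, 3)}")
```

Under this assumed prior the probability of holding the larger envelope is 3/5 for every x above 1 dollar, so switching always looks favorable. The expected amount of money in the envelopes is infinite under this prior, which is what allows "always switch" to hold for every x.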

The authors point out that the standard use of p-values and confidence intervals to judge the outcome of a medical trial often leads a doctor to decide to use a drug without taking into account information from previous studies. They argue that a Bayesian analysis can be used to take into account previous information and can lead to a different conclusion about the effectiveness of a drug.

The authors illustrate this using a recent study called the GUSTO study that compared two different treatments -- tissue-type plasminogen activator (t-PA) and streptokinase (SK) -- for heart attacks. The study involved 41,021 patients; t-PA had a significantly lower mortality rate (6.3% vs. 7.3%, respectively; p = .001). Since t-PA costs about $2000 and SK only $200, and two similar previous studies indicated no significant difference, some doctors have been hesitant to use the more expensive treatment despite the latest significant result.

A Bayesian analysis is carried out by choosing a prior distribution for the difference in mortality rates for the two drugs. The authors suggest that different people might weight the previous studies differently. They give three prior distributions for the mortality difference corresponding to weighting the previous studies by 10%, 50%, or 100%. For each of these three prior distributions, they obtain the posterior distribution for the difference in the two drugs given the results of the GUSTO trial.

From the posterior distribution for the mortality difference, it is possible to calculate the probability that t-PA has a lower mortality rate than SK. In addition, it is possible to calculate the probability that one drug is "clinically superior." In this example the authors say that a drug is clinically superior if the mortality rate is at least 1% lower.

The 50% weighting of the previous studies leads to a 44% probability that t-PA has a lower mortality rate and a negligible probability that it is clinically superior. Ignoring previous results altogether, t-PA has a very high probability (99.95%) of having a lower mortality rate but only a 48% chance of being clinically superior. Thus, a wide range of weightings of the previous results all provide some justification for not immediately switching to the more expensive drug.
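The "ignoring previous results" figures can be roughly reproduced with a normal approximation. The sketch below is not the authors' analysis: it backs a standard error out of the reported p-value, treats the posterior under a flat prior as normal, and then shows a normal-normal update for a prior centered on no difference. The prior standard deviation of 0.005 is purely hypothetical, since the column does not give the priors built from the earlier trials.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

observed_diff = 0.01   # SK minus t-PA mortality (7.3% - 6.3%), from the column
p_value = 0.001        # two-sided p-value, from the column
clinical_margin = 0.01 # "clinically superior" threshold, from the column

# Assumption: recover a standard error from the p-value (normal approximation).
z = STD_NORMAL.inv_cdf(1 - p_value / 2)
std_error = observed_diff / z


def posterior(prior_mean, prior_sd, data_mean, data_se):
    """Normal-normal conjugate update for the mortality difference."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / data_se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return NormalDist(post_mean, post_var ** 0.5)


# Flat prior: the posterior is centered on the observed difference.
flat = NormalDist(observed_diff, std_error)
p_lower_flat = 1 - flat.cdf(0.0)
p_superior_flat = 1 - flat.cdf(clinical_margin)

# Hypothetical skeptical prior centered on "no difference".
skeptical = posterior(prior_mean=0.0, prior_sd=0.005,
                      data_mean=observed_diff, data_se=std_error)
p_lower_skeptical = 1 - skeptical.cdf(0.0)
p_superior_skeptical = 1 - skeptical.cdf(clinical_margin)

print(f"flat prior:      P(lower) = {p_lower_flat:.4f}, "
      f"P(superior) = {p_superior_flat:.2f}")
print(f"skeptical prior: P(lower) = {p_lower_skeptical:.4f}, "
      f"P(superior) = {p_superior_skeptical:.2f}")
```

With the rounded rates, the flat-prior probability of clinical superiority comes out at 50% rather than the 48% quoted above; the gap presumably reflects the unrounded trial data. Any prior that pulls the estimate toward no difference lowers both probabilities, which is the qualitative point of the authors' analysis.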

In the preface of this new book, Abelson suggests that students learn to do the statistical analysis for a study but do not learn what he calls the "narrative" part of the study. "Ask a student the question, 'If your study were reported in the newspaper, what would the headline be?' and you are likely to receive in response a rare exhibition of incoherent mumblings." Pursuing this headline question, he arrives at the thesis of this book: "The purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric."

The book discusses the many issues involved in making these arguments. The author assumes the reader is familiar with elementary probability, standard tests such as t tests, analysis of variance, and simple issues of research design that might be presented in a first statistics course. The first five chapters review basic statistical concepts in an informal manner with no formulas and always from the point of view of using them to make proper arguments. Chapters 6 through 10 discuss more general topics, such as meta-analysis, again from the point of view of making valid statistical arguments.

In all cases the ideas are illustrated by research studies, mostly taken from Abelson's own field of experimental social psychology. One of Abelson's theses is that problems chosen for study should be interesting. Consistent with this, he chooses interesting examples. Here are three of them.

A study found that the average life expectancy of famous orchestral conductors was 73.4 years, significantly higher than the life expectancy for all males, which was 68.5 years at the time of the study. Jane Brody in her New York Times health column reported that this was thought to be due to arm exercise. J.D. Carroll gave an alternative suggestion, remarking that it was reasonable to assume that a famous orchestra conductor was at least 32 years old. The life expectancy for a 32-year-old male was 72 years, making the 73.4 average much less surprising.

A curious reporter found there was an unexpectedly high number of births on Monday and Tuesday exactly nine months after the famous New England blackout of 1965. He wrote an article suggesting the obvious causal effect. The detective work of a curious statistician found that a similar excess of births on Mondays and Tuesdays is more generally found. Further, he found that doctors prefer to schedule induced labor and Caesarean operations at the beginning of the week, which provides an explanation for the excess births on Mondays and Tuesdays.

In 1968, Rosenthal and Jacobson studied what they called the "Pygmalion effect in the classroom." They told elementary school teachers that certain students in their classes, called "bloomers," had been identified by a special test as being likely to display future excellence. These students were, in fact, randomly chosen. The researchers hypothesized that the bloomers would receive extra attention and do better in future tests. This was verified. One observation was that the mean I.Q. score of the bloomers was 4 to 6 points higher than the control group, and this caused a great deal of controversy. A meta-analysis of 18 further studies did not initially support this I.Q. finding, but further analysis showed it was supported in the studies where the teachers had not had previous contact with the students.

This last example, used to illustrate meta-analysis, has been discussed recently in connection with the book The Bell Curve.

In his preface, Abelson remarks "I have always wanted to write a statistics book, full of tips, wisdom and wit." He has certainly succeeded! On the back cover he remarks that he is not the Robert P. Abelson who sings in the Yiddish theater in New York.
