# A Program for Visualizing Comparisons Between Two Normal Distributions

Kieran Mathieson, David P. Doane, and Ronald L. Tracy
Oakland University

Journal of Statistics Education v.3, n.1 (1995)

Copyright (c) 1995 by Kieran Mathieson, David P. Doane, and Ronald L. Tracy, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Restrictions. There is no warranty for this program for any purpose and no guarantee that it is free from error. The public is granted a license to use the program for non-commercial purposes. It may be freely distributed, as long as it is not altered in any way.

Requirements. An IBM-compatible computer running Windows is required; a 486DX/33 or better is recommended.

Key Words: Statistics education; Computer-assisted instruction; Animation; Color graphics; Windows.

## Abstract

This paper describes one program in the Teaching Statistics Visually (TSV) project. TSV supports inductive learning in introductory undergraduate applied statistics courses. The program (1) helps teach concepts rather than analyze data, (2) focuses on one module in a statistics course, (3) relies on visualization rather than formulas, (4) is easy to use, (5) is flexible, supporting different learning levels, and (6) is easy to manage, requiring commonly available resources and incorporating special features to simplify classroom use. A prototype version of the program "Comparing Two Normal Distributions" is included with this paper. The reader is invited to experiment with the program and to send comments and suggestions for improvement to the authors.

# 1. Introduction

1 A good understanding of statistics is important in an increasingly complex world (Roberts 1990). Universities recognize the importance of statistics, with many disciplines requiring at least one statistics course. Statistical concepts, however, are not easy to teach or to learn. Many students dread the required undergraduate statistics course, and they struggle to deal with unfamiliar concepts described in a language of probabilities, standard deviations, and confidence intervals.

2 Instructors teaching introductory undergraduate applied statistics courses have embraced the computer as a tool for communicating statistical concepts (Easton, Roberts, and Tiao 1988; Thisted and Velleman 1992). Many instructors require students to use a statistical package, such as SAS or Minitab. These systems can be valuable when used in the right way. For example, statistical concepts become more concrete when students use a package to analyze real data sets (Roberts 1987).

3 Computer support for teaching statistics is not, however, as good as it could be. Some statistical packages are designed for educational settings. While they are valuable tools, they are not designed to illustrate statistical concepts. This point is discussed in the next section. There are also traditional computer-assisted instruction (CAI) systems which lead the student through packaged lessons. These systems tend to put the computer in control of the student, rather than vice versa (Szabo and Montgomerie 1992), and often have limited impact on student performance (Schmitt 1990). Few systems, however, are designed to help instructors communicate statistical concepts in the classroom. Note that the objective is not to replace instructors in any way, but to improve their performance by creating tools that directly support their classroom activities.

4 This paper describes one program in the Teaching Statistics Visually (TSV) program suite. TSV is an NSF-funded project that seeks to develop a set of programs for use in introductory undergraduate applied statistics classes. Unlike packages such as Minitab or SAS, TSV is designed specifically to help teach statistics, rather than to solve statistical problems. A prototype version of the program is included with this paper. The reader is invited to experiment with the program and to send comments and suggestions for improvements to the authors.

5 The next section gives an overview of the TSV programs' objectives. A discussion of one of the programs follows.

# 2. Program Objectives

6 This section summarizes the objectives of the TSV programs. The philosophy underlying the project is not discussed here. Instead, the focus is on describing the general characteristics of TSV programs, of which the program introduced later is an example.

7 TSV is a set of programs designed to help instructors teach statistical concepts in introductory undergraduate applied statistics courses. Each program can be used in class by the instructor, or, with an appropriate set of exercises, by students outside of class. If development funds become available, there will be a TSV program for every module of the typical introductory undergraduate applied statistics course.

8 TSV programs are characterized by the following attributes. First, the programs directly support statistical instruction rather than data analysis. For example, suppose an instructor wants to illustrate skewness. Using Minitab, the instructor could create several data sets with different distributions and write a macro program to display them one after the other. While this approach works, it is not optimal, simply because Minitab is designed primarily to analyze data. The process (1) may force the instructor to talk about concepts that have not yet been covered in the course (such as specific types of distributions), (2) requires some computing sophistication on the part of the instructor, and (3) takes time. TSV simplifies the instructor's task since it was designed to illustrate statistical concepts.

9 Second, each program focuses on a specific topic or group of related topics in statistics. The amount of knowledge that learners must have before they can use each program is minimized. For example, the population description program illustrates concepts such as skewness and peakedness without using numbers or equations. This gives instructors flexibility in the order in which they may use TSV programs in a course.

10 Third, TSV relies on data visualization rather than formulas. The value of visualization (Cleveland 1993; Cleveland and McGill 1988; Levkowitz 1991) and graphics (Tukey 1990; Velleman and Pratt 1989; Whittaker 1990) has been well established. Every TSV program uses at least one graphical display. Some programs allow the learner to view several graphs at the same time, each one illustrating a different situation. The learner can compare the graphs directly, rather than having to remember one while viewing another. Some TSV programs use simple animation to show the effect of changes in a parameter (such as the mean of a distribution). In every program, the emphasis is on helping students obtain an intuitive understanding of the relationships between statistical constructs, rather than helping them simply apply formulas.

11 Fourth, the programs are easy to use, a factor that is important in technology adoption (Mathieson 1991). A design objective for each program was that the average user should be able to determine how to use the program simply by looking at the opening screen. TSV runs under Microsoft Windows, minimizing typing through extensive use of a PC's pointing device, such as a mouse or a trackball. The programs adhere to Microsoft's user interface standards (Microsoft 1991), so an individual familiar with another application compliant with this standard (such as Word or Excel) should be able to use them with little difficulty. A complete context-sensitive, hypertext help system is provided with every program. The programs' help files include statistical formulas where appropriate.

12 Fifth, TSV programs are flexible, supporting different learning levels. Even learners at the simplest level can conduct powerful experiments. They can choose more advanced options as they gain experience. This flexibility also supports variability in course design, allowing instructors to skip some topics, emphasize certain topics more than others, or cover topics in different sequences. The programs are designed, as much as possible, to allow flexibility in order and emphasis.

13 Finally, TSV programs impose few operational constraints. They operate on any PC that can run Windows, although a 486DX/33 or better is recommended. There are program features that help address the operational problems of using software in a classroom. For example, most schools have limited facilities for displaying video output during lectures. While some universities have dedicated teaching labs with a PC at every seat, many have only a PC on a cart and an LCD projection panel mounted on an overhead projector. Students at the back of a room find it difficult to see small images when such equipment is used. Therefore, some TSV programs have a magnify button that enlarges an image to fill the entire screen.

# 3. Comparing Two Normal Distributions

14 This section discusses a program titled "Comparing Two Normal Distributions." There is no warranty for this program for any purpose and no guarantee that it is free from error. The public is granted a license to use the program for non-commercial purposes. It may be freely distributed, as long as it is not altered in any way.

## 3.1 Installing and Running the Program

15 The program is supplied in the self-extracting file 2dst-zip.exe. Installation is a two-step process. The first step generates a setup program, much like those supplied with other Windows applications. The second step is to run the setup program. It will install the application to a subdirectory and create a program group in the Program Manager with an appropriate icon.

16 First, copy the file 2dst-zip.exe to its own subdirectory or diskette. Then run it, either from the DOS prompt or using the Windows File Manager. 2dst-zip.exe creates all of the components of the Windows installation program in its subdirectory (or on its diskette). For example, if you copy the program to C:\TEMP1 and run it, it will create the Windows setup program (SETUP.EXE) and its support files in C:\TEMP1.

17 Second, run the SETUP.EXE program generated in the first step. Do this from within Windows, since SETUP.EXE is a Windows program. The easiest method is to double click on the program's icon in the File Manager. You will then be asked to specify the subdirectory in which to store the application. Accept the default subdirectory (C:\TSV) or enter your own, but be sure to install it to a different subdirectory from that containing the SETUP.EXE program. A program group will be created for you in the Program Manager. Double click on the icon in the program group to start the application.

## 3.2 Using the Program

18 This section provides scenarios showing how the program can be used. Note that these are scenarios only. The program is very flexible, and can be used in many other ways. The reader is encouraged to experiment. The best way to discover what a button does is to press it and note the effects. The help file supplied with the program describes all of the available options.

### 3.2.1 Shape of the Normal Distribution

19 The program starts by showing two distributions with two sets of scroll bars. The gray scroll bars control the gray distribution, while the blue scroll bars control the blue distribution. Changing the distributions' parameters with the scroll bars and pressing the Update button will show a new set of distributions. The effect of the changes can be seen immediately.

20 The program can be used to explain, for example, that changing the mean of a distribution moves the distribution along the X axis without changing its shape. An instructor can demonstrate this point without generating data sets or typing any commands. This is an example of how TSV helps instructors explain concepts without becoming mired in the details of computing.
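The point about the mean can also be checked numerically. The following is a minimal sketch in Python, not part of the TSV program; the function names and the parameter values (means of 50 and 65, standard deviation of 10) are assumptions chosen for illustration. It evaluates two normal densities at the same offsets from their respective means and confirms that the heights match, so only the location differs.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def curve(mean, sd, offsets):
    """Densities evaluated at mean + offset for each offset."""
    return [normal_pdf(mean + d, mean, sd) for d in offsets]

# Hypothetical parameter values standing in for the gray and blue scroll bars.
offsets = [-20.0, -10.0, 0.0, 10.0, 20.0]
gray = curve(mean=50.0, sd=10.0, offsets=offsets)
blue = curve(mean=65.0, sd=10.0, offsets=offsets)

# Same standard deviation, different means: the heights agree point by point.
same_shape = all(math.isclose(g, b) for g, b in zip(gray, blue))
print(same_shape)
```

Changing `sd` instead of `mean` in one of the two calls would make the comparison fail, which mirrors what the variance scroll bar shows on screen.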

### 3.2.2 Comparing Two Samples

21 A user might select [Options | Show Sampling Distribution] (that is, select Options from the main menu and select Show Sampling Distribution from the Options menu) and [Options | Show t test] and then press the Update button. The parameters change to show that the program is displaying sampling distributions. A sample size parameter appears, and the t statistic is shown.

22 Suppose an instructor wants to illustrate the effect of sample size on significance. This can be done by changing the sample size using the up and down arrows, pressing the Update button, and watching the statistics change. Note that the sampling distribution becomes narrower as the sample size increases, because the standard error of the mean shrinks in proportion to the square root of the sample size.
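The effect of sample size can be sketched with a few lines of Python. This is an illustration only: the paper does not state which form of the t statistic the program computes, so the sketch assumes two independent samples of equal size and an unpooled standard error, and the means and standard deviations are hypothetical values.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)

def t_statistic(mean1, sd1, mean2, sd2, n):
    """Two-sample t statistic, assuming equal sample sizes and an unpooled
    standard error (an assumption; the program's exact formula is not given)."""
    se_diff = math.sqrt(sd1 ** 2 / n + sd2 ** 2 / n)
    return (mean1 - mean2) / se_diff

# Hypothetical settings: means of 55 and 50, both standard deviations 10.
for n in (4, 16, 64):
    se = standard_error(10.0, n)
    t = t_statistic(55.0, 10.0, 50.0, 10.0, n)
    print(n, se, round(t, 3))
```

Under these assumptions, quadrupling the sample size halves the standard error and doubles the t statistic, even though the difference between the means never changes.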

23 These are only some ideas for using the program, and the reader is invited to experiment freely. In fact, given its visual nature and intuitive interface, the reader might learn more from simply experimenting than from this textual description of the program.

24 The reader might also use the program in ways that the developer had not anticipated. For example, one individual used the program to view tail areas by setting the variances of the two distributions to 1, setting the axis location so that the smallest value is 2, 3, or 4 standard deviations above zero, moving the smaller mean beyond the lower axis limits so that only the right hand tail shows, and moving the larger mean so that the same tail area is hidden beyond the upper limit of the axis.
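For readers who want to compare the displayed tails with numerical values, the areas can be computed directly. The sketch below is not taken from the program; it assumes a standard normal distribution (mean 0, variance 1) and uses the complementary error function from Python's standard library. The function name `upper_tail_area` is hypothetical.

```python
import math

def upper_tail_area(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Tail areas beyond 2, 3, and 4 standard deviations above the mean.
for z in (2.0, 3.0, 4.0):
    print(z, upper_tail_area(z))
```

The areas fall from roughly 2.3 percent at two standard deviations to about 0.003 percent at four, which is why the axis must be rescaled before these tails become visible.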

### 3.2.3 Viewing Options

25 Much of the complexity of the program is in the viewing options. They fall into four categories: (1) axis control, (2) multi-window control, (3) magnification, and (4) animation.

26 Axis Control
The user can control the ranges of the X and Y axes. They can be controlled manually, or the user can ask the program to adjust the display. The auto axis button (its icon is a set of axes and three A's) instructs the program to choose values that will display both distributions at a reasonable size.

27 Choosing [Axes | Auto Axes on Update] from the menu causes the program to adjust the axes each time the graph is updated. The advantage of this is that the display will always contain both distributions. The disadvantage is that there will be some sudden changes in the axis scales that students might not notice unless they are pointed out. The instructor can choose the options that are appropriate for the particular situation.
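The paper does not describe the rule the program uses to choose axis values. As one plausible sketch, assuming the goal is simply to keep both curves fully in view, the X range could span a fixed number of standard deviations around each mean and the Y range could reach the taller of the two peaks. The function name `auto_axes` and the default of four standard deviations are assumptions made for illustration.

```python
import math

def auto_axes(mean1, sd1, mean2, sd2, k=4.0):
    """Return (x_min, x_max, y_max) covering both normal curves.

    Hypothetical rule: extend k standard deviations either side of each mean,
    and set y_max to the peak density of the narrower distribution.
    """
    x_min = min(mean1 - k * sd1, mean2 - k * sd2)
    x_max = max(mean1 + k * sd1, mean2 + k * sd2)
    y_max = 1.0 / (min(sd1, sd2) * math.sqrt(2.0 * math.pi))
    return x_min, x_max, y_max

# Hypothetical settings: a wide distribution at 50 and a narrow one at 65.
x_min, x_max, y_max = auto_axes(50.0, 10.0, 65.0, 5.0)
print(x_min, x_max, y_max)
```

A rule like this recomputes the limits from the current parameters every time, which explains the trade-off noted above: both distributions always fit, but the scales can jump between updates.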

28 Multi-Window Control
Each graph and its control set is displayed with its own subwindow, called a child window. The program is a multiple-document interface (MDI) application. That means it can display more than one child window at a time. This allows the instructor to place two or more graphs on the screen and discuss the differences between them. Try [Window | New Default Window] and [Window | Copy Current Window]. Each child window can be given a unique name with [Options | Change Title]. As with all MDI applications, although more than one child window can be displayed at one time, only one is active at any one time.

29 Magnification
The magnify button (its icon is a magnifying glass) helps the instructor display graphs in less-than-optimal conditions, such as when using a projection panel in a large room. The button causes the current child window to capture the parent window's entire space. The graph is then magnified to occupy the child window's entire space. The disadvantages of this mode are that (1) only one graph can be displayed at a time and (2) the controls sometimes overlay the upper right-hand corner of the graph. Note that the instructor still has the ability to create more than one child window and switch from one to the other.

30 Animation
The [Options | Auto Update] menu selection ensures that any changes to the controls will take effect immediately, without pressing the Update button (in fact, the Update button vanishes from the screen). The instructor can use this feature to animate the display, allowing lively experimentation and rapid replication (the "experimental" side of statistics envisioned by Thisted and Velleman 1992, p. 48). For example, with auto update turned on, moving a mean using one of the arrows on a scroll bar makes the corresponding distribution move along the axis. How quickly this happens depends on the speed of the machine on which the program is running. Instructors can decide whether they want to use this feature on a slow machine.

31 Again, the reader is invited to explore. Selecting the correct viewing options for a particular situation will be easier if the user has experimented with all of the options first.

### 3.2.4 Copying the Graph

32 The [Edit | Copy Graph] menu selection will copy the graph in the active child window to the clipboard. It can be pasted into any application that will accept graphics, such as Word for Windows. Students can use this feature to write reports. Instructors can use this feature to write assignments and exams.

# 4. Conclusion

33 TSV supports inductive learning in introductory undergraduate applied statistics courses. Instructors can use TSV to explain concepts in class. Students can investigate statistical ideas in a friendly, graphical environment.

34 The project is in its infancy. Six programs have been developed besides the one in this paper. They are (1) Displaying and Describing Populations, (2) The Power Curve, (3) Describing Discrete Distributions, (4) Statistical Quality Control, (5) Displaying and Describing Regression, and (6) Time Series. Our plan to develop modules to cover two 3-credit-hour statistics courses will depend on available funding.

35 Your comments on this TSV program are welcome. Please send them to Kieran Mathieson at mathieso@vela.acs.oakland.edu via the Internet.

## Acknowledgements

This research was supported by the National Science Foundation (DUE #9254182).

# References

Cleveland, W. S. (1993), Visualizing Data, Murray Hill, New Jersey: Bell Laboratories.

Cleveland, W. S., and McGill, M. E. (1988), Dynamic Graphics for Statistics, Pacific Grove, California: Wadsworth and Brooks Cole.

Easton, G., Roberts, H. V., and Tiao, G. C. (1988), "Making Statistics More Effective in Schools of Business," Journal of Business and Economic Statistics, 6, 247-260.

Levkowitz, H. (1991), "Exploratory Data Visualization: The Human Visual System Should be the Main Design Consideration," in Proceedings of the Statistical Graphics Section, American Statistical Association, pp. 60-63.

Mathieson, K. (1991), "Predicting User Intentions: Comparing the Technology Acceptance Model with the Theory of Planned Behavior," Information Systems Research, 2, 173-191.

Microsoft (1991), The Windows Interface: An Application Design Guide, Redmond, Washington: Microsoft Press.

Roberts, H. V. (1987), "Data Analysis for Managers," The American Statistician, 41, 270-278.

Roberts, H. V. (1990), "Applications in Business and Economic Statistics: Some Personal Views," Statistical Science, 5, 372-391.

Schmitt, D. R. (1990), "Can CAI be More Effective for Teaching Johnny Than Traditional Instruction - Why Have Studies Been Inconclusive?," in Proceedings of Selected Paper Presentations, Convention of the Association for Educational Communications and Technology, ERIC Document ED323947, pp. 1-19.

Szabo, M., and Montgomerie, T. C. (1992), "Two Decades of Research on Computer-Managed Instruction," Journal of Research on Computing in Education, 25, 113-133.

Thisted, R. A., and Velleman, P. F. (1992), "Computers and Modern Statistics," in Perspectives on Contemporary Statistics, MAA Notes No. 21, Washington: Mathematical Association of America, pp. 41-53.

Tukey, J. W. (1990), "Data-Based Graphics: Visual Display in the Decades to Come," Statistical Science, 5, 327-339.

Velleman, P. F., and Pratt, P. (1989), "A Graphical Interface for Data Analysis," Journal of Statistical Computation and Simulation, 32, 223-228.

Whittaker, J. (1990), Graphical Methods in Applied Multivariate Analysis, Chichester: John Wiley.

Kieran Mathieson
mathieso@vela.acs.oakland.edu

David P. Doane
doane@jupiter.acs.oakland.edu

Ronald L. Tracy
tracy@vela.acs.oakland.edu