# Teaching Statistical Concepts With Student-Specific Datasets

Timothy S. Vaughan
University of Wisconsin - Eau Claire

Journal of Statistics Education Volume 11, Number 1 (2003), ww2.amstat.org/publications/jse/v11n1/vaughan.html

Copyright © 2003 by Timothy S. Vaughan, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Sampling distribution; Sampling variability; Student simulation.

## Abstract

The advent of electronic communication between students and teachers facilitates a number of new techniques in the teaching of statistics. This article presents the author’s experiences with providing each student in a large, multi-section class with a unique dataset for homework and in-class exercises throughout the semester. Each student’s sample is pseudo-randomly generated from the same underlying distribution (in the case of hypothesis tests and confidence intervals involving $\mu$), or the same underlying linear relationship (in the case of simple linear regression). This approach initially leads students to identify with their individual summary statistics, test results, and fitted models, as “the answer” they would have come up with in an applied setting, while subsequently forcing them to recognize their answers as representing a single observation from some larger sampling distribution.

## 1. Introduction

There are certain fundamental statistical concepts that are notoriously difficult for students to truly comprehend at an intuitive level, and instructors are continuously exploring innovative teaching practices in the interest of rectifying this situation. In particular, a number of authors have reported their experiences in engaging students in simulation exercises, designed to convey the concepts of sampling distributions and sampling variability.

These exercises generally fall into one of two categories. The first category represents those exercises in which students are physically engaged in the process of drawing random samples, using sampling bowls, bags of candy, and slips of “yes” or “no” votes (see Schwarz and Sutherland 1997; Dyck and Gee 1998; Rossman and Chance 1999; Gourgey 2000). This approach has the benefit of creating a “real” experiment, and actively involves the student in the sampling process itself. The drawbacks to this approach are that the exercise is generally limited to binomial or multinomial sampling from finite populations, while the sample size and number of samples are inherently constrained by the time available. Variations on this idea include having students prepare subjective confidence interval estimates (Anderson-Cook 1999), while Zerbolio (1989) has suggested exercises in which students “imagine” a physical sampling experiment rather than actually conducting one.

The second category represents those exercises in which computer software such as Fathom is used to generate pseudo-random observations, thus allowing students to see the resulting sampling distribution of various summary or test statistics (Schwarz and Sutherland 1997; Anderson-Cook 1999; delMas, Garfield, and Chance 1999; Rossman and Chance 1999). This obviously allows any number of samples of any size to be “drawn” from a much broader collection of underlying distributions. The downside here is that the student is often placed in the role of passive observer, basically “watching” the demonstration as they experiment with different sample sizes and alternative source populations.

While both types of exercises help to instill a distinction between the distribution of the data as opposed to the distribution of the sample statistic, students frequently fail to make the connection between these demonstrations and subsequent course topics (Gourgey 2000). Indeed, we instructors frequently defeat our own best efforts, first delivering a compelling demonstration of sampling distributions, then proceeding through the remainder of the course with a succession of homework and case projects aimed at students coming to the “right answer” for the data at hand. This reinforces the students’ tendency to think of the results of their homework and in-class exercises as “the right answer,” rather than as “an answer,” representing a single observation from the sampling distribution in question. If the implications of sampling variability are both fundamental to an understanding of statistics and difficult for students to understand, common sense pedagogy would suggest that this concept be continually reinforced, through integration into all topics to which the concept applies.

## 2. Student-Specific Datasets

The author has recently experimented with providing each student in a large, mass lecture or multi-section class with a unique dataset for selected homework and in-class exercises throughout the semester. Within the context of a hypothetical study or research question, random observations are generated from a distribution with given parameters. After covering the material in question with a separate example, students are required to perform the appropriate analysis on their unique dataset, and to return their answers to the instructor via e-mail. Students thus recognize their answers as the conclusions they would have drawn, had they performed the study with their individual data. Subsequent in-class examination of all students’ answers clearly demonstrates the variability resulting from the simple fact that they all worked with different random samples drawn from the same “population.”

This approach is facilitated by an Excel macro written in Microsoft Visual Basic. The macro was initially developed by the author to allow e-mail distribution of confidential student grade reports in a large mass lecture or multi-section course. The macro references the instructor’s spreadsheet “gradebook,” and translates each row of data (one row per student) into a formatted report that is sent directly to each student’s e-mail address.

The author subsequently realized that this same macro can facilitate a unique development in the teaching of basic statistical concepts, easily generating and disseminating unique random samples to a large number of students throughout the semester. The data are sent via e-mail as text file attachments, which the students subsequently copy and paste into Microsoft Excel, or another spreadsheet or statistical package of their choice. The remainder of this paper will describe a number of assignments and exercises exploiting this capability. These procedures have been used in a second-semester statistics course, both to review concepts from the introductory course, as well as to reinforce the concept of sampling distributions within the context of more advanced material.

## 3. An Analysis of Heating Bills

The scenario for the first ongoing example is a study of January 20XX heating bills. Obviously, any scenario of interest could be developed. For each student, I generate a unique random sample of n = 25 observations from a normal distribution with mean $\mu$ = 135 and standard deviation $\sigma$ = 20. In class, the students are told they are attempting to estimate $\mu$ and $\sigma$ for all January 20XX heating bills statewide. I make it clear that each student has received randomly drawn observations, corresponding to the idea that in practice they would have randomly chosen n = 25 houses from which to collect data. The first assignment is to compute the sample mean $\bar{x}$, sample variance $s^2$, and sample standard deviation $s$ using Excel, and to return their answers to the instructor via e-mail.
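The generation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the author used an Excel macro, and the function names `make_sample` and `summarize`, along with the per-student seeding scheme, are hypothetical choices made here so that each dataset can be reproduced.

```python
import random
import statistics

MU, SIGMA, N = 135.0, 20.0, 25  # population mean, sd, and sample size from the text


def make_sample(student_id, base_seed=2003):
    """Return one student's pseudo-random sample of N heating bills."""
    # Hypothetical seeding scheme: one reproducible stream per student.
    rng = random.Random(base_seed * 1000 + student_id)
    return [round(rng.gauss(MU, SIGMA), 2) for _ in range(N)]


def summarize(sample):
    """Return (sample mean, sample variance, sample standard deviation)."""
    # statistics.variance and statistics.stdev both use the n - 1 divisor.
    return (statistics.mean(sample),
            statistics.variance(sample),
            statistics.stdev(sample))


# One dataset per student in a class of 144.
datasets = {sid: make_sample(sid) for sid in range(1, 145)}
xbar, s2, s = summarize(datasets[1])
print(f"student 1: x-bar = {xbar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
```

Because the seed depends on the student identifier, the instructor could regenerate any student's data later to check a submitted answer.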

In the following class period, I reveal the “true” $\mu$ = 135 and $\sigma$ = 20, emphatically repeating that in practice these values would not be known. In a class of 144 students across all sections, I have generated and sent a total of 144 x 25 = 3600 observations. This provides an excellent opportunity to review some basic normal distribution concepts, demonstrating that approximately 90% of the observations have fallen within $\mu \pm 1.645\sigma$ and approximately 99% have fallen within $\mu \pm 2.576\sigma$.

### 3.1 Summary Statistics

We then observe a histogram of all the $\bar{x}$ values the students have computed, and students are encouraged to identify where “their” $\bar{x}$ has fallen relative to the rest of the class. The compelling lesson from this display is, of course, that $\bar{X}$ is a random variable. Each student is then forced to realize that “their answer” for the homework is just one observation from the “population” of all possible $\bar{x}$’s. I have previously computed the grand mean of all the students’ $\bar{x}$’s, which motivates the discussion that $E[\bar{X}] = \mu$. The slight variation between the grand mean and $\mu$ also presents the opportunity to discuss sample size considerations.

I have also previously computed the sample variance of the students’ $\bar{x}$ values, which of course launches discussion of the fact that $\mathrm{Var}(\bar{X}) = \sigma^2/n$. This idea is reinforced by demonstrating that approximately 90% of all the students’ $\bar{x}$ values have fallen within $\mu \pm 1.645\,\sigma/\sqrt{n}$ and approximately 99% of the $\bar{x}$ values have fallen within $\mu \pm 2.576\,\sigma/\sqrt{n}$. At this point, a show of hands identifying which students’ $\bar{x}$’s fell outside the respective ranges is especially compelling. Lest the lesson impact only those students, it is important to emphasize that “it could have been any one of you, the fact that it happened to be these particular students is purely due to which 25 houses they randomly selected for their sample.” It is also useful to compare this analysis, side-by-side, to the earlier analysis of the 90% and 99% ranges for the 3600 individual observations.
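The class-wide summary described above can be approximated with a short simulation. The following is a minimal sketch, not the author's spreadsheet; the helper names `class_means` and `fraction_within` are hypothetical, and the multipliers 1.645 and 2.576 are the usual standard normal values for central 90% and 99% ranges.

```python
import math
import random
import statistics

MU, SIGMA, N = 135.0, 20.0, 25


def class_means(num_students=144, seed=1):
    """Simulate the sample mean each student would report."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
            for _ in range(num_students)]


def fraction_within(values, center, half_width):
    """Share of values lying within center +/- half_width."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


means = class_means()
se = SIGMA / math.sqrt(N)  # sigma / sqrt(n) = 4 for these parameters
print("grand mean of the x-bar values:", round(statistics.fmean(means), 2))
print("variance of the x-bar values:", round(statistics.variance(means), 2),
      "versus sigma^2 / n =", se ** 2)
print("share within mu +/- 1.645 se:", round(fraction_within(means, MU, 1.645 * se), 3))
print("share within mu +/- 2.576 se:", round(fraction_within(means, MU, 2.576 * se), 3))
```

With only 144 simulated students the shares will wander around 90% and 99%, which is itself a useful reminder of sampling variability.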

The analysis of the 90% and 99% ranges for $\bar{x}$ obviously requires discussion of the Central Limit Theorem as well. In this case, normality of $\bar{X}$ is supported by the fact that the heating bills themselves are normally distributed. (It would of course be possible to use this approach with non-normal data, demonstrating the degree of normality for $\bar{X}$ under alternative sample sizes. This issue, I believe, is best dealt with using one of the software-based demonstrations discussed earlier. The individual heating bill observations were drawn from the normal distribution in order to provide compelling demonstrations of subsequent analyses based on the t-distribution.)

Finally, we observe a histogram of the students’ $s^2$ values, similarly demonstrating the important idea that $S^2$ is a random variable, and $E[S^2] = \sigma^2$. We also observe the corresponding histogram of the students’ sample standard deviations, but I try to avoid getting into a discussion of the fact that $E[S] \neq \sigma$.
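For instructors who do want to explore the behavior of $S^2$ and $S$, a simulation along the following lines may help. It is a sketch only, with hypothetical function names. It relies on the standard result that, for normal data, $E[S] = \sigma \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, which falls slightly below $\sigma$.

```python
import math
import random

SIGMA, N = 20.0, 25


def sample_variance(sample):
    """Sample variance with the n - 1 divisor."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def average_s2_and_s(num_samples=20000, seed=5):
    """Average s^2 and average s over many simulated samples."""
    rng = random.Random(seed)
    total_s2 = total_s = 0.0
    for _ in range(num_samples):
        s2 = sample_variance([rng.gauss(135.0, SIGMA) for _ in range(N)])
        total_s2 += s2
        total_s += math.sqrt(s2)
    return total_s2 / num_samples, total_s / num_samples


def expected_s(sigma=SIGMA, n=N):
    """Theoretical E[S] for normally distributed data."""
    return sigma * math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


mean_s2, mean_s = average_s2_and_s()
print(f"average s^2 = {mean_s2:.1f} (sigma^2 = {SIGMA ** 2:.1f})")
print(f"average s   = {mean_s:.2f} (theory {expected_s():.2f}, sigma = {SIGMA:.2f})")
```

The average of the simulated $s^2$ values should settle near 400, while the average of the $s$ values should settle a little below 20.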

### 3.2 Hypothesis Tests

The second assignment based on this data is to have each student compute and e-mail “their” values for $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ (that is, a z-statistic for their $\bar{x}$ value) and $t = (\bar{x} - \mu)/(s/\sqrt{n})$ (that is, a t-statistic for their $\bar{x}$ and $s$ values). When making the assignment, I emphasize that in the first calculation they are all dividing their $\bar{x} - \mu$ by the same constant $\sigma/\sqrt{n}$, while in the latter calculation, each student has their observation of the random variable $S/\sqrt{n}$ in the denominator. Students are encouraged to anticipate which values will demonstrate greater variability across the class.

In the following class period, we review histograms of the students’ z and t values. The first point made is that the histogram of “z-scores” is identical to the histogram of $\bar{x}$ values examined in the earlier class period, except for a change of location and scale on the horizontal axis. Superimposed on the histogram is an appropriately scaled diagram of the standard normal density function. This point is reinforced by a show of hands from all students whose $\bar{x}$ values fell outside the range $\mu \pm 1.645\,\sigma/\sqrt{n}$, followed by a show of hands from all students whose values for $z$ fell outside the range $\pm 1.645$. It is, of course, the exact same set of students.

A simultaneous display of students’ z and t values demonstrates that the t-values are indeed more variable. A histogram of the students’ t values is displayed in Figure 1. After introducing the t-table and discussing degrees of freedom, I ask for a show of hands for all students whose value of $t$ falls outside the range $\pm 1.711$ (the appropriate t-value for n - 1 = 24 degrees of freedom, 0.10 two-tail probability). Not surprisingly, approximately 10% of all students have t-values falling outside this range. More importantly, the show of hands demonstrates that it is not the same set of students whose z-values fell outside $\pm 1.645$. At this point (and as a prelude to the hypothesis testing material to follow), I emphasize that the students’ calculations have collectively formed the central t-distribution, created only because I provided them with the correct “true” value $\mu$ = 135 to use in their calculations, which in practice would not be known.

Figure 1. Histogram of 144 students’ test statistics $t = (\bar{x} - 135)/(s/\sqrt{n})$, where $\bar{x}$ is the mean of n = 25 observations, pseudo-randomly generated from the normal distribution with mean $\mu$ = 135 and standard deviation $\sigma$ = 20. Histogram is plotted against the t distribution with n - 1 = 24 degrees of freedom. Figure demonstrates the behavior of the test statistic when $H_0$: $\mu = 135$ is true.
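The comparison of z and t values across the class can be imitated with the sketch below. It is an illustration under assumptions rather than the author's procedure: the names `z_and_t` and `simulate_class` are hypothetical, and the cutoffs 1.645 and 1.711 are the values quoted in the text.

```python
import math
import random
import statistics

MU, SIGMA, N = 135.0, 20.0, 25
Z_CRIT = 1.645  # standard normal, 0.10 two-tail probability
T_CRIT = 1.711  # t with 24 df, 0.10 two-tail probability


def z_and_t(sample, mu0=MU, sigma=SIGMA):
    """Return (z, t): known sigma versus the sample's own s in the denominator."""
    xbar = statistics.fmean(sample)
    root_n = math.sqrt(len(sample))
    z = (xbar - mu0) / (sigma / root_n)
    t = (xbar - mu0) / (statistics.stdev(sample) / root_n)
    return z, t


def simulate_class(num_students=144, seed=7):
    """Simulate the (z, t) pair each student would report."""
    rng = random.Random(seed)
    return [z_and_t([rng.gauss(MU, SIGMA) for _ in range(N)])
            for _ in range(num_students)]


pairs = simulate_class()
z_out = {i for i, (z, _) in enumerate(pairs) if abs(z) > Z_CRIT}
t_out = {i for i, (_, t) in enumerate(pairs) if abs(t) > T_CRIT}
print(f"|z| > {Z_CRIT}: {len(z_out)} students; |t| > {T_CRIT}: {len(t_out)} students")
print("flagged by only one of the two statistics:", sorted(z_out ^ t_out))
```

The symmetric difference printed on the last line lists the simulated students who would raise a hand for one statistic but not the other.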

After working through a separate example introducing the hypothesis testing framework, the next assignment requires students to compute and return the values of $t = (\bar{x} - 135)/(s/\sqrt{n})$ and $t = (\bar{x} - 120)/(s/\sqrt{n})$ associated with their respective samples. The first statistic is appropriate for testing the null hypothesis $H_0$: $\mu = 135$ versus either a one-sided or two-sided alternative, while the second statistic is appropriate for testing the null hypothesis $H_0$: $\mu = 120$, versus either a one-sided or two-sided alternative. At this point, I relate that we typically wouldn’t draw a single sample for the purpose of testing a variety of null hypotheses. For convenience and clarity, we are first going to pretend we drew the sample for the purpose of testing $H_0$: $\mu = 135$, and then separately pretend we drew the sample for the purpose of testing $H_0$: $\mu = 120$. As by now the class is well aware that the true mean is $\mu$ = 135 (I persistently remind them that in practice they wouldn’t know that), we are able to observe the behavior of the test statistics when the null hypothesis is true, as opposed to when the null hypothesis is false.

In order to generate an observable number of Type I errors, we first test $H_0$: $\mu = 135$ at significance level $\alpha$ = 0.10. We start out with a one-tailed test ($H_1$: $\mu > 135$), and a show of hands demonstrates that approximately 10% of the class “drew” samples resulting in $t$ > 1.318, thus rejecting $H_0$ when in fact $H_0$ is true. (These students are humorously chided for “committing” a Type I error, although the point is again made that the test result is directly a function of which 25 houses they “randomly drew” for the study.)

Repeating the one-tailed test at $\alpha$ = 0.0005 (critical t-value = 3.745 with df = 24), none of the students’ samples result in a Type I error, yet I point out that in a (much) larger class, we would expect to see about 0.05% of the students draw samples resulting in a Type I error. I also note that we control the value of $\alpha$ used, and for the moment the students are convinced that a small $\alpha$ is generally superior to a larger $\alpha$.

We then turn to the alternate scenario, pretending we drew the sample for the purpose of “proving” $\mu$ > 120, thus attempting to reject the null hypothesis $H_0$: $\mu = 120$ with a one-sided rejection region. For the data generated, approximately 98% of the students are able to reject $H_0$ at the $\alpha$ = 0.10 significance level ($t$ > 1.318). About 2% of the students’ samples result in a Type II error in this case. A review of Figures 1 and 2, depicting histograms of both test statistics plotted against the $t_{24}$ density function, is especially compelling here.

Figure 2. Histogram of 144 students’ test statistics $t = (\bar{x} - 120)/(s/\sqrt{n})$, where $\bar{x}$ is the mean of n = 25 observations, pseudo-randomly generated from the normal distribution with mean $\mu$ = 135 and standard deviation $\sigma$ = 20. Histogram is plotted against the t distribution with n - 1 = 24 degrees of freedom. Figure demonstrates the behavior of the test statistic when $H_0$: $\mu = 120$ is false.

Repeating this one-tailed test with $\alpha$ = 0.0005, we observe that about 47% of the students would fail to reject $H_0$. Knowing that in fact $\mu$ = 135, the students are now forced to revisit their earlier inclination to believe that a small $\alpha$ is generally superior to a large $\alpha$. This prompts additional discussion of the trade-offs involving $\alpha$ and $\beta$. Moreover (given that the students are aware that $\mu$ = 135), this is an excellent time to demonstrate the critical idea that “failing to reject” $H_0$ does not imply $H_0$ is true.
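The trade-off between the two error types can be explored with a simulation such as the one below. This is a hedged sketch with hypothetical function names. It uses the one-tailed critical values quoted in the text (1.318 and 3.745 for 24 degrees of freedom) and the two hypothesized means 135 and 120.

```python
import math
import random
import statistics

MU_TRUE, SIGMA, N = 135.0, 20.0, 25
T_CRIT = {0.10: 1.318, 0.0005: 3.745}  # one-tailed critical values, 24 df


def t_stat(sample, mu0):
    """One-sample t statistic for the hypothesized mean mu0."""
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(len(sample)))


def rejection_rate(mu0, alpha, num_students=144, seed=11):
    """Share of simulated students rejecting H0: mu = mu0 in favor of mu > mu0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_students):
        sample = [rng.gauss(MU_TRUE, SIGMA) for _ in range(N)]
        rejections += t_stat(sample, mu0) > T_CRIT[alpha]
    return rejections / num_students


for alpha in (0.10, 0.0005):
    type_1 = rejection_rate(135.0, alpha)        # H0 true: each rejection is a Type I error
    type_2 = 1.0 - rejection_rate(120.0, alpha)  # H0 false: each non-rejection is a Type II error
    print(f"alpha = {alpha}: Type I rate = {type_1:.3f}, Type II rate = {type_2:.3f}")
```

Lowering $\alpha$ should drive the simulated Type I rate toward zero while raising the Type II rate for the false null hypothesis, which mirrors the classroom discussion.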

At this point, I have the students speculate as to the distribution of test statistics had they tested a null hypothesis whose hypothesized mean lies only slightly away from 135. Returning to the histogram of $t$ values in Figure 1, we note that we would see a virtually identical picture had we calculated the corresponding test statistic. As such, approximately 90% of the students would have “committed” a Type II error using $\alpha$ = 0.10, the point being made that $\beta$ = P(Type II error) depends partly on “how false” $H_0$ actually is.

### 3.3 Confidence Intervals

A similar approach is used to demonstrate the meaning of a confidence interval, having each student compute the lower and upper limits of 90% and 95% confidence intervals based on their data. A histogram of lower and upper confidence limits generated, as well as a show of hands during class, demonstrates that approximately 90% (or 95%) of the students’ datasets result in a confidence interval that covers the true mean, while 10% (or 5%) of the students generate a confidence interval with upper limit < $\mu$ or lower limit > $\mu$.
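A coverage check of this kind might be sketched as follows. The function names are hypothetical, and the multipliers 1.711 and 2.064 are the standard t-table values for 24 degrees of freedom at 90% and 95% confidence.

```python
import math
import random
import statistics

MU, SIGMA, N = 135.0, 20.0, 25
T_CRIT = {0.90: 1.711, 0.95: 2.064}  # two-sided t multipliers, 24 df


def confidence_interval(sample, level):
    """Return (lower, upper) limits of the t-based interval for the mean."""
    xbar = statistics.fmean(sample)
    half_width = T_CRIT[level] * statistics.stdev(sample) / math.sqrt(len(sample))
    return xbar - half_width, xbar + half_width


def coverage(level, num_students=144, seed=13):
    """Share of simulated students whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_students):
        lower, upper = confidence_interval([rng.gauss(MU, SIGMA) for _ in range(N)], level)
        hits += lower <= MU <= upper
    return hits / num_students


for level in (0.90, 0.95):
    print(f"{level:.0%} intervals covering mu: {coverage(level):.3f}")
```

In a simulated class of 144 the observed coverage will not match the nominal level exactly, just as the show of hands in a real class would not.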

## 4. An Analysis of Trade-In Values

This approach has recently been extended to the case of simple linear regression. Here, I generate a random sample of n = 15 x observations from a discrete Uniform(25000, 50000) distribution, with the interpretation that x represents the number of miles on the odometer of a recently traded vehicle. For each x, the spreadsheet computes $y = 10000 - 0.10x + \varepsilon$, where $\varepsilon$ is normally distributed with mean zero and standard deviation $\sigma$ = 800. Here y is the observed trade-in value for a vehicle with x miles on the odometer.
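The data-generating process for the regression example can be written out directly. The sketch below is an assumption-laden illustration rather than the author's spreadsheet: the function name `make_trade_in_data` and the seeding scheme are hypothetical, and the discrete uniform is taken to include both endpoints.

```python
import random

BETA0, BETA1, SIGMA, N = 10000.0, -0.10, 800.0, 15  # parameters from the text


def make_trade_in_data(student_id, base_seed=2003):
    """Return one student's list of (miles, trade_in_value) pairs."""
    rng = random.Random(base_seed * 1000 + student_id)  # hypothetical seeding scheme
    data = []
    for _ in range(N):
        x = rng.randint(25000, 50000)                   # odometer miles, endpoints included
        y = BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA)   # linear relationship plus normal error
        data.append((x, round(y, 2)))
    return data


for miles, value in make_trade_in_data(1)[:3]:
    print(miles, value)
```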

As before, each student receives his or her own unique dataset via e-mail. After working through a separate example introducing the concept of least squares estimation, students are assigned to fit the model $\hat{y} = b_0 + b_1 x$ to their data, using the Excel Data Analysis add-in. Students are also told to use their model to make a prediction ($\hat{y}$) of the trade-in value for a vehicle with 40,000 miles on the odometer, and to return the entire analysis to the instructor by attaching their Excel file to a return e-mail.

As before, subsequent in-class review begins with revelation of the “true” parameters $\beta_0$ = 10,000, $\beta_1$ = -0.10, and $\sigma^2$ = 640,000. (The description of the process used to generate the data itself goes a long way in helping students understand the underlying assumptions of the simple linear regression model. I emphasize that I followed this procedure in order to generate data that does inherently fit the assumptions of the model, while in practice this is an issue that would ultimately have to be addressed.)

As with the earlier material, histograms of all the students’ $b_0$ and $b_1$ values drive home the point that any statistics computed from random data are themselves random variables. (The fitted values are displayed in Figure 3.) The central tendency of the histograms again demonstrates the concept of unbiased estimation, e.g. $E[b_0] = \beta_0$ and $E[b_1] = \beta_1$. (It is important that students not identify too closely with the overall mean of their collective $b_0$ and $b_1$ estimates. I have to continually reinforce that in practice, they would be looking at their single value for $b_0$ and $b_1$, i.e. one random observation from the respective distributions of all such values.) A histogram of the students’ predicted values at x = 40,000 (Figure 4) delivers a similar message with respect to $\hat{y}$.

Figure 3. Histogram of students’ fitted parameters, when each student fit the model $\hat{y} = b_0 + b_1 x$ to n = 15 (x, y) observations drawn from $y = 10000 - 0.10x + \varepsilon$, where $\varepsilon$ is normally distributed with mean 0 and standard deviation $\sigma$ = 800.

Figure 4. Histogram of students’ predicted values $\hat{y} = b_0 + b_1 x$ at x = 40,000. (E[ y | x = 40,000 ] = 10,000 - 0.10 (40,000) = 6,000).
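The sampling behavior of the fitted intercept, slope, and prediction can be imitated with the sketch below. It uses the ordinary least squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$; the function names are hypothetical, and the simulation stands in for the students' Excel output rather than reproducing it.

```python
import random
import statistics

BETA0, BETA1, SIGMA, N = 10000.0, -0.10, 800.0, 15


def fit_line(data):
    """Least squares intercept and slope for a list of (x, y) pairs."""
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    sxx = sum((x - xbar) ** 2 for x, _ in data)
    sxy = sum((x - xbar) * (y - ybar) for x, y in data)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def simulate_fits(num_students=144, seed=17):
    """Return each simulated student's (b0, b1) pair."""
    rng = random.Random(seed)
    fits = []
    for _ in range(num_students):
        xs = [rng.randint(25000, 50000) for _ in range(N)]
        data = [(x, BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA)) for x in xs]
        fits.append(fit_line(data))
    return fits


fits = simulate_fits()
predictions = [b0 + b1 * 40000 for b0, b1 in fits]
print("mean b0:", round(statistics.fmean(b0 for b0, _ in fits), 1))
print("mean b1:", round(statistics.fmean(b1 for _, b1 in fits), 4))
print("mean prediction at x = 40,000:", round(statistics.fmean(predictions), 1))
```

Individual fits vary widely, especially for the intercept, while the class-wide averages should sit near 10,000, -0.10, and 6,000.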

A histogram of all students’ values for $(b_1 - \beta_1)/s_{b_1}$, where $s_{b_1}$ is the estimated standard error of $b_1$, serves as a backdrop for the discussion that the statistic follows a t distribution with n - 2 degrees of freedom, providing the foundation for testing $H_0$: $\beta_1 = 0$ using the statistic $t = b_1/s_{b_1}$. (I do not actually assign the calculation of these values, but rather point out their location on the Excel output report, followed by discussion of the analogy between $s/\sqrt{n}$ in the earlier material and $s_{b_1}$ in the present analysis.) The histogram of students’ $b_1/s_{b_1}$ values provides the backdrop for identification of those students whose samples result in a Type II error for the test $H_0$: $\beta_1 = 0$ versus $H_1$: $\beta_1 \neq 0$, e.g. those students with p-values on their Excel output greater than various $\alpha$.
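The slope statistic can be computed by hand as in the sketch below, which assumes the usual formulas $s_e^2 = \mathrm{SSE}/(n-2)$ and $s_{b_1} = s_e/\sqrt{S_{xx}}$. The function names are hypothetical, and 1.771 is the t-table value for 13 degrees of freedom with 0.10 two-tail probability.

```python
import math
import random

BETA0, BETA1, SIGMA, N = 10000.0, -0.10, 800.0, 15
T_CRIT = 1.771  # t with n - 2 = 13 df, 0.10 two-tail probability


def slope_and_se(data):
    """Return the least squares slope b1 and its estimated standard error."""
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    sxx = sum((x - xbar) ** 2 for x, _ in data)
    b1 = sum((x - xbar) * (y - ybar) for x, y in data) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
    return b1, math.sqrt(sse / (n - 2)) / math.sqrt(sxx)


def simulate_slopes(num_students=144, seed=19):
    """Return each simulated student's (b1, standard error) pair."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_students):
        xs = [rng.randint(25000, 50000) for _ in range(N)]
        results.append(slope_and_se([(x, BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA)) for x in xs]))
    return results


slopes = simulate_slopes()
# Pivot centered at the true slope: about 10% should fall outside +/- T_CRIT.
outside = sum(abs((b1 - BETA1) / se) > T_CRIT for b1, se in slopes) / len(slopes)
# Test of H0: beta1 = 0, which is false here, so each non-rejection is a Type II error.
type_2 = sum(abs(b1 / se) <= T_CRIT for b1, se in slopes) / len(slopes)
print(f"share with |(b1 - beta1)/se| > {T_CRIT}: {outside:.3f}")
print(f"Type II error rate at alpha = 0.10: {type_2:.3f}")
```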

## 5. Class Management

The effectiveness of the technique described above lies in getting the students to identify with “their” answers as the results they would have come up with in practice, based on an analysis of “their” data. It is important, then, to maintain the illusion that I am directly using the students’ homework submissions as we review the various sampling distribution properties. In actuality, I have prepared the various summary statistics and displays prior to sending out the student-specific datasets. When reviewing the various results I tell them I have supplied the correct answer for any student who has done the homework incorrectly.

Handling the large volume of e-mail homework submissions is simplified by creating separate “inbox folders” for each assignment, and applying rules that direct any message with a certain key phrase in the subject to the appropriate folder. I have also found that I am able to quickly check the e-mails as they trickle in, responding to questions or incorrect answers in a more timely manner than is possible with traditional paper-based assignment collection.

## 6. Conclusion

In summary, the author has explored the idea of engaging students in demonstrations of the sampling distributions pertinent to various topics throughout the course. This is accomplished by providing each student with a unique dataset for homework problems and in-class exercises. In addition to demonstrating the characteristics of the sampling distribution in question, this approach forces students to recognize “their results” as being a single observation from that distribution. As such, every student directly experiences the implications of sampling variability as it applies to each new topic. Although no objective measurement of the effectiveness of this approach under controlled conditions has been attempted, student response has generally been positive.

This technique could obviously be applied to any topical coverage in which the random nature of data is a concern. The author intends to next extend the approach to coverage of various nonparametric statistical tests, which easily degenerates into a “cookbook” approach without an understanding of the behavior of the relevant test statistics.