More Problem Solving Power: Exploiting Prediction Models and Statistical Software in a One-Semester Course

Joe H. Ward, Jr.
University of Texas at San Antonio

Robert L. Fountain
Portland State University

Journal of Statistics Education v.4, n.3 (1996)

Copyright (c) 1996 by Joe H. Ward, Jr., and Robert L. Fountain, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.


Key Words: Linear models; Undergraduate teaching techniques.

Abstract

Many widely adopted college textbooks designed for a student's first (and possibly last) statistics course have incorporated new trends in statistical education, but are organized in a manner still driven by a traditional computational, rather than conceptual, framework. An alternative approach allows many seemingly unrelated conventional procedures, such as one- and two-sample t-tests and analyses of variance and covariance, to be treated under a unifying prediction model approach. Furthermore, this approach, combined with the power of modern statistical software packages, prepares the student to solve problems beyond the scope of traditional procedures. Students will appreciate the acquisition of practical research capabilities and might even be stimulated to continue their study of statistics.

1. Introduction

1 A traditional one-semester introductory statistics course includes numerous topics, each given a very brief treatment. These topics typically include numerical and graphical data analysis tools; introductions to combinatorics, set theory, and probability; applications of several discrete and continuous probability distributions; sampling distributions of the sample mean and standard deviation; confidence intervals and hypothesis tests for means, proportions, and variances of one and two populations; simple linear regression; analysis of variance; and $\chi^2$ tests of independence. Having spent so little time with each technique, students in many instances leave such a course unable to attack common problems. The traditional approach also fails to show students that many more problems could be solved using techniques closely related to those they have learned; instead, it encourages students to pigeonhole new problems into a few narrow categories.

2 Many widely adopted statistics texts have incorporated new trends in introductory statistics education but still contain traditional restrictions. For example, some statistics texts require a two-way ANOVA to have equal sample sizes in each cell. This restriction prevents the student from working with the more practical situations in which the cell frequencies are unequal or some cells contain no observations at all. Discussions of the confusing results obtained in ``missing cell'' situations can be found in Freund (1980) and Jennings and Ward (1982).

3 This preservation of the traditional (and restrictive) algorithms deprives the student of the opportunity to explore the power of computing-intensive data-analysis models. It is suggested below that a prediction model approach, aided by the widespread availability of computers, can be introduced to beginning statistics students, empowering them to expand their analysis capabilities to more advanced problems. There have been many stimulating and valuable writings and presentations concerned with improving the teaching of the introductory statistics course. These emphasize using real data to teach statistical concepts rather than techniques, and suggest keeping the amount of probability to a minimum. Bentley, Cobb, Lock, Moore, Parker, and Witmer (1994) have assembled a valuable collection of student activities and projects designed to encourage educators to expand their concepts of teaching statistics beyond lecturing and assigning traditional homework problems. Witmer (1992) has designed a textbook supplement that presents the tools of analysis in an engaging manner and at a level accessible to students in elementary classes. Moore and McCabe (1993) take a refreshing approach, beginning their Introduction to the Practice of Statistics with data distributions, relationships, and the production of data before discussing formal ideas of probability.

4 Certainly the many valuable activities that have been developed to improve students' understanding of and interest in data analysis should be incorporated into an introductory course. However, the introduction of digital computers long ago opened the opportunity for beginning students to exploit the analysis power of more computing-intensive procedures.

5 The method suggested here provides beginning students with increased data analysis capability by combining a prediction model approach with high-speed computing power, and it possesses several important advantages over traditional methods that are developed in the sections below.

6 The approach is based on earlier publications by Bottenberg and Ward (1963), Ward (1969), Ward and Jennings (1973), and Ward (1991). It has been discussed previously by Fountain and Ward (1993). Experiences with the prediction model approach are presented in Section 5 and some suggestions for implementing the approach in the introductory course are presented in Section 6.

2. The Prediction Model Approach

7 The prediction model method has two aspects. First, it synthesizes many standard procedures within the single framework of the linear model. Second, a problem-solving system is used, in which the students learn to adapt the model to fit the needs of each individual problem with which they are faced. The goal of this approach is to give students a single, flexible tool with which they can carry out the standard analyses and also formulate and test hypotheses beyond them.

8 In traditional one-semester statistics courses, so little time usually remains when regression and analysis of variance are presented that these topics are given extremely brief treatments or are omitted entirely. The prediction model approach integrates these topics with one- and two-sample t-tests and allows for expansion to more complex models. Using the same general form of the prediction model and adapting it to meet the needs of each problem, estimation and hypothesis testing may be done for each of the following:

  1. The value of a single population mean (one-sample t-test).
  2. The relationship between two population means (two-sample t-test).
  3. The relationships among means of several populations (one-way ANOVA).
  4. The relationships among means of several populations, controlling for a blocking attribute (randomized complete block design).
  5. The comparison of cell means, allowing for the interaction between two predictor attributes (two-way ANOVA with interaction).
  6. The effect of a single continuous predictor attribute (simple linear regression).
  7. The effects of several continuous predictor attributes (multiple regression).
  8. The effect of a continuous predictor attribute together with a categorical predictor attribute (analysis of covariance, or ANOCOVA).

3. The Unifying Model

9 In this section, we will give precise formulations of the models used in eight types of analyses, showing how they all fit into the unified prediction model approach. The use of Greek letters and subscripts would not be necessary when introducing the concept of prediction models to inexperienced students. Section 6 contains some strategies for the classroom presentation of the material.

10 The general form of the prediction model common to all of the problems listed above is, for i = 1, ..., n:

  $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \epsilon_i$    (1)

where $Y$ represents the response being measured, $X_1, X_2, \ldots, X_p$ are the predictor attributes, $\epsilon$ is an error variable with zero mean and constant variance, and $n$ is the number of observations. For a continuous predictor, $X_{ij}$ is the value of the $j$th predictor for the $i$th observation. For a categorical predictor with $k$ mutually exclusive categories, the model would contain $k$ indicator variables, each taking on the value 1 when the observation falls in that category and 0 otherwise. To include a constant in the model, a predictor is defined that takes on the value 1 for all observations. Even at the introductory level, it can be explained that the coefficients in the prediction equation are chosen so that the differences between the actual and predicted responses are as small as possible. The concept of an error sum of squares as a numerical measure of the total difference is not a difficult one to convey at this point.
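
This coding can be demonstrated concretely before any formal notation is introduced. The following is a minimal SAS data step sketch (the input data set raw, the categorical attribute group, and all variable names are hypothetical); in SAS a logical expression evaluates to 1 when true and 0 when false:

   data coded;
      set raw;             /* hypothetical input data set                */
      u  = 1;              /* unit predictor, carries the constant term  */
      x1 = (group = 1);    /* indicator: 1 if the observation is in      */
      x2 = (group = 2);    /*   category 1, 2, or 3, and 0 otherwise     */
      x3 = (group = 3);    /*   (k = 3 categories in this sketch)        */
   run;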

11 The students learn to translate stated problems into appropriate models, formulating suitable hypotheses. Some discussion of the general ideas behind hypothesis testing must be introduced, including the interpretation of the conclusions ``reject H_0'' and ``fail to reject H_0''. The restricted and full models are created to correspond to the null and alternative hypotheses, respectively. Until the point in the course where probability distributions are formally presented, issues concerning violations of the error assumptions can be postponed, and the F statistic can be introduced and interpreted merely as a quantifier of the difference between the full and restricted model error sums of squares:

  $F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}$

where $SSE_F$ and $SSE_R$ denote the error sums of squares of the full and restricted models, respectively, and $df_F$ and $df_R$ denote the corresponding error degrees of freedom.

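Before any testing machinery is introduced, this computation can be carried out by hand from two fits. A minimal sketch, continuing the hypothetical coded data set above (the response y is also hypothetical): fit the full and restricted models in a single PROC REG run, then insert the two error sums of squares and error degrees of freedom from the output into the formula above. The TEST statement used later in this paper performs the same arithmetic automatically.

   proc reg data=coded;
      full:       model y = x1 x2 x3 / noint;  /* full model: one mean    */
                                               /*   per category          */
      restricted: model y = u / noint;         /* restricted model: a     */
                                               /*   single overall mean   */
   run;
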
Variations on the model

12 The versions of the full and restricted models needed to carry out the eight analyses mentioned earlier are shown below. In some cases, several different hypotheses may be tested, and representative examples have been selected. Throughout the following, $U_i = 1$ for $i = 1, 2, \ldots, n$.

1. One-sample t-test. The full model is:

  $Y_i = \beta_0 U_i + \epsilon_i$    (2)

To test the null hypothesis that the mean of the population equals a specified value $\mu_0$, the restricted model is:

  $Y_i = \mu_0 U_i + \epsilon_i$    (3)

2. Two-sample t-test. The full model is:

  $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$    (4)

where

  $X_{i1} = 1$ if observation $i$ comes from sample 1, and 0 otherwise;
  $X_{i2} = 1$ if observation $i$ comes from sample 2, and 0 otherwise.

Notice that this is a ``cell means'' model. That is, the least squares estimates of $\beta_1$ and $\beta_2$ will be the sample means of the dependent variable Y for samples 1 and 2.

To test the null hypothesis that the means of the two populations are equal, the restricted model is:

  $Y_i = \beta_0 U_i + \epsilon_i$    (5)

The least squares estimate of $\beta_0$ will be the overall sample mean of Y.
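
As an illustration, the two-sample comparison can be run in PROC REG by fitting the cell means model (4) and imposing restriction (5) with a TEST statement. This is a sketch; the data set and variable names are hypothetical. The F statistic it reports has one numerator degree of freedom and equals the square of the usual pooled two-sample t statistic.

   data two;
      set raw;                  /* hypothetical input data set    */
      x1 = (sample = 1);        /* indicator for sample 1         */
      x2 = (sample = 2);        /* indicator for sample 2         */
   run;

   proc reg data=two;
      model y = x1 x2 / noint;  /* full model (4): cell means     */
      test x1 = x2;             /* restriction (5): equal means   */
   run;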

3. One-way ANOVA. The full model is:

  $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_m X_{im} + \epsilon_i$    (6)

where

  $X_{ij} = 1$ if observation $i$ comes from sample $j$ ($j = 1, \ldots, m$), and 0 otherwise.

This is another cell means model, since no grand mean term appears on the right-hand side. The estimates of the $\beta_j$'s will again be the sample means of Y for samples 1 through m.

To test the null hypothesis that the means of the populations are equal, the restricted model is:

  $Y_i = \beta_0 U_i + \epsilon_i$    (7)

4. Complete block design. The full model is:

  $Y_i = \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \gamma_1 B_{i1} + \cdots + \gamma_{r-1} B_{i,r-1} + \epsilon_i$    (8)

where

  $X_{ij} = 1$ if observation $i$ receives treatment $j$ ($j = 1, \ldots, m$), and 0 otherwise;
  $B_{ik} = 1$ if observation $i$ is in block $k$ ($k = 1, \ldots, r-1$), and 0 otherwise.

Note that the blocking index k runs from 1 to r-1 if there are r blocks. This prevents overparameterization, a situation in which the parameter estimates are not unique. To test the null hypothesis that there is no treatment effect, the restricted model is:

  $Y_i = \beta_0 U_i + \gamma_1 B_{i1} + \cdots + \gamma_{r-1} B_{i,r-1} + \epsilon_i$    (9)

5. Two-way ANOVA. The full model is:

  $Y_i = \gamma_{11} X_{i11} + \gamma_{12} X_{i12} + \cdots + \gamma_{mr} X_{imr} + \epsilon_i$    (10)

where

  $X_{ijk} = 1$ if observation $i$ receives level $j$ of the first treatment and level $k$ of the second ($j = 1, \ldots, m$; $k = 1, \ldots, r$), and 0 otherwise.

Note that, in this parameterization, there are no ``main effect'' and ``interaction'' terms. Instead, the parameters $\gamma_{11}$, ..., $\gamma_{mr}$ are the means of the cells representing specific combinations of the treatments. To test the null hypothesis that there is no interaction between treatments, the restricted model is:

  $Y_i = \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \gamma_1 B_{i1} + \cdots + \gamma_{r-1} B_{i,r-1} + \epsilon_i$    (11)

where $X_{ij}$ now indicates level $j$ of the first treatment and $B_{ik}$ indicates level $k$ of the second treatment.

Note that (11) is the same as (8). To test the null hypothesis that there is no main effect for the second treatment, given that there is no interaction, the full model is the previous restricted model (11), and the new restricted model is:

  $Y_i = \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \epsilon_i$    (12)

6. Linear regression. The full model is:

  $Y_i = \beta_0 U_i + \beta_1 X_{i1} + \epsilon_i$    (13)

To test the null hypothesis that the slope is zero, the restricted model is:

  $Y_i = \beta_0 U_i + \epsilon_i$    (14)

7. Multiple regression. The full model is:

  $Y_i = \beta_0 U_i + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_r X_{ir} + \epsilon_i$    (15)

To test the null hypothesis that the coefficients $\beta_{j+1}$, ..., $\beta_r$ equal zero, the restricted model is:

  $Y_i = \beta_0 U_i + \beta_1 X_{i1} + \cdots + \beta_j X_{ij} + \epsilon_i$    (16)

8. ANOCOVA. The full model is:

  $Y_i = \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \gamma_1 X_{i1} Z_i + \cdots + \gamma_m X_{im} Z_i + \epsilon_i$    (17)

where

  $X_{ij} = 1$ if observation $i$ receives treatment $j$ ($j = 1, \ldots, m$), and 0 otherwise,

and $Z$ is a quantitative predictor (the covariate). To test the null hypothesis that the slopes are the same for each level of the treatment, the restricted model is:

  $Y_i = \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \gamma Z_i + \epsilon_i$    (18)

To test the null hypothesis that there is no treatment effect, given that the slopes are equal, the full model is the previous restricted model (18), and the new restricted model is:

  $Y_i = \beta_0 U_i + \gamma Z_i + \epsilon_i$    (19)

Other models

13 The eight models presented above represent the most common problems treated in traditional introductory statistics courses. Some of these, such as the randomized complete block design, two-way analysis of variance, multiple regression, and the analysis of covariance, are usually covered in a second-semester statistics course. The prediction model approach allows these topics to be presented at the same time as simple linear regression and one-way analysis of variance. Other subjects, such as polynomial regression and linear and polynomial spline models, may also be covered using the same method. Even in the context of standard models, students can be encouraged to form conjectures beyond the usual hypotheses that certain coefficients are equal to each other or equal to zero. For example, within the one-way ANOVA framework, it is possible to test for constant differences between the means at successive levels of the treatment, as is illustrated in the next section. The students, as a result, may grasp the similarities between ANOVA and linear regression.

In addition, the missing cells problem is alleviated using the method described. If there are one or more missing cells in a two-way ANOVA (see variation 5 above), then the full model (10) will contain only those binary predictors (and corresponding parameters) associated with cells containing observations. The student can then state hypotheses of interest in terms of the full model. This assures that the results obtained from the regression model analysis will reflect the hypothesis of interest. Most importantly, it protects the student from the risk of unknowingly using output of packaged ANOVA procedures that may answer uninteresting questions. This has been discussed by Freund (1980) and Jennings and Ward (1982). Furthermore, this approach allows the student to communicate clearly the hypotheses being tested, so that a reader of the research report knows exactly what is being done. For example, some packaged ANOVA programs may label as a ``main effects'' test one that assumes equally-weighted cell means, while others may label as a ``main effects'' test one that uses weights based upon cell frequencies. This has been discussed by Jennings and Green (1984).
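
To make the missing-cells point concrete, here is a sketch for a hypothetical 2 x 3 layout in which cell (2,3) contains no observations (the data set and all variable names are illustrative). Only the five observed cells receive indicators, and the single estimable no-interaction contrast is stated explicitly in the TEST statement:

   data cells;
      set raw;                   /* hypothetical input data set          */
      c11 = (a = 1)*(b = 1);     /* indicator for cell (1,1)             */
      c12 = (a = 1)*(b = 2);
      c13 = (a = 1)*(b = 3);
      c21 = (a = 2)*(b = 1);
      c22 = (a = 2)*(b = 2);     /* cell (2,3) is empty, so no c23       */
   run;

   proc reg data=cells;
      model y = c11 c12 c13 c21 c22 / noint;
      /* no-interaction hypothesis, confined to the observed cells */
      test c11 - c21 = c12 - c22;
   run;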

4. Example

14 The following example uses a data set that would be very difficult to analyze using only the methodology presented in a typical introductory statistics course. The purpose of its inclusion here is threefold:

  1. To demonstrate the breadth of questions that can be addressed by using the general prediction model approach,
  2. To illustrate the shortcomings of the traditional approach, and
  3. To show how statistical software can be used to fit the full and restricted models.

15 We will continue the formal presentation in this section, in order to precisely state the hypotheses being tested in response to each of the questions posed. The next two sections contain a more informal approach, with suggested methods of presentation in the classroom. The example uses PROC REG in Version 6 of SAS (1989) to analyze a data set that appears in Glass and Stanley (1970, p. 112) and is shown in Table 1. There are eight equally-spaced age groups, and the measure of performance (Y) is the digit-symbol subtest of the Wechsler Adult Intelligence Scale. The mean scores for each age are plotted in Figure 1. A student in a typical introductory statistics course would have only a one-way ANOVA and a simple linear regression analysis as tools with which to analyze the data. As can be seen from the figure, neither of these would capture the more interesting aspects of the data. We wish to investigate three questions:

  1. Does the mean value of Y differ among the eight age groups? In terms of expected values we can express the hypothesis as

     $E(Y_1) = E(Y_2) = \cdots = E(Y_8),$

     where $E(Y_j)$ denotes the expected value of the response in the $j$th age group,

    which can easily be expressed in the SAS TEST statement as shown below.

  2. Is there a constant difference (linear trend) in means? In terms of expected values we can express the hypothesis as

     $E(Y_2) - E(Y_1) = E(Y_3) - E(Y_2) = \cdots = E(Y_8) - E(Y_7).$

    These restrictions can easily be imposed with the SAS TEST statement.

  3. Is there a constant second-order difference (quadratic trend) in means? In terms of expected values we can express the hypothesis as

     $[E(Y_3) - E(Y_2)] - [E(Y_2) - E(Y_1)] = \cdots = [E(Y_8) - E(Y_7)] - [E(Y_7) - E(Y_6)],$

    which can be simplified for ease of expression in the SAS statement as

     $E(Y_1) - 2E(Y_2) + E(Y_3) = E(Y_2) - 2E(Y_3) + E(Y_4) = \cdots = E(Y_6) - 2E(Y_7) + E(Y_8).$

16 The MODEL statement in PROC REG is used to describe the full model, and the TEST statement is used to specify the restrictions. Figure 2 (see Appendix) shows the SAS program that generates the required analyses.
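
Since the Appendix is not reproduced here, the following sketch (with our own variable names and the data of Table 1) indicates what a program along the lines of Figure 2 might look like. The cell means model is fit with eight indicators, and each of the three hypotheses is imposed with a labeled TEST statement:

   data wais;
      input age y @@;
      x1 = (age = 10);  x2 = (age = 14);  x3 = (age = 18);  x4 = (age = 22);
      x5 = (age = 26);  x6 = (age = 30);  x7 = (age = 34);  x8 = (age = 38);
      datalines;
   10 7   10 8   10 9   10 9   10 10
   14 8   14 9   14 10  14 11
   18 9   18 10  18 11  18 12
   22 11  22 11  22 12  22 12
   26 9   26 10  26 11
   30 8   30 9   30 9   30 10
   34 7   34 9   34 10
   38 8
   ;
   run;

   proc reg data=wais;
      model y = x1-x8 / noint;           /* full model: one mean per age group */
      equal:  test x1 = x2, x2 = x3, x3 = x4, x4 = x5,
                   x5 = x6, x6 = x7, x7 = x8;
                                         /* question 1: equal means            */
      linear: test x2 - x1 = x3 - x2, x3 - x2 = x4 - x3, x4 - x3 = x5 - x4,
                   x5 - x4 = x6 - x5, x6 - x5 = x7 - x6, x7 - x6 = x8 - x7;
                                         /* question 2: constant differences   */
      quad:   test x1 - 2*x2 + x3 = x2 - 2*x3 + x4, x2 - 2*x3 + x4 = x3 - 2*x4 + x5,
                   x3 - 2*x4 + x5 = x4 - 2*x5 + x6, x4 - 2*x5 + x6 = x5 - 2*x6 + x7,
                   x5 - 2*x6 + x7 = x6 - 2*x7 + x8;
                                         /* question 3: constant second-order  */
                                         /*   differences                      */
   run;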


         Table 1:  Digit-symbol subtest scores in 8 age groups

              -------------------------------------------
                          Age to nearest year
              -------------------------------------------
                10   14    18    22    26   30   34   38
              -------------------------------------------
                 7    8     9    11     9    8    7    8
                 8    9    10    11    10    9    9
                 9   10    11    12    11    9   10
                 9   11    12    12         10
                10
              -------------------------------------------
                                 Means
               8.6  9.5  10.5  11.5  10.0  9.0  8.7  8.0
              -------------------------------------------

Figure 1: Mean Subtest Scores by Age



17 The output of the SAS program is shown in Figure 3 (see Appendix). At the 5% level of significance, the hypothesis of no differences in mean response for the eight age groups would be rejected. The hypothesis of constant differences would be rejected. The hypothesis of constant second-order differences would fail to be rejected. Thus, it appears that there may be a quadratic trend in the mean value of Y across the eight age groups. Using the traditional approach to analysis of variance in introductory courses, with its emphasis on testing for main effects, the preceding analysis would be well beyond the capabilities of the vast majority of students.

18 In variations 5 and 8, discussed in the previous section, several conditional hypothesis tests were mentioned. These may easily be performed in PROC REG by placing constraints on the full model with the RESTRICT statement.
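
A sketch of one such conditional test follows, for a hypothetical complete 2 x 2 layout with cell indicators c11, c12, c21, and c22 built as in the earlier missing-cells sketch (all names illustrative). The no-interaction constraint is imposed on the cell means model with RESTRICT, and the main effect of the second treatment is then tested under that constraint:

   proc reg data=twoway;
      model y = c11 c12 c21 c22 / noint;  /* full model (10) for a 2 x 2 layout  */
      restrict c11 - c12 = c21 - c22;     /* impose restriction (11):            */
                                          /*   no interaction                    */
      test c11 + c21 = c12 + c22;         /* main effect of the second           */
                                          /*   treatment, given no interaction   */
   run;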

19 Note: While the example shown above used the SAS statistical software, any appropriate software package can be used to carry out the computational requirements for the analyses.

5. Experiences with the Prediction Model Approach

20 There has been a wide range of experiences with various approaches to implementing the more formal presentation given above in Section 3 (The Unifying Model).

21 The earliest experiences with the approach began in the 1950s, just as expensive, slow computers were becoming available. The first target group consisted mostly of research psychologists at the Air Force Personnel Research Laboratory at Lackland Air Force Base, a group that had been trained in pre-computer methods of data analysis. At about the same time, the approach was presented to a small group of high school students, one of whom received recognition from the Westinghouse Talent Search for his paper titled ``A Vector Approach to Statistics''. These experiences formed the basis for ``Applied Multiple Linear Regression'' by Bottenberg and Ward (1963).

22 In 1964 the National Science Foundation sponsored a short course in the use of regression analysis and computers for behavioral scientists at the University of Texas at Austin. This was followed for several years by Presessions at the annual meetings of the American Educational Research Association (AERA). The AERA Special Interest Group (SIG) in Multiple Linear Regression was formed by alumni of these Presessions, and the SIG has since continued to add members.

23 The first author has conducted short courses using this approach and has included it as part of non-calculus based introductory statistics courses at St. Mary's University of San Antonio, The University of Hawaii, The University of Texas at San Antonio, and Clemson University.

24 In recent years, focus has been directed toward high school students. Since 1991 the first author has tried various strategies for introducing the approach to high school students in grades 9-12. Some of these students have received special recognition for their use of statistics in regional and international science fair projects. In the 1996 American Statistical Association Project Competition, an entry based on this approach received a special prize award ``to the team submitting the statistics project making the best use of a computer''.

25 There have been no formal assessment experiments, for either college or high school students, to estimate the value of this approach for introductory statistics students. While it would be of interest to compare instruction using the prediction model approach with more traditional courses, such a comparison may be difficult. As in many studies comparing different approaches, it is necessary to evaluate outcomes on learning objectives common to both methods, and the prediction model approach, although related to regression and linear models, is usually not among the learning objectives of an introductory statistics course. The approach is suggested as one component of a statistics course which, if properly introduced to each particular audience, should give students research power not otherwise available.

6. Strategies for Implementing the Approach

26 A prediction model approach can be used not only in an introductory college-level course but also with some high school students. Many high school students have had experience with ``functions without errors,'' and an increasing number are familiar with ``functions with errors'' (least-squares fitting) (Foerster 1986). These concepts lead naturally into the prediction model approach (Ward and Foerster 1991).

27 The approach starts with one or more interesting problems and introduces topics only as they are needed to address those problems. This means, of course, that probability ideas are introduced much later than in some traditional courses. Some courses still approach the subject with techniques that were appropriate before computers, and when the computer is used, it is often merely to process the algorithms of pre-computer days. The use of packaged algorithms can lead to disastrous results (Freund 1980, Jennings and Ward 1982).

28 On the first day we try to show the students that the course will enable them to do things they could not do without it. We try to get as quickly as possible to the question, ``How do we control for the uncontrollable?'' (Ward and Niland 1994).

29 Students move quickly to a natural-language discussion of how to predict a dependent (or response) variable from one attribute, and then of how to control for a second variable that might confound, or contaminate, the results.

30 Problems of interest should be chosen for the particular audience. Examples of four different real-world problem situations that have been used with various audiences are presented below.

  1. To win a bet as to which basketball player will score more points in the next game.
  2. To predict which business, Sea World of Texas or Sea World of Florida, will earn the most profit next year.
  3. To compare the effectiveness of teachers.
  4. To compare the sizes of babies born to mothers who smoke and mothers who do not smoke.

Then we can discuss some questions of interest:

  1. Is there a difference between the performance of player x and player y?
  2. Is there a difference in profits between Sea World of Texas and Sea World of Florida?
  3. Is there a difference between student performance measures for various teachers?
  4. Is there a difference in birth weights between babies born to mothers who smoke and mothers who do not smoke?

31 From here we brainstorm what might ``bother you'' about these questions. This leads to making a list of variables that might confuse, confound, contaminate, or ``mess up'' our investigations. We discuss how nice it would be to control for the uncontrollable, which leads to the idea, ``If you can't control it, then try to `measure' it''. For each of the four examples above, the class then generates a list of variables it would like to ``control''; for Example 1, for instance, the list might include such things as the strength of the opposing team or the number of minutes each player is expected to play.

32 After introducing two predictor attributes (or factors) into our models, it is important to investigate the presence or absence of interaction between the two attributes. We discuss in detail the various conclusions that can be drawn from the analysis of possible interaction. For Example 3 above, if a strong interaction is found between Teachers and Pretest Scores, then better student performance might be obtained by assigning a particular teacher to a specific student; if it is concluded that there is no interaction, then it may be appropriate to assign any student to any teacher.
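
In terms of the models of Section 3, the interaction question for Example 3 is exactly the equal-slopes test of variation 8. A sketch for two teachers follows (the data set and all variable names are hypothetical):

   data teach;
      set classdat;              /* hypothetical input data set         */
      t1  = (teacher = 1);       /* indicator for teacher 1             */
      t2  = (teacher = 2);       /* indicator for teacher 2             */
      t1p = t1*pretest;          /* pretest slope term for teacher 1    */
      t2p = t2*pretest;          /* pretest slope term for teacher 2    */
   run;

   proc reg data=teach;
      model y = t1 t2 t1p t2p / noint;  /* full model (17) with m = 2   */
      test t1p = t2p;                   /* equal slopes: no interaction */
   run;

Rejection suggests that which teacher is better may depend on the student's pretest score; failure to reject supports assigning any student to any teacher, as discussed above.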

33 We show the statistics student that there are powerful things to be accomplished by combining a prediction model approach with the computer to answer questions of interest.

34 The data analysis and problem solving strategies are based on four major ideas: