Dean Nelson
University of Pittsburgh at Greensburg
Journal of Statistics Education Volume 17, Number 2 (2009), ww2.amstat.org/publications/jse/v17n2/nelson.html
Copyright © 2009 by Dean Nelson all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Regression analysis; Introductory statistics; Ozone; GAISE recommendations.
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can be used to tell a compelling story of success in international diplomacy solving a global environmental problem. A description of the use of these data and analyses is presented for a number of courses in applied statistics, including introductory statistics.
The recently adopted Guidelines for Assessment and Instruction in Statistics Education (GAISE) (ww2.amstat.org/education/gaise/GAISECollege.htm) made six general recommendations for the teaching of introductory statistics. The second recommendation was to "Use real data". We would like to suggest an expanded version, "Use real data that tell a compelling story". This paper is meant to provide an illustration of this amended recommendation. The data and accompanying analysis presented in this paper provide both a meaningful example of data analysis using simple linear regression and a story of remarkable success of international cooperation addressing a global environmental problem.
The Montreal Protocol was an international agreement undertaken in 1987 by 191 countries to reduce the levels of ozone-depleting substances (ODSs) in the atmosphere. The agreement has met with unprecedented success. Kofi Annan, former Secretary General of the United Nations, referred to the Montreal Protocol as "perhaps the single most successful international agreement to date." The schedule for reductions in manufacturing and use of ODSs was not only met but accelerated. Manufacturers of ODSs, rather than resisting the phaseout, embraced the challenge and developed replacement compounds, reversing possible detrimental economic consequences of the treaty (Shende, 2007).
In all of these qualitative ways, the Montreal Protocol has been uniformly successful, but this paper is not about qualitative measures. Rather, it describes how students are introduced to this story using a set of data that was collected in relative obscurity until discovery of the "hole" in the ozone layer in 1985 (Farman, Gardiner, and Shanklin, 1985). Many of these data are made publicly available on the website (gaw.kishou.go.jp/wdcgg/) of the World Data Centre for Greenhouse Gases (WDCGG), maintained by the Japanese Meteorological Agency in cooperation with the World Meteorological Organization (see detailed instructions on how to download the data used in this paper in the Appendix). Some of these data tell compelling stories, as do the data on atmospheric concentrations of chlorofluorocarbons (CFCs), measured monthly by the National Oceanic and Atmospheric Administration on Mauna Loa in Hawaii since 1977.
The story of the Montreal Protocol, describing the international cooperation of nearly the entire world, is both interesting and compelling. This global story, however, starts very small, at a molecular level.
Ozone is the name given to the molecule O_{3}. Ozone is formed in the stratosphere when regular oxygen molecules, O_{2}, are split by the ultraviolet radiation of the sun. The oxygen atoms become attached to molecular oxygen to form O_{3}. The amount of ozone in the atmosphere is very small. Even so, the ozone in the atmosphere, along with oxygen molecules, absorbs most of the ultraviolet radiation from the sun. Since ultraviolet radiation is harmful to animal and plant life, the ozone in the stratosphere has a very important protective role for life on earth.
In 1972, Lester Machta reported (Rowland and Molina, 2007) that James Lovelock had discovered a new compound in the earth’s atmosphere, trichlorofluoromethane, a chlorofluorocarbon (CFC). Researchers at the Atomic Energy Commission became interested in these new atmospheric components and asked the question "What happens to these compounds in the environment?" The answer they found (which later became one of the reasons they were awarded the 1995 Nobel Prize in Chemistry) was that these compounds slowly rise into the stratosphere where they encounter ultraviolet light that is energetic enough to break them apart. When they break apart, they release chlorine atoms that can act as catalysts for destruction of ozone (Molina and Rowland, 1974).
Subsequent to the discovery of CFCs in the atmosphere, and an understanding of their potential for ozone destruction, scientists throughout the world began to investigate the problem. Even so, only sporadic measurement and analysis of the ozone layer was conducted until a comprehensive report summarizing the literature was released in 1986 (WMO, 1986). Even though the report was nearly 1,100 pages with 86 pages of references, it did not claim to provide proof that CFCs were depleting the ozone layer. It was another three years before a joint NASA-NOAA report (WMO, 1988) was released that provided sound evidence that CFCs were depleting the ozone layer.
The most dramatic evidence of ozone depletion was the seasonal creation of an ozone "hole" over the Antarctic, first reported in 1985 (Farman, Gardiner, and Shanklin, 1985). Particularly in the months of September and October, when Antarctic air is isolated from milder air, the extreme cold facilitates the breakdown of the ozone, resulting in depletion of up to 80% in that region. Depletion at that level over populated regions of the earth could have dramatic detrimental effects on human, animal, and plant life.
While scientists were hard at work trying to determine the extent of the CFC problem, governments and industry were debating what should be done about it. A series of international meetings were organized to address the problem. The end result was the Montreal Protocol of 1987, an agreement signed by 191 countries to begin the process of cooperative action toward a solution.
The Montreal Protocol created a schedule for phasing out manufacturing and use of CFCs by participating countries. Table 1 below shows the initial schedule and subsequent revisions. When the Montreal Protocol was first agreed upon, it was not certain that suitable safe replacements could be found for all the purposes for which CFCs were used. Even so, after the Montreal Protocol, CFC manufacturers were quick to dedicate themselves to developing new compounds that could be used for the same purposes as CFCs without ozone depletion potential. The success of these manufacturers to develop suitable replacements for CFCs allowed the acceleration of the phaseout schedule (also shown in Table 1).
Table 1. Original and Revised Phase-Out Schedules for CFCs in Developed Countries
Year    1987    1990    1992    1990    1994
1990    100%
1991    100%    100%            85%
1992    100%    100%            80%
1993    80%     80%             75%     50%
1994    80%     80%     25%     25%     15%
1995    80%     50%     25%     25%     0%
1996    80%     50%     0%      0%
1997    80%     15%
1998    80%     15%
1999    50%     15%
2000    50%     0%
Every year since full implementation of the protocol, a review has been produced to assess the success of global efforts. In 2007, a panel of scientists from countries around the world produced a 206-page report (UNEP, 2007) that describes progress since the previous report in 2006. We cannot hope to reproduce the detail necessary to assess the entire scope of the Montreal Protocol, but there are publicly available data that can be used to assess the effect of the Montreal Protocol on one very important aspect of the problem, the presence of CFCs in the atmosphere.
We have used this story in statistics courses of varying difficulty: a first introductory course, a second-semester course, and an applied regression course. In each course, the presentation of the problem, the data, and the analysis follows the same pattern, although the expectations of the students differ according to the level of the course. The sequence below was used in these courses for this case study:
What does a value of 145 for CFC concentration mean? This is not an easy question, and one that does not have a definitive answer. Abstractly, it means that 145 parts of every trillion parts are CFC molecules, even though "parts" could mean mass, volume, or molecular count. Students should be able to provide this abstract answer, but they should also be able to tell us why that answer is not actually true. In practice, we can never measure anything without error, and the error in our measurement may have many sources. In order to identify our sources of error, we need to understand how the measurement was taken, what procedures and instruments were used. With a little forethought on our part in assigning independent investigation by students in part 1 above, descriptions of the procedures and instruments can be provided by students in the class.
Although knowing how the measurement was taken allows us to ask the right questions in order to assess the accuracy of the measurement, it will not be possible to answer these questions fully in class. The chemistry, physics, and engineering behind the gas chromatography measurement instruments are too complex. Therefore, we are forced to rely on faith that the scientists who designed, built, tested, and used these instruments did so correctly. That is not too great a leap of faith, because those scientists used the same kind of science that resulted in other technology we rely on every day. Still, our aim here is not to arrive at a definitive answer to the questions, but to determine the right questions to ask, and to assess the degree to which we can rely on our answers, both to gauge the success of our modeling efforts and to determine when we need other expert assistance.
After producing a graph like Figure 1, ask
the students if these data could be useful in determining whether the Montreal
Protocol had been successful in reducing atmospheric CFCs. They will, of
course, say yes right away, in part because they are perceptive enough to
reason that we wouldn’t be going through the trouble of presenting these data
if the data did not show an effect. Because the reasoning for their answer was
superficial, it will require some time before they can articulate why they
think the data can be useful. For instance, one student said, "CFCs were
rising, and after the Montreal Protocol CFCs were falling." Our reply was
"That is what the data show, but the data also show that the level of CFCs
before and after the Montreal Protocol looks about the same, and may even be
higher if we look at right before and right after the Montreal Protocol." Discussions
we have had generally result in a number of proposed definitions of the effect
of the Montreal Protocol. They include
After further discussion, students usually agree that definition a falls short of capturing the most important feature of the data, namely the rate of change.
After understanding the underlying environmental problem and formulating an acceptable definition for "effect", the students are asked to conduct data analyses to determine the extent to which the data confirm an effect on atmospheric CFCs attributable to implementation of the Montreal Protocol. We have used this problem in an introductory course, in the second semester of an introductory course sequence, and in a regression course. With respect to the analysis of these data, a set of expectations for the different levels are listed below.
First Introduction Course
Second Introduction Course
Advanced Course
The time-series graph of monthly measurements of CFC concentrations from 1977 to 2004 in Figure 1 makes quickly apparent two systematic, yet distinctly different, trends before and after the Montreal Protocol implementation. In order to determine, and justify, the implementation period of the Montreal Protocol, data on CFC manufacturing and use are presented. The international community had agreed at the 1987 meeting to a phaseout schedule (see Table 1) beginning in 1990 to reach 50% by 2000, but this schedule was accelerated by agencies within specific countries or groups of countries so that by 1995, most countries had drastically reduced or eliminated the manufacture and use of CFCs (UNEP, 2007), resulting in dramatic decreases in the manufacture of CFCs worldwide (Figure 2).
The phaseout period of 1990-1995 corresponds closely to the period of trend reversal in the data depicted in Figure 1. Prior to 1990, the pattern of atmospheric CFC concentration showed a constant rate of increase over time. In contrast, the pattern after 1994 showed a constant rate of decrease in atmospheric concentrations of CFCs over time. The students should recognize that we can estimate and compare these rates of change before and after the Montreal Protocol using estimation from simple linear regression models in an introductory class or using hypothesis testing within a simultaneous regression model in a more advanced class.
Since the data show remarkably constant rates of change during periods that correspond to the times prior to and subsequent to implementation of the Montreal Protocol, these rates of change can be modeled with a simple linear equation. If we take y to be equal to the atmospheric concentration of CFCs in parts per trillion and x to be time in years, then we can use the simple linear regression models in (1) to predict atmospheric concentrations of CFCs over time for before, y_{1}, and after, y_{2}, the Montreal Protocol implementation respectively.
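%Equation (1)
\begin{equation}
y_{1}=\beta_{10}+\beta_{11}x+e_{1}, \qquad y_{2}=\beta_{20}+\beta_{21}x+e_{2}
\end{equation}
Using least squares criteria, we obtain estimates for the models in (1).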
%Equation (2)
\begin{equation}
\hat{\mu}_{y_{1} \vert x}=\hat{\beta}_{10}+\hat{\beta}_{11}x, \qquad \hat{\mu}_{y_{2} \vert x}=\hat{\beta}_{20}+\hat{\beta}_{21}x
\end{equation}
In addition, the Montreal Protocol provides a clear rationale for deciding what dates to use to partition our data. We fit this simple linear model separately for two distinct time periods, prior to 1990 and subsequent to 1994, corresponding to data before and after the Montreal Protocol implementation. The results of these analyses are given in Tables 2 and 3.
Table 2. Analysis of Variance for the Separate Simple Linear Regression Models
Period        Source   Sum of Squares   df    Mean Square   F        p
Before 1990   Model    203118.741       1     203118.741    35093    .000
              Error    874.001          151   5.788
After 1994    Model    3061.553         1     3061.553      73.863   .000
              Error    63.973           114   .561
Table 3. Parameter Estimates for the Separate Simple Linear Regression Models
Period        Parameter   Estimate     Standard Error   t          p
Before 1990   β_{10}      -19064.219   102.825          -185.405   .000
              β_{11}      9.712        .052             187.330
After 1994    β_{20}      3929.678     49.626           79.185     .000
              β_{21}      -1.833       .025             -73.863
After estimating the regression models, students are asked a series of questions that prompt reflection on the estimated models:
Alternatively, a simultaneous regression could be used in a more advanced class to estimate parameters for both lines in the same model. For this analysis, the following model is used,
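%Equation (3)
\begin{equation}
y=\beta_{10}+\beta_{11}x_{1}+\beta_{20}+\beta_{21}x_{2}+e
\end{equation}
where x_{1} is the time in years for observations prior to 1990 and equal to zero otherwise, and x_{2} is the time in years for observations after 1994 and equal to zero otherwise. Each intercept, β_{10} or β_{20}, enters only for observations in its own period. Again using least squares criteria, we obtain estimates for the parameters in (3).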
%Equation (4)
\begin{equation}
\hat{\mu}_{y \vert x}=\hat{\beta}_{10}+\hat{\beta}_{11}x_{1}+\hat{\beta}_{20}+\hat{\beta}_{21}x_{2}
\end{equation}
The results of this analysis are shown in Tables 4 and 5.
Table 4. Analysis of Variance for the Simultaneous Regression Model
Model          Source   Sum of Squares   df    Mean Square   F         p
Simultaneous   Model    206180.294       2     103090.147    262.611   .000
Two Lines      Error    937.974          265   392.559
Table 5. Parameter Estimates for the Simultaneous Regression Model
Model          Parameter   Estimate     Standard Error   t          p
Simultaneous   β_{10}      -19064.219   80.409           -237.091   .000
Two Lines      β_{11}      9.712        .041             239.554    .000
               β_{20}      3929.678     124.635          31.530     .000
               β_{21}      -1.833       .062             -29.410    .000
Estimating this model with standard software requires that the data be set up to define the design matrix explicitly. The data consist of four columns: x_{1} and x_{2} as described, and two additional indicator columns, Int1 and Int2, each paired with an x column. An indicator takes the value 1 when its paired x-value is not zero and the value 0 when the x-value is zero. The data below show this setup:
Int1 x1 Int2 x2
1.00 1977.00 .00 .00
1.00 1977.08 .00 .00
1.00 1977.17 .00 .00
1.00 1977.25 .00 .00
1.00 1977.33 .00 .00
1.00 1977.42 .00 .00
.
.
.
1.00 1989.58 .00 .00
1.00 1989.67 .00 .00
1.00 1989.75 .00 .00
1.00 1989.83 .00 .00
1.00 1989.92 .00 .00
.00 .00 1.00 1995.00
.00 .00 1.00 1995.08
.00 .00 1.00 1995.17
.00 .00 1.00 1995.25
.00 .00 1.00 1995.33
.
.
.
.00 .00 1.00 2004.33
.00 .00 1.00 2004.42
.00 .00 1.00 2004.50
.00 .00 1.00 2004.58
.00 .00 1.00 2004.67
Using SPSS, we were unable to run the model using the Regression procedure without resorting to executing syntax since it is not possible to alter the collinearity tolerance within the interface dialogue boxes. The following syntax produced the correct parameter estimates for the model above:
REGRESSION
Notice that we explicitly declared that no overall intercept should be included in the model. This prevents the software from adding another column of ones to our design matrix.
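The effect of this parameterization can be illustrated outside SPSS. The sketch below is not the syntax used for the paper; it builds the four-column design matrix (Int1, x1, Int2, x2) in Python, fits it with no added intercept, and compares the result with two separate fits. The helper names `solve` and `least_squares` and the two series are hypothetical, the data are made up, and exact rational arithmetic is used only to sidestep round-off in the normal equations when x is a calendar year.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a * beta = b by Gauss-Jordan elimination (exact for Fraction entries)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


def least_squares(rows, y):
    """Least squares with no added intercept, via the normal equations X'X b = X'y."""
    rows = [[Fraction(v) for v in row] for row in rows]
    y = [Fraction(v) for v in y]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical monthly series for the two periods (NOT the real measurements).
t_before = [Fraction(1977) + Fraction(m, 12) for m in range(156)]
t_after = [Fraction(1995) + Fraction(m, 12) for m in range(117)]
y_before = [140 + Fraction(97, 10) * (t - 1977) + Fraction((-1) ** m, 2)
            for m, t in enumerate(t_before)]
y_after = [270 - Fraction(18, 10) * (t - 1995) + Fraction((-1) ** m, 4)
           for m, t in enumerate(t_after)]

# Design matrix columns: Int1, x1, Int2, x2 (no overall column of ones).
design = [[1, t, 0, 0] for t in t_before] + [[0, 0, 1, t] for t in t_after]
combined = least_squares(design, y_before + y_after)

# Separate fits, each with its own intercept column.
separate = (least_squares([[1, t] for t in t_before], y_before)
            + least_squares([[1, t] for t in t_after], y_after))

print([float(b) for b in combined])
print(combined == separate)
```

The two sets of estimates agree because the columns for one period are zero in every row of the other period, so the normal equations separate into two independent blocks.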
Discussion of this model with students is centered on
recognizing that the parameter estimates are identical to the estimates
obtained when the models were fit separately. The model was parameterized
specifically so that this would explicitly occur. The obvious question then
arises as to whether there is any difference between estimating the models
together or separately. The difference is easily explained by comparing the separate
models in (1) to the combined model in (3). For the separate models, two error
terms are used, e_{1} and e_{2}. For the combined model, only a single error
term is used, e. The difference between the models, therefore, is that
the combined model estimates a single error variance rather than the separate
error variances estimated by the separate models. This difference results in
several questions:
After the discussion described in Section 3 item 5, students have developed an understanding of what the definition of an "effect" of the Montreal Protocol means with respect to the data. Depending on the level of the class, this understanding still needs to be translated into statements about population parameters and the corresponding sample statistics used to estimate those parameters. The effect defined in 5b of section 3 is written in terms of the difference between the slopes,
%Equation (5)
\begin{equation}
d_{S}=\beta_{11}-\beta_{21}
\end{equation}
Our estimate of this difference uses the slope estimates from our regression analyses,
%Equation (6)
\begin{equation}
\hat{d}_{S}=\hat{\beta}_{11}-\hat{\beta}_{21}
\end{equation}
The effect defined in 5c is written in terms of the difference between two predicted values,
%Equation (7)
\begin{equation}
d_{P}=\mu_{y_{1} \vert x}-\mu_{y_{2} \vert x}
\end{equation}
Our estimate of this difference uses the predicted values for some value of x from our regression analyses,
%Equation (8)
\begin{equation}
\hat{d}_{P}=\hat{\mu}_{y_{1} \vert x}-\hat{\mu}_{y_{2} \vert x}
\end{equation}
Depending on the level of the class, students are expected to provide point estimates for (6) and (8), and inference in the form of interval estimates and/or hypothesis tests using either the simple linear models estimated in Section 4.1 or the simultaneous regression model estimated in Section 4.2 or both.
5.1 Simple Linear Regression Models
Our estimates of the slopes, \hat{\beta}_{11} and \hat{\beta}_{21}, in the simple linear regression models for before and after the Montreal Protocol implementation are interpreted as rates of change per year in atmospheric CFC concentration. Our model estimates that the atmospheric concentration of CFCs increased at a rate of 9.712 parts per trillion per year prior to the Montreal Protocol. Contrast that with the rate of change after the Montreal Protocol, a decrease of 1.833 parts per trillion per year. We have defined our point estimate of the effect of the Montreal Protocol as this difference in the rate of change in (6), a decrease of 9.712-(-1.833)=11.545 parts per trillion per year in the annual rate of change of atmospheric CFC concentration.
We can construct a confidence interval for the rate of change difference as
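%Equation (9)
\begin{equation}
\hat{d}_{S} \pm t_{\alpha/2,\,d.f.}\,s_{d}
\end{equation}
where
%Equation (10)
\begin{equation}
s_{d}=\sqrt{V_{1}+V_{2}}
\end{equation}
and
%Equation (11)
\begin{equation}
V_{i}=\frac{s^2_{i}}{\sum(x_{ij}-\bar{x}_i)^2}
\end{equation}
The degrees of freedom for the t-distribution are
%Equation (12)
\begin{equation}
d.f.=\frac{(V_{1}+V_{2})^2}{\left(\frac{V^2_{1}}{n_1-2}+\frac{V^2_{2}}{n_2-2}\right)}
\end{equation}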
Thus we find our 95% confidence interval for the rate of change difference is 11.545±.113.
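The interval can be checked approximately from the rounded values reported for the separate fits. The sketch below makes several assumptions: the slope variances are taken as the squares of the reported standard errors (.052 and .025), the sample sizes 153 and 116 are inferred from the error degrees of freedom (151 and 114), and a normal critical value stands in for the t critical value, which is only slightly larger at roughly 200 degrees of freedom. The function name `satterthwaite_df` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def satterthwaite_df(v1, v2, df1, df2):
    """Approximate degrees of freedom for the sum of two variance estimates."""
    return (v1 + v2) ** 2 / (v1 ** 2 / df1 + v2 ** 2 / df2)


# Rounded slope estimates and standard errors from the separate regressions.
b11, se11, n1 = 9.712, 0.052, 153   # before 1990 (n1 inferred from error df = 151)
b21, se21, n2 = -1.833, 0.025, 116  # after 1994 (negative: concentrations fall)

v1, v2 = se11 ** 2, se21 ** 2
d_hat = b11 - b21                   # estimated difference in slopes
s_d = sqrt(v1 + v2)                 # standard deviation of the difference
df = satterthwaite_df(v1, v2, n1 - 2, n2 - 2)

# Assumption: normal critical value used in place of the t critical value.
z = NormalDist().inv_cdf(0.975)
half_width = z * s_d

print(round(d_hat, 3), round(df, 1), round(half_width, 3))
```

With these rounded inputs the half-width comes out close to the reported .113; using the exact t critical value and unrounded standard errors would shift the third decimal place slightly.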
If we further ask whether this difference is more than we would expect by chance, a hypothesis test can be conducted comparing the two slopes. Although it is unlikely that the students will suggest the appropriate test, they can readily understand that we may use a t-test for this hypothesis, since the comparison of the two slopes is directly analogous to the comparison of two means. The null and alternative hypotheses are
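%Equation (13)
\begin{equation}
H_{0}: \beta_{11}=\beta_{21} \qquad H_{a}: \beta_{11} \neq \beta_{21}
\end{equation}
The t statistic is calculated as
%Equation (14)
\begin{equation}
t=\frac{\hat{\beta}_{11}-\hat{\beta}_{21}}{\sqrt{V_{1}+V_{2}}}
\end{equation}
with the standard deviation of the difference and degrees of freedom given by (10) and (12) respectively. Calculation of the test statistic gives t=200.861 with a p-value smaller than .001.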
The other way we define the effect of the Montreal Protocol is to determine the difference between what the atmospheric CFC concentration would have been (assuming the same trend were to continue) had the Montreal Protocol not been enacted and what the predicted atmospheric CFC concentration will be now that the Montreal Protocol has been enacted. Symbolically, the effect is defined as (7) and our estimate of this effect is given in (8).
In order to define the effect, we need to choose a date (a value for x) at which the comparison will be made, say January 2009. The predicted concentration levels of CFCs using the pre-Montreal Protocol and the post-Montreal Protocol regression models are shown in Figure 3 as the straight lines obtained by our regression estimates. For the pre-Montreal Protocol model, it was assumed that the CFC concentration levels would continue to rise at the same rate, 9.712 parts per trillion a year, until 2009. The predicted level is 446.226 parts per trillion in January 2009. For the post-Montreal Protocol model, it is assumed the CFC concentration levels would continue to fall at the same rate, 1.833 parts per trillion a year, until January 2009. The predicted level is 247.408 parts per trillion in January 2009. It seems clear that the Montreal Protocol was responsible for this reversal from a yearly increase in CFC concentration levels to a yearly decrease in CFC concentration levels, since the pattern of CFC concentration change over time coincides closely with the implementation of the protocol. When we calculate the difference between the predicted CFC levels for the two models, we estimate that the Montreal Protocol is responsible for a decrease of 198.818 parts per trillion in CFC concentration in January 2009.
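The extrapolation is a direct evaluation of the two fitted lines. The sketch below uses only the rounded coefficients reported for the separate regressions, with January 2009 coded as x = 2009.0 (an assumption consistent with the coding of January 1977 as 1977.00); the function name `predict` is hypothetical. Because the intercepts refer to year 0, a rounding error of .0005 in a slope moves the prediction at x = 2009 by about 1 part per trillion, so these results differ somewhat from the reported values, which were presumably computed from unrounded estimates.

```python
def predict(intercept, slope, x):
    """Predicted concentration (parts per trillion) from a fitted line at time x."""
    return intercept + slope * x


x_jan_2009 = 2009.0

# Rounded coefficients; the signs follow the text (rising before, falling after).
before = predict(-19064.219, 9.712, x_jan_2009)
after = predict(3929.678, -1.833, x_jan_2009)
difference = before - after

print(round(before, 3), round(after, 3), round(difference, 3))
```

A practical lesson for students is that extrapolations should be computed from full-precision coefficients, or from a model with time centered near the data, rather than from rounded table entries.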
We can construct a confidence interval for this estimate of the difference between predicted values as
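%Equation (15)
\begin{equation}
\left(\hat{\mu}_{y_{1} \vert x}-\hat{\mu}_{y_{2} \vert x}\right) \pm t_{\alpha/2,\,d.f.}\,s_{d}
\end{equation}
where
%Equation (16)
\begin{equation}
s_{d}=\sqrt{V_{1}+V_{2}}
\end{equation}
and
%Equation (17)
\begin{equation}
V_{i}=s^2_{i}\left[\frac{1}{n_{i}}+\frac{(x-\bar{x}_i)^2}{\sum(x_{ij}-\bar{x}_i)^2}\right]
\end{equation}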
The degrees of freedom for the t-distribution are computed by (12). Our 95% confidence interval for the difference between predicted values is 198.818±2.734.
A hypothesis test to determine whether the difference between the predicted values is due to chance is constructed similarly to (13).
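%Equation (18)
\begin{equation}
H_{0}: \mu_{y_{1} \vert x}=\mu_{y_{2} \vert x} \qquad H_{a}: \mu_{y_{1} \vert x} \neq \mu_{y_{2} \vert x}
\end{equation}
The test statistic used for this hypothesis is calculated as
%Equation (19)
\begin{equation}
t=\frac{\hat{\mu}_{y_{1} \vert x}-\hat{\mu}_{y_{2} \vert x}}{\sqrt{V_{1}+V_{2}}}
\end{equation}
with the standard deviation of the difference and degrees of freedom given by (16) and (12) respectively. Calculation of the test statistic gives t=328.020 with a p-value smaller than .001. Again, the p-value is very small, indicating that data like these would be highly unlikely if the null hypothesis were true.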
The parameter estimates for the linear terms are identical to those obtained in the separate regressions. Our interpretations of the estimated parameters in the simultaneous model are also identical to those of the simple linear models in 5.1. What differs is that the data are pooled to calculate a single error variance instead of the two error variances, one for each regression model, that were calculated using separate models. Consequently, the t-values testing whether the slopes before and after the Montreal Protocol are equal to zero are different, 239.554 and -29.410 respectively.
Because the estimated model parameters are the same as in the separate regressions, our point estimates of the slope difference and of predicted values difference are also the same. What differs when using the simultaneous regression model are our inferences based on the stochastic part of the model. With a single error variance for the entire model, we construct our confidence intervals and ttests as before except with the assumption of homogeneous rather than heterogeneous variances.
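The change in the stochastic part of the model can be traced with the reported sums of squares. In the sketch below, the pooled error variance is the combined error sum of squares divided by the combined error degrees of freedom, and each slope standard error from the separate fits is rescaled by the ratio of pooled to separate error variance. The rescaling relies on the fact that a slope's standard error is the square root of the error variance divided by the sum of squared deviations of x, and that the latter does not change. Variable names are hypothetical and all inputs are rounded table values.

```python
from math import sqrt

# Error sums of squares, error degrees of freedom, and slope standard errors
# reported for the separate simple linear regressions.
sse_before, df_before, se_slope_before = 874.001, 151, 0.052
sse_after, df_after, se_slope_after = 63.973, 114, 0.025

s2_before = sse_before / df_before  # separate error variance, before 1990
s2_after = sse_after / df_after     # separate error variance, after 1994

# Single pooled error variance for the simultaneous model.
s2_pooled = (sse_before + sse_after) / (df_before + df_after)

# The same sum of squared x deviations sits under each standard error,
# so only the error variance in the numerator changes.
pooled_se_before = se_slope_before * sqrt(s2_pooled / s2_before)
pooled_se_after = se_slope_after * sqrt(s2_pooled / s2_after)

print(round(s2_pooled, 3), round(pooled_se_before, 3), round(pooled_se_after, 3))
```

The rescaled values land close to the slope standard errors reported for the simultaneous model (.041 and .062): pooling lowers the standard error for the noisier pre-1990 period and raises it for the post-1994 period.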
The confidence interval for the difference between the slopes is calculated as before, using (9) with the standard deviation of the difference defined by (10). However, the slope variances, (11), do not have the same values as previously since the error term used in their calculation is estimated from all the data. The degrees of freedom for the critical t value are calculated using
%Equation (20)
\begin{equation}
d.f.=n_{1}+n_{2}-4
\end{equation}
Therefore, the 95% confidence interval for the slope difference is 11.545±.146.
The value of the t-statistic used to test the hypothesis of no slope difference in (13) is computed as in (14). The standard deviation of the slope difference is calculated as in (10). The t-value is 155.278. Evaluated with 265 degrees of freedom, this t-value results in a p-value of 3.158x10^{-262}. Again, this p-value is very small, indicating that data like these would be highly unlikely if the null hypothesis were true.
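As a rough check using only the rounded slope standard errors reported for the simultaneous model (.041 and .062), the standard deviation of the slope difference, the t statistic, and the interval half-width can be approximated as follows. The critical value of about 1.97 for 265 degrees of freedom is an assumption taken from standard t tables, and small discrepancies from the reported values reflect rounding of the inputs.

\begin{equation}
\sqrt{V_{1}+V_{2}} \approx \sqrt{.041^2+.062^2} \approx .0743, \qquad t \approx \frac{11.545}{.0743} \approx 155.4, \qquad 1.97 \times .0743 \approx .146
\end{equation}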
The confidence interval for the difference between the two predicted values is computed using (15) with the standard deviation of the difference defined in (16) and V_{i} defined as
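%Equation (21)
\begin{equation}
V_{i}=s^2\left[\frac{1}{n_{i}}+\frac{(x-\bar{x}_i)^2}{\sum(x_{ij}-\bar{x}_i)^2}\right]
\end{equation}
where s^2 is the single error variance estimated from all the data. This results in a 95% confidence interval for the predicted values difference of 198.818±2.366.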
The test statistic used to test the hypothesis is calculated as in (19), with the standard deviation of the difference and degrees of freedom given by (16) and (20) respectively. Calculation of the test statistic gives t=371.288 with a p-value so small that neither Excel nor SPSS is able to calculate its value. Again, we reject the null hypothesis in favor of the alternative that there is a difference between predicted values that is not due to chance.
We introduce some form of evaluation of the models in classes at every level, although the extent of the evaluation varies dramatically. For introductory classes, we confine ourselves to defining, computing, reporting, and interpreting the R^{2} value. We avoid calling it the coefficient of determination, since the words are not intrinsically enlightening. Instead we call it a measure of the goodness of fit, a phrase that is jargon, but also descriptive. We define it as both the ratio of the regression sum of squares divided by the total sum of squares and, in the case of simple linear regression, the square of the correlation coefficient between x and y. The students are taught to interpret the R^{2} value as the proportion of the variance in the data explained by the model.
The goodness of fit, measured by the proportion of variability in the data accounted for by the model, is extremely good for each of the models. The R^{2} values are .996 and .980 for the simple linear regression models fit to data observed before and after the Montreal Protocol respectively. When combining the models, a single error variance is estimated for the simultaneous model. Therefore, the R^{2} value combines the model sums of squares of both individual models in the numerator and the total sums of squares of both individual models in the denominator, and is equal to .995. This is between the R^{2} values of the two separate linear models. It, too, is extremely high.
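The reported R^{2} values can be reproduced from the model and error sums of squares in the analysis of variance tables, using the definition given above (model sum of squares divided by total sum of squares). The function name `r_squared` is hypothetical.

```python
def r_squared(ss_model, ss_error):
    """Proportion of total variability accounted for by the model."""
    return ss_model / (ss_model + ss_error)


# Sums of squares from the analysis of variance tables.
r2_before = r_squared(203118.741, 874.001)    # simple linear model, before 1990
r2_after = r_squared(3061.553, 63.973)        # simple linear model, after 1994
r2_combined = r_squared(206180.294, 937.974)  # simultaneous model

print(round(r2_before, 3), round(r2_after, 3), round(r2_combined, 3))
```

The combined value is dominated by the pre-1990 period because that period contributes almost all of the total sum of squares.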
Additional evaluation of the models is performed in the second introductory and advanced courses. In both courses, we examine the residual plots shown in Figures 4 and 5.
In both residual plots, there is a noticeable pattern that is not consistent with the assumption of the linear model that the errors are normally distributed about zero with a constant variance. In both plots, there is a curvilinear pattern in the residuals. Therefore, a quadratic term was added to both of the models to create the quadratic model
%Equation (22)
\begin{equation}
y_{i}=\beta_{i0}+\beta_{i1}x+\beta_{i2}x^2+e_{i}
\end{equation}
The results are displayed in Tables 6 and 7.
Table 6. Analysis of Variance for the Quadratic Regression Models
Period        Source      Sum of Squares   df    Mean Square   F           p
Before 1990   Model       203437.656       2     101718.828    27487.316   .000
              Linear      203118.741       1     203118.741
              Quadratic   318.915          1     318.915
              Error       555.086          150   3.701
After 1994    Model       3119.199         2     1559.600      5086.140    .000
              Linear      3090.876         1     3090.876
              Quadratic   28.323           1     28.323
              Error       34.650           113   .307
Table 7. Parameter Estimates for the Quadratic Regression Models
Period        Parameter   Estimate      Standard Error   t        p
Before 1990   β_{10}      427555.191    48109.963        8.887    .000
              β_{11}      -440.634      48.511           -9.083   .000
              β_{12}      .114          .012             9.283    .000
After 1994    β_{20}      -280061.630   29041.620        -9.643   .000
              β_{21}      282.180       29.044           9.716    .000
              β_{22}      -.071         .007             -9.779   .000
The R^{2} values were .997 and .989 for the two periods respectively. The parameter estimates for the linear term in both regression equations were highly significant, with t-values of -9.083 and 9.716 respectively. The parameter estimates for the quadratic terms were highly significant as well, with t-values of 9.283 and -9.779 respectively. It is worthy of note that the signs of the quadratic terms indicate whether the change in atmospheric concentrations of CFCs is accelerating up, if the term is positive, or accelerating down, if the term is negative.
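The link between the analysis of variance and the t-test for the quadratic term can be shown with the pre-1990 sums of squares. In the sketch below, the partial F statistic for the quadratic term is its sum of squares (1 degree of freedom) divided by the error mean square, and its square root is the magnitude of the t statistic for that term. Variable names are hypothetical; the inputs are the reported sums of squares.

```python
from math import sqrt

# Before 1990, quadratic model: sums of squares and error degrees of freedom.
ss_model, ss_quadratic, ss_error, df_error = 203437.656, 318.915, 555.086, 150

mse = ss_error / df_error          # error mean square
f_quadratic = ss_quadratic / mse   # partial F for the quadratic term (1 df)
t_quadratic = sqrt(f_quadratic)    # magnitude of the t statistic for that term
r2_before = ss_model / (ss_model + ss_error)

# After 1994, quadratic model: R-squared from model and error sums of squares.
r2_after = 3119.199 / (3119.199 + 34.650)

print(round(t_quadratic, 3), round(r2_before, 3), round(r2_after, 3))
```

For the pre-1990 fit this reproduces the reported t-value of 9.283 for the quadratic term, along with the reported R^{2} values of .997 and .989.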
Review of the residual plots from the quadratic models shows that the expected pattern of error randomly distributed about zero with a constant variance is now plausible, and a deviation from that pattern is less noticeable. For a class that includes the topic of autocorrelation, connecting the errors from time to time will reveal that there is still a pattern of autocorrelation in the data that could be investigated by further analysis. Although those analyses are not considered within the scope of this paper, these data could be a useful example in a course that covers autocorrelation.
Although the quadratic model is statistically justified in both models, i.e. the quadratic terms account for a statistically significant proportion of the variability in the data, the proportions of variance explained are very small compared to the linear terms. For the purposes of analyzing these data in introductory classes, the quadratic terms have not been used when investigating the primary research question, the impact of the Montreal Protocol. The justification for this is threefold. First, introduction of the quadratic terms makes interpretation of the model more difficult if the purpose of the class is to introduce specific concepts, like rates of change, rather than to be statistically exhaustive. Second, even if we felt compelled to select the best model statistically, the quadratic model would be prone to larger error when extrapolating outside the range of the data, hence requiring a greater degree of caution. In this case, using a linear model without the quadratic term will underestimate the difference between predicted values at some point in the future outside the range of the data. Even so, the differences are highly significant. Third, once a firm grasp of the linear model has been achieved, then further refinement of the model by adding the quadratic term can be introduced. Emphasis in this paper is placed on an understanding of the linear model.
This case study tells a compelling story of international cooperation resulting in the successful collaboration on a global environmental problem. Further, evidence of this success can be encapsulated in a simple set of data that is accessible to students in an introductory course, yet complex enough to allow for use in more advanced courses. This case study also provides a rich context within which to introduce and explore many of the concepts central to the statistical analysis of data, the assumptions entailed by methods of measurement, sampling, and analysis. Lastly, there is a moral to the story. Even though a complete evaluation of success of the Montreal Protocol with respect to ozone layer depletion is beyond the scope of this paper, it is clear that we can claim the Montreal Protocol had a real, positive effect on the levels of atmospheric CFC concentration. So we return to the beginning of the story and ask the students to ponder whether similar international efforts could have similar effects on other global problems.
The data used in this paper can be found at the World Data Centre for Greenhouse Gases (WDCGG) web site maintained by the Japanese Meteorological Agency in cooperation with the World Meteorological Organization. Navigation steps to this file are:
The author would like to acknowledge the work of the National Oceanic and Atmospheric Administration for collection of the data used in this paper.
Farman, J. C., Gardiner, B. G. and Shanklin, J. D. (1985), "Large Losses of Total Ozone in Antarctica Reveal Seasonal ClOx/NOx Interaction", Nature, 315, 207-10.
Molina, M. J. and Rowland, F. S. (1974), "Stratospheric Sink for Chlorofluoromethanes: Chlorine Atom Catalyzed Destruction of Ozone", Nature, 249, 810-14.
Ott, L. (2001) An Introduction to Statistical Methods and Data Analysis. Pacific Grove: Duxbury.
Rees, D. G. and Henry, J. K. (1988), "On comparing the predicted values from two simple linear regression lines", The Statistician, 37, 299-306.
Rowland, F. S. and Molina, M. J. (2007), "The CFC-ozone puzzle: environmental science in the global arena". In Kaniaru, D. (ed.) The Montreal Protocol Celebrating 20 Years of Environmental Progress Ozone Layer and Climate Protection. London: Cameron May.
Satterthwaite, F. E. (1946), "An approximate distribution of estimates of variance components", Biometrics Bulletin, 2, No. 6, 110-114.
Shende, R. (2007), "From Montreal to Kyoto: The Refrigeration Industry’s Journey Toward Sustainability". In Kaniaru, D. (ed.) The Montreal Protocol Celebrating 20 Years of Environmental Progress Ozone Layer and Climate Protection. London: Cameron May.
UNEP (2007), Report of the UNEP Technology and Economic Assessment Panel, United Nations Environmental Programme, Montreal Protocol On Substances that Deplete the Ozone Layer. http://www.unep.ch/ozone/Assessment_Panels/TEAP/Reports/TEAP_Reports/Teap_progress_report_April2007.pdf
Welch, B. L. (1938), "The significance of the difference between two means when the population variances are unequal", Biometrika, 29, 350-362.
World Meteorological Organization (1988), Report of the International Ozone Trends Panel—1988 (Report 18, Global Ozone Research and Monitoring Project, World Meteorological Organization, Geneva).
World Meteorological Organization (1986), Atmospheric Ozone 1985 (Report 16, Global Ozone Research and Monitoring Project, World Meteorological Organization, Geneva).
Dean Nelson
University of Pittsburgh at Greensburg
150 Finoli Drive
Greensburg, PA 15601
Email: den@pitt.edu
Phone:7248388044
Fax: 7248367172