Thomas J. Pfaff
Aaron Weinberg
Ithaca College
Journal of Statistics Education Volume 17, Number 3 (2009), ww2.amstat.org/publications/jse/v17n3/pfaff.html
Copyright © 2009 by Thomas J. Pfaff and Aaron Weinberg all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
Key Words: Hands-On Demonstration; Active Learning; Central Limit Theorem; Confidence Interval; Hypothesis Testing.
This article describes the design, implementation, and assessment of four hands-on activities in an introductory college statistics course. In the activities, students investigated the ideas of the central limit theorem, confidence intervals, and hypothesis testing. Five assessments were administered to the students, one at the beginning and end of the course, and three in between the activities. We found that, despite our attempts to engage our students in active reflection, their performance on the assessments generally did not improve. These results raise important issues about the design of pedagogical tools and activities as well as the need to gather data to assess their effectiveness.
As statistics has become both a focal point of K-12 curricula (National Council of Teachers of Mathematics, 2000, 2006) and a required course for many undergraduate majors, there has been an increasing emphasis on helping students develop statistical reasoning. At our institution, which is a predominantly undergraduate comprehensive college, we teach introductory statistics courses to over 200 students each year.
As an essential component of statistical literacy, we want our students to move beyond simply computing confidence intervals and p-values to understanding what these concepts really mean and where they come from. Our goal was to design in-class, hands-on activities (which we called "modules") that would help our students develop an understanding of important statistical ideas. We decided to focus on determining the effectiveness of our activities in helping students increase their understanding of statistical concepts.
Our primary focus was our students in an entry-level Business Statistics course, of which our institution runs 6-9 sections each school year. The goal of our Business Statistics course is to teach students how to ask statistical questions, design an experiment, collect data, decide on an appropriate statistical test, and interpret the results. We were not concerned with deep theoretical explanations of statistical theorems. Instead, we decided to focus on important concepts that would help students interpret their results (such as a p-value or a confidence interval) in a meaningful way. We wanted our students to develop an understanding of:
We believe that interactive data collection activities may increase student understanding of statistical concepts. Our belief is based on the idea that students do not learn passively, but instead learn by making new connections with their previous understanding to develop new knowledge structures (Von Glaserfeld 1987). In the learning process, the student’s mind responds to cognitive conflict by actively arranging and rearranging mental structures. Our goal as educators is to create and structure this cognitive conflict in order to facilitate our students’ active prediction and reflection that will generate new knowledge and understanding.
Our theory is borne out by other researchers. Snee (1993) makes an argument for experiential learning, and Gnanadesikan, Scheaffer, Watkins and Witmer (1997) claim: "Activity-based courses and use of small groups appear to help students overcome some misconceptions of probability and enhance student learning of statistics concepts." Some researchers have suggested the importance of experiencing data collection (e.g. Hunter 1977; Hogg 1991; Mackisack 1994) and many have recommended laboratory-based courses, in-class activities, and class projects (e.g. Hunter 1977; Dietz 1993; Fillebrown 1994; Mackisack 1994; Ledolter 1995; Bradstreet 1996; Chance 1997). Mills (2002) cites examples of students actively involved in data collection and analysis who developed a better understanding of various statistics concepts (e.g. Goodman 1986; Hubbard 1992; Mittag 1992; Gratz, Volpe, and Kind 1993; Packard, Holmes, and Fortune 1993; Sullivan 1993; Giesbrecht 1996; Marasinghe, Meeker, Cook, and Shin 1996; McBride 1996; Velleman and Moore 1996).
Gnanadesikan et al. (1997) describe this process of cognitive conflict and reflection taking place in a statistics classroom:
When students are tested and provided feedback on their misconceptions, followed by corrective activities (where students are encouraged to explain solutions, guess answers before computing them, and look back at their answers to determine if they make sense), this "corrective-feedback" strategy appears to help students overcome their misconceptions.
Lunsford, Rowell, and Goodson-Espy (2006) argue that lectures and demonstrations were not as effective as hands-on activities for developing understanding:
Just demonstrating graphical concepts in class via computer simulation was not sufficient for our students to develop these skills. We believed our students needed to have a directed (either through an activity or an exercise) hands-on experience (either in-class or out-of-class) with simulations that emphasize graphical representations of distributions (such as Sampling SIM or many of the simulations in the VLPS).
Several researchers (e.g. Hodgson (1996), Schwartz, Goldman, Vye and Barron (1997), and delMas, Garfield and Chance (1999)) found that introducing computer simulation activities into their classes increased their students’ understanding but that this increase—while statistically significant—was not dramatic. Consequently, we sought recommendations for designing and implementing successful activities. Based on their own experience, delMas, Garfield and Chance (1999) recommended that activities:
 Provide guidance to "facilitate exploration and discovery."
 "Use simulations to draw students’ attention to aspects of a situation or problem that can easily be dismissed or not observed under normal conditions."
 "Provide a supportive environment that is rich in resources, aids exploration, creates an atmosphere in which ideas can be expressed freely, and provides encouragement when students make an effort to understand."
 Provide representations for interrelated concepts and build connections "between different representations of the same phenomena."
 "Help students evaluate the difference between their own beliefs about chance events and actual empirical results."
To incorporate these recommendations into our modules, we adopted the following design guidelines:
 The activity should be motivated by a "natural" question and students should explore the situation by gathering and interpreting data.
 The teacher should ask scaffolding questions, solicit conjectures, and ask students to explain their reasoning.
 Multiple representations (numerical descriptions, tables, and graphs) and modes (kinesthetic, verbal, and written) should be integrated into the activity and the teacher should facilitate making connections between the representations.
 The context should build on students’ intuitions and experiences by using situations with which students are familiar and asking questions that students see as non-abstract. This should enable students to make predictions before engaging in the activity.
 Students should be asked to reflect on their predictions after they have collected data and again during a guided class discussion to articulate the difference between their predictions and the results.
A common strategy for implementing hands-on activities is to use a computer simulation method (CSM). Wood (2005) notes that computer simulations "tend to be more general [than conventional approaches], require far less technical background knowledge, and, because the methods are essentially sequences of physical actions, it is likely to be easier to understand their interpretation and limitation." Mills (2002) notes that while many researchers have recommended the use of CSMs, "a review of the literature reveals very little empirical research to support the recommendations."
When having our students perform simulations on their calculators or computers, we noticed that they seemed to not believe that the simulation reflected what would "actually" happen. For example, we had students simulate putting a group of people in an elevator by randomly selecting sample weights from a distribution of U.S. adult weights. After simulating several elevators, students still claimed that "in reality" the filled elevators would be much heavier than their simulation predicted.
Instead of using computer-based simulations, we decided to incorporate physical objects into our activities. We hypothesized that by using concrete objects, the activity would provide more opportunities to create and structure cognitive conflict and to facilitate our students’ active prediction and reflection. In an appropriate concrete activity, students cannot be passive and simply observe the data collection and interpretation. Instead, they are personally involved with the underlying population, making their sampling more realistic. Furthermore, the results of a concrete activity wouldn’t merely reflect "reality," but would be unarguably real.
For example, a computer simulation approach to investigate the meaning of a 90% confidence interval might have a student rapidly simulate the generation of 100 (or more) confidence intervals from some fixed population and observe that about 90 of them trap the true parameter. While this does not take much class time, students do not interact with the underlying population, do not experience the sequence of samples that gradually suggests the value of the underlying parameter, and may not readily experience the confusion that arises when some of the resulting intervals fail to trap this parameter.
In a concrete version of the same activity, we might give each student a population (e.g. a bag of blue and purple bingo chips) and have each student try to estimate the proportion of chips that are blue. In drawing a sample from the bag to create a confidence interval, students develop an intuitive understanding of the underlying population and, when comparing confidence intervals with their classmates, extend this intuition to the relationship between the sample and the population. Through this process, students are constantly predicting the results and reflecting on these predictions in the class discussion.
With this reasoning, we added two additional guidelines for designing our activities:
This reasoning suggests expanding the study to compare the effects of our concrete approach with a CSM. For this pilot project, however, we decided to focus on evaluating whether or not our activities were effective in developing students’ understanding, leaving the comparison with CSMs for a future study.
We designed four modules to engage the students in actively making sense of the big ideas of the course. We used each module at the time in the semester when we would normally be discussing the corresponding topic.
Over the course of the semester, we also administered five written assessments to the entire class. The goal of these assessments was to evaluate our students’ understanding of the "big ideas" before using the modules, soon after using the corresponding module, and again near the end of the semester. We primarily drew our questions from the "Tools for Teaching and Assessing Statistical Inference" web site (http://www.tc.umn.edu/~delma001/stat_tools/). Many of the questions were repeated on multiple assessments so we could determine if our students’ performance changed over the semester. The questions used for assessment were not used on any other exam or homework problem in the course. The assessments were not administered by the course instructor, and the instructor did not see them until after final grades were submitted.
Detailed descriptions of the modules we used in class along with the associated worksheets are included in Appendix A.
The goal of this module is to introduce the central limit theorem and observe its effect on distributions. The activity suggested here is similar to one proposed by Gnanadesikan et al. (1997) who used the dates of a random collection of pennies for their initial sample. Our module offers an important improvement on the penny activity by allowing students to know the probability distribution for the population prior to sampling. While the distribution of pennies is often skewed and unimodal, our module begins with a distribution that is bimodal, ensuring that it will not look normal; it can also be easily modified to have an initial distribution that is skewed, although this will increase the sample size needed for the sampling distribution to look normal.
In this module, each student gets a suit of 13 cards. Each card is assigned a number equal to its face value with a Jack equal to 10, and the Queen and King each equal to 0. This creates a population with a U-shaped distribution. Students are asked to predict the shape of the sampling distribution (for their individual suit of cards) if they draw 30 cards with replacement, then compute µ and σ. Each student then draws 30 cards with replacement and records the sample mean of the first 1, 10, 20, and 30 cards. As a class, the students record their results in a spreadsheet and predict what each distribution should look like, describing its shape, spread, and center. After generating a histogram of their results, the class computes the means and standard deviations for each distribution, describes what they see, and discusses the results.
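The card procedure can be mimicked in a few lines of code. The following is only a minimal sketch: it assumes the Ace counts as 1, and the function names, the seed, and the class size of 500 simulated students are all hypothetical choices made for illustration, not part of the module.

```python
import random
import statistics

# Card values for one suit under the module's rule:
# Ace (assumed to count as 1) through 10 keep face value,
# Jack = 10, Queen = King = 0.
POPULATION = list(range(1, 11)) + [10, 0, 0]


def population_parameters(values):
    """Return the population mean and population standard deviation."""
    return statistics.fmean(values), statistics.pstdev(values)


def simulate_sample_means(values, sample_size, num_students, rng):
    """Each simulated student draws `sample_size` cards with replacement
    and reports the mean of the draws."""
    return [
        statistics.fmean(rng.choices(values, k=sample_size))
        for _ in range(num_students)
    ]


mu, sigma = population_parameters(POPULATION)
print(f"population mean = {mu:.3f}, population sd = {sigma:.3f}")

rng = random.Random(2009)  # fixed seed so the sketch is repeatable
for n in (1, 10, 20, 30):
    means = simulate_sample_means(POPULATION, n, num_students=500, rng=rng)
    print(
        f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.2f}, "
        f"sd of sample means = {statistics.pstdev(means):.2f}, "
        f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}"
    )
```

The last two columns of the printout should track each other, which is the shrinking spread that the class histograms are meant to reveal.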
In this module, students construct confidence intervals and find p-values using a t-distribution. Although Dambolena (1986) and Gordon and Gordon (1989) encouraged readers to use computer simulations and graphics to enhance students’ understanding of the t-distribution, it is not readily apparent that their methods offer a more effective instructional method than a hands-on approach.
Students are given three dice (a six-sided die, an eight-sided die, and a twelve-sided die) and investigate whether the mean of the sum of the dice is identical to the sum of their means by taking a simple random sample of n = 30 rolls of the three dice. Of course, students could calculate both of these means, but they recognize that calculating the mean of the sum involves substantial effort and so the statistical approach is helpful.
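The "substantial effort" of the direct calculation is easy to see by brute force. The sketch below, which assumes fair dice numbered from 1, enumerates all 6 × 8 × 12 = 576 equally likely outcomes; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

DICE_SIDES = (6, 8, 12)


def die_mean(sides):
    """Mean of a fair die with faces 1..sides."""
    return Fraction(sides + 1, 2)


def mean_of_sum(dice_sides):
    """Exact mean of the sum, found by listing every equally likely outcome."""
    outcomes = list(product(*(range(1, s + 1) for s in dice_sides)))
    return Fraction(sum(sum(outcome) for outcome in outcomes), len(outcomes))


sum_of_means = sum(die_mean(s) for s in DICE_SIDES)
print("sum of the means:", sum_of_means)
print("mean of the sum: ", mean_of_sum(DICE_SIDES))
```

Exact fractions are used so that the two quantities can be compared for equality without rounding error.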
Before beginning the experiment, each student describes the population, the parameter of interest, and the statistic that will be computed, states and writes a sentence describing the meaning of the null and alternative hypotheses, and predicts whether or not they think the null hypothesis will be false. Doing this should help students understand what hypothesis they are testing before they begin and force them to use formal notation and language to describe the situation.
Each student then rolls the dice 30 times, computes x̄ and s, and computes a 90% confidence interval and p-value. They enter their sample mean, p-value, and confidence interval into a class spreadsheet. The class then examines a table of results, a graph of the distribution of samples, and a graph of the confidence intervals and collectively decides whether the mean of the sums is equal to the sum of the means. After revealing to the class that these quantities should be equal, the students investigate the connection between these quantities and their confidence intervals along with the p-values and the cases in which they rejected H_{0}.
The goal of this module is to explore the ideas of a hypothesis test and a confidence interval by having students try to determine if a bag contains equal proportions of two different colors of bingo chips. As with the previous module, students describe the population, parameter, statistic, H_{0}, and make a prediction about H_{0} before collecting data. Each student then gets a small canvas bag containing 70 blue bingo chips and 30 purple chips; while the activity could be done with 10 chips in each bag, we believe it is helpful for the population to be larger so that a student can’t easily describe it by glancing into the bag.
Students sample 45 chips with replacement and use this to compute a p-value and an 80% confidence interval (chosen so that we have a good chance that a few students would not trap the parameter). Each student enters their proportion, p-value, and confidence interval into a class spreadsheet. The class then examines a table of results, a graph of the distribution of samples, and a graph of the confidence intervals and collectively decides whether their bags had equal proportions of the different chip colors; if they decide that the bags were not equally split, then they try to estimate what the actual split was. After revealing to the class that there was a 70/30 split, the students investigate the connection between this split and the confidence intervals along with the p-values and the cases in which they rejected H_{0}.
When comparing their confidence intervals, students should quickly notice that their centers vary widely (but the widths only vary a little); consequently, their p-values will also vary and not all of the intervals will capture the true proportion. The 70/30 split gives a power of 78.5%, which means roughly one fifth of the students will fail to reject the (false) null hypothesis.
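The power figure can be approximated with a short calculation. The sketch below rests on two assumptions that the text does not state: a two-sided test at a significance level of 0.05, and the normal approximation to the sampling distribution of the sample proportion. Under those assumptions the result comes out close to the quoted 78.5%. The function name is hypothetical.

```python
from statistics import NormalDist


def power_one_proportion(p_null, p_true, n, alpha):
    """Approximate power of a two-sided one-proportion z-test.

    Assumes the normal approximation to the sample proportion.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = (p_null * (1 - p_null) / n) ** 0.5
    se_true = (p_true * (1 - p_true) / n) ** 0.5
    # Rejection region for the sample proportion under the null hypothesis.
    lower = p_null - z_crit * se_null
    upper = p_null + z_crit * se_null
    # Probability of landing in that region when the true split is p_true.
    sampling = NormalDist(mu=p_true, sigma=se_true)
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))


# alpha = 0.05 is an assumption; the module description does not give it.
power = power_one_proportion(p_null=0.5, p_true=0.7, n=45, alpha=0.05)
print(f"approximate power = {power:.3f}")
```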
This module extends the one-proportion test by using two populations instead of one. A similar activity has been implemented with a CSM by Wood (2005) using the "two bucket story" to derive bootstrap confidence intervals and simulate probability distributions.
While students used a single bag of chips in the previous module, here they use two bags of chips to determine if the bags have identical proportions of blue chips. As before, each student describes the populations, parameters, statistics, H_{0} and H_{a}, and makes a prediction about whether they will reject H_{0} prior to starting the experiment.
Each student (or pair of students) then gets two small canvas bags. The first contains 70 blue chips and 30 purple chips; the second contains 60 blue and 40 purple chips. Students sample 45 chips with replacement from each bag and use this to compute a p-value and an 80% confidence interval. Each student then enters their results into a class spreadsheet.
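One student's calculation for the two bags might look like the following sketch. It assumes the usual large-sample z procedures (a pooled standard error for the test and an unpooled one for the interval), which may differ from what the students' calculators report; the counts of 32 and 26 blue chips are made-up illustrative data, and the function name is hypothetical.

```python
from statistics import NormalDist


def two_proportion_summary(blue_1, n_1, blue_2, n_2, confidence=0.80):
    """Two-sided z-test of equal proportions plus an interval for p1 - p2.

    Assumes large-sample normal approximations; the pooled proportion must
    be strictly between 0 and 1.
    """
    p_1, p_2 = blue_1 / n_1, blue_2 / n_2
    diff = p_1 - p_2
    # Test statistic uses the pooled proportion, as H0 says p1 = p2.
    pooled = (blue_1 + blue_2) / (n_1 + n_2)
    se_pooled = (pooled * (1 - pooled) * (1 / n_1 + 1 / n_2)) ** 0.5
    z_stat = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    # Interval uses the unpooled standard error.
    se_unpooled = (p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    interval = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z_stat, p_value, interval


# Hypothetical draws: 32 of 45 blue from bag 1, 26 of 45 blue from bag 2.
z_stat, p_value, interval = two_proportion_summary(32, 45, 26, 45)
print(f"z = {z_stat:.2f}, p-value = {p_value:.3f}")
print(f"80% CI for p1 - p2: ({interval[0]:.3f}, {interval[1]:.3f})")
```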
After revealing the actual proportions to the class, students investigate the connection between this split and the confidence intervals and discuss the p-values and the cases in which they rejected H_{0}. As in the previous module, students should notice that the confidence intervals and p-values vary even though they all drew random samples from identical populations.
While all of the items in our assessments were either multiple choice or true/false, every question was followed by a prompt for students to explain their reasoning. All assessments were administered during regular class periods by a colleague who was not teaching the course. The assessments can be found in Appendix B.
 Assessment 1 was given to students near the beginning of the semester. This assessment included questions about the relationship between statistics and parameters, the central limit theorem, the meaning of confidence intervals, hypothesis tests and p-values. While we didn’t expect students to understand some of these technical concepts, we included them here so that we could compare their performance on this preliminary exam with later assessments.
 Assessment 2 was administered after the first module. It included questions designed to measure students’ understanding of the central limit theorem.
 Assessment 3 was administered after the second module. It included questions designed to measure students’ understanding of confidence intervals and hypothesis tests (specifically, the meaning of the null hypothesis) using population means as the parameter of interest.
 Assessment 4 was administered after the third module. It was nearly identical to Assessment 3 except for a reordering of some multiple-choice answers and a focus on population proportion instead of the mean as the parameter of interest.
 Assessment 5 was administered near the end of the semester. It was designed to measure the students’ "retention" of the concepts they had worked with in the modules and included a subset of the questions that had appeared on the previous assessments. We decided to not include the entire set of questions so that our students could finish the assessment in one class period (50 minutes).
After the end of the semester, we recorded each student’s multiple choice and true/false answers as well as their written explanations in a spreadsheet. For each item, we identified the correct answer, assigning it a value of 1, and in a few instances we also gave partial credit to an answer that, while technically incorrect, still reflected an understanding of the "big idea" behind the question.^{1}
For each item that appeared on multiple assessments, we conducted a McNemar test using SPSS to compare students’ performance between each pair of assessments. For this analysis, partial credit was converted to full credit (a value of 1) due to the requirements of the test. Since we expected that our modules would improve our students’ understanding of the concepts (and that their understanding would translate to increased performance on the assessments), we used a one-sided test.
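For readers unfamiliar with the McNemar test, the sketch below shows its exact (binomial) one-sided form: only students whose answer changed between two assessments carry information, and under the null hypothesis each change is equally likely to go in either direction. This is an illustration rather than a reproduction of the SPSS procedure, which may use a different variant; the counts are made up and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact_one_sided(wrong_to_right, right_to_wrong):
    """One-sided exact McNemar p-value for improvement.

    Under H0 the number of wrong-to-right switches among all switches is
    Binomial(n, 0.5); the p-value is P(X >= wrong_to_right).
    """
    n = wrong_to_right + right_to_wrong
    if n == 0:
        return 1.0  # no student changed answers, so no evidence either way
    tail = sum(comb(n, k) for k in range(wrong_to_right, n + 1))
    return tail / 2 ** n


# Hypothetical item: 12 students improved, 5 got worse, the rest unchanged.
p_value = mcnemar_exact_one_sided(wrong_to_right=12, right_to_wrong=5)
print(f"one-sided p-value = {p_value:.4f}")
```

Because the test works on right/wrong pairs, each answer has to be coded as 0 or 1, which is consistent with the conversion of partial credit described above.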
For each collection of problems that appeared on multiple assessments, we computed an "exam score" for each student on each assessment by finding the sum of their scores for those problems. Here, we used the partial credit as noted above. We then used a paired t-test in SPSS to compare students’ scores on each pair of assessments.
Several assessments included multiple items that addressed the meaning of confidence intervals. Students’ responses for these questions were cross-tabulated for each pair of questions in each assessment and analyzed for significant associations using a χ^{2} test in SPSS.
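A cross-tabulation of this kind can be sketched as follows for the simplest case, in which both questions are coded right/wrong and the table is 2 × 2. The sketch assumes the Pearson statistic without a continuity correction and nonzero row and column totals; SPSS reports several variants, so this is illustrative only. The counts are made up and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No continuity correction; every row and column total must be nonzero.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = question A right/wrong,
# columns = question B right/wrong.
statistic, p_value = chi_square_2x2([[14, 11], [12, 13]])
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.3f}")
```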
In addition, we used the methods developed by delMas, Garfield, and Chance (1999) to analyze students’ reasoning on the "sampling distributions" questions, which asked students to describe how the shape of a distribution of sampling means changes as you increase the sample size.^{2} Students’ answers were characterized as "correct reasoning," "good reasoning," "larger to smaller reasoning," and "incorrect reasoning."^{3}
Although we included a prompt for students to explain their reasoning on each question, most students did not provide explanations. Furthermore, many of their explanations were little more than a restatement of their answer and did not enable us to draw much insight into their reasoning. Because of this, we decided to not use students’ explanations in our analysis.
Overall, students’ understanding of the statistical concepts did not seem to improve. While students showed some significant improvement on individual items that appeared on multiple assessments, their performance actually significantly decreased on others and showed no change for most items.
We begin by describing the results for the "exam scores" and then describe results for individual items. For each collection of questions that appeared on multiple assessments, we found the students’ average percentage score on the pair of assessments; Figure 1 shows students’ performance and the results of the paired t-test.
Although some of the increases were statistically significant, it’s not clear that the increases were practically significant. In addition, with one exception, students rarely scored more than 60%. Although not significant, students’ performance actually decreased on a set of items that appeared on assessments 3, 4, and 5.
Since these sets of questions are not all the same, we can’t draw robust conclusions from these data. However, these data suggest that students seemed to slightly improve their understanding of the concepts, although students may have "lost" some of that understanding in the five weeks between using the modules and taking Assessment 5.
To investigate students’ understanding of the central limit theorem, we presented a distribution for a population and five potential sampling distributions (see Figure 2).
Students were asked to identify which histograms could correspond to sampling distributions for samples of increasing sizes and identify how these increasing sizes would affect the shape and spread of the distribution. This item appeared on Assessments 1, 2, and 5 with the only difference being the shape of the distributions. It should be noted that the differences between the histograms presented to the students may have been too subtle for the level of the course.
On assessments 2 and 5, roughly half of the students correctly responded that the sampling distributions should be shaped more like a normal distribution and would have less variability when the sample size increased. Apart from this, students were generally unsuccessful at identifying how increasing the sample sizes would affect the distribution of sample means.
Not only did few students give correct answers, but there was very little improvement in performance between the three assessments. When asked to describe how increasing the sample size would change the distribution, students generally gave more correct answers on each successive assessment. However, these increases were not significant. When students were asked to identify the sampling distribution for samples of size 4 and 16, their performance actually decreased between the assessments; this decrease was significant in some cases.
In addition, students’ reasoning about the effect of increasing the sample size was generally poor. Table 1 shows students’ reasoning, using the categories described by delMas, Garfield, and Chance (1999).
These data suggest that students not only failed to develop an understanding of the central limit theorem, but for some questions their original intuition was apparently more accurate than their conception after the module and at the end of the course. In some sense, it appears that the students became more confused, which may be a sign that they were thinking about the central limit theorem but had confounded multiple ideas about the concept.
This result—particularly for the assessment immediately following the module—was surprising. In the module, students individually gathered data and computed sample means for successively larger samples. The class predicted what the sampling distributions should look like, collectively plotted their data and discussed the results. The resulting distributions were reasonable illustrations of the central limit theorem, and we expected that the active prediction, reflection, and class discussion would help students develop an intuitive sense of how the sample size affected the distribution.
Students in our class were generally able to successfully use a calculator to compute a confidence interval and then use this interval to decide whether or not to reject a null hypothesis. However, students had difficulty understanding the meaning of confidence intervals and how they are constructed. This is significant, for without this understanding it is not clear that students understand the accuracy and potential errors inherent in a confidence interval.
Students were somewhat successful at identifying that the sample statistic is guaranteed to lie in the confidence interval. Roughly half of the students identified this, with the percent increasing notably (p = .072) between Assessments 1 and 5. Since students relied primarily on their calculators instead of computing confidence intervals manually, perhaps it isn’t surprising that they did not recognize the relationship between the sample statistic and the confidence interval. Despite this, it is troubling that some students thought that the sample size or standard deviation must lie within the interval, since both of these quantities have little to do with where a confidence interval is centered.
Table 2 is typical of our results. On the first, third, fourth, and fifth assessments (denoted A1, A3, A4, and A5), students were asked to select the meaning of a 95% confidence interval from the following four choices:
The first choice is the correct answer, although the second choice is not unreasonable. All four choices are similar, and this question requires a solid understanding of the meaning of a confidence interval (or memorizing what it means) to get it correct. Students’ answers were classified as right or wrong (only the first choice was coded as right) for each pair of assessments; Table 3 shows the percent of students who fall into each category of correctness.
As can be seen in Tables 2 and 3, there were very few patterns in students’ responses. Comparing A1, A3, and A4 to A5, they were slightly more likely to switch from a right to a wrong answer than from a wrong to a right answer, although this was not significant. However, students were slightly more likely to switch from a wrong to a right answer than from a right to a wrong answer when comparing A1 to A3 and A4 (which, again, was not significant).
Students had varying success answering true/false questions about the meaning of a (95%) confidence interval, such as "If 200 confidence intervals were generated using the same process, about 10 of the confidence intervals would not include the population mean (µ)." While roughly 50-80% of students responded correctly on some of these questions, there were no distinct patterns of increase (or decrease) between the assessments. In addition, students generally had difficulty identifying what a 95% confidence interval means from a list of four options, and their performance decreased by the final assessment.
The assessments included multiple questions about confidence intervals, and it would seem reasonable that students’ responses would demonstrate some consistency in their reasoning. For example, if a student thought that a 95% confidence interval meant that "95% of the intervals constructed using this process based on samples from this population will include the population mean," we would expect that student to say that the statement: "If 200 confidence intervals were generated using the same process, about 10 of the confidence intervals would not include the population mean" was true. However, there was no significant association between these two responses (p=.756 on Assessment 3, p=.449 on Assessment 4, and p=.477 on Assessment 5). In fact, there were no significant associations between students’ responses on pairs of questions like these. This suggests that the students had not developed a coherent conception of the meaning of confidence intervals. These results were troubling because a significant amount of time during the modules was spent having individual students compute confidence intervals and then comparing all of the intervals generated by the class. As expected, the centers of these intervals varied greatly and roughly 5% of them did not contain the true population parameter. We expected this handson method of data collection and statistic generation to give students a basic understanding of the meaning of a confidence interval.
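The true/false item about 200 intervals follows from simple arithmetic (5% of 200 is 10), and it can also be checked by simulation. The sketch below assumes a normal population with a known standard deviation so that the z-interval has exactly 95% coverage; the population mean of 100, standard deviation of 15, sample size of 30, and the function name are all hypothetical.

```python
import random
from statistics import NormalDist, fmean


def count_misses(num_intervals, sample_size, mu, sigma, confidence, rng):
    """Count how many z-intervals (known sigma) fail to contain mu."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sample_size ** 0.5
    misses = 0
    for _ in range(num_intervals):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        # The interval sample_mean +/- margin misses mu exactly when the
        # sample mean is more than one margin away from mu.
        if abs(sample_mean - mu) > margin:
            misses += 1
    return misses


rng = random.Random(2009)  # fixed seed so the sketch is repeatable
misses = count_misses(200, 30, mu=100.0, sigma=15.0, confidence=0.95, rng=rng)
print(f"{misses} of 200 intervals missed the population mean")
```

The count varies from run to run around 10, which is the point of the word "about" in the assessment item.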
As with confidence intervals, students can frequently make a statement about a null hypothesis (whether or not it is false) but may not have a good understanding of what it is they are rejecting. On Assessments 3, 4, and 5, we asked students to indicate if it is possible to "prove the null hypothesis." On each assessment, roughly 80% of the students believed that, under certain conditions, it is possible to do so. However, based on informal conversations with students after all assessments were done, it appeared that some students had been thinking that they could prove the null hypothesis if they sampled the entire population.
Although most students in the class knew to reject a null hypothesis if the hypothesized value was not contained in the confidence interval, they had difficulty identifying a confidence interval that would allow them to reject a particular null hypothesis; there was no clear pattern to their errors.
Similar to the idea that different samples from a population can produce different confidence intervals, two samples from the same population in a well-designed study can produce different statistics. Students were asked to predict what p-values two researchers would get if they collected data in two random samples from the same population. Overall, students were relatively successful and generally improved from Assessment 1 to Assessment 4 (from roughly 65% getting the questions correct to 82% getting them correct). Since one of the choices for a response was "I’m not sure," it seems that roughly half of the students began with and maintained a reasonable intuition about this concept.
While some of the percentages are relatively high, we had hoped that our students would perform significantly better. As part of the modules, individual members of the class drew random samples from identical populations and computed their own p-values. These p-values were different for each person and some of the values were significant while others were not. Since students encountered and discussed this idea in multiple modules, we thought that their performance on these questions would be better than it was.
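The idea that researchers sampling from the same population obtain different p-values can be sketched in a few lines. The example below is illustrative only: the true proportion of 0.53, the sample size of 1000, the one-sided normal-approximation test of H0: p = 0.5, and the function names are assumptions, not details taken from the modules.

```python
import math
import random


def one_sided_p_value(successes, n, p0=0.5):
    """Normal-approximation p-value for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal area


def simulate_p_values(n_researchers=10, n=1000, true_p=0.53, seed=1):
    """Each simulated researcher draws an independent sample and tests H0."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_researchers):
        successes = sum(rng.random() < true_p for _ in range(n))
        p_values.append(one_sided_p_value(successes, n))
    return p_values


for i, p in enumerate(simulate_p_values(), start=1):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"researcher {i}: p-value = {p:.3f} ({verdict} at the 5% level)")
```

Because every simulated researcher samples the same population, any difference among the printed p-values is due to sampling variability alone.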
Our hands-on activities generally failed to help students develop a good understanding of the underlying statistical concepts. We had hoped that students would not only get a large percentage of the questions correct by the end of the course, but would also show significant improvement from their scores at the beginning of the course. Even though we thought we had designed and implemented these modules in a way that would help our students understand the "big ideas," our assessment showed that we did not accomplish our goal.
On a survey we gave at the end of the semester, students had a positive reaction to the modules. Many students (42%) listed the modules as the most interesting thing done in the course. When asked to indicate how beneficial the modules were on a scale of 1 (not beneficial) to 5 (extremely beneficial), the median score was a 4. Even though our modules did not effectively develop understanding, they did engage the students in the course.
These results have significant ramifications for teachers of statistics. No matter how innovative or stimulating a pedagogical idea may seem, and no matter how much the students seem to enjoy the class, it may not be sufficient to develop students’ understanding. Numerous articles are published every year describing activities, pedagogical tools, and techniques that the authors believe will increase student understanding and engagement. However, it is imperative that these pedagogical innovations are tested and that we have empirical evidence of their effectiveness.
There are, of course, lurking variables in our study. It could be that our modules were fine as written, but we did not implement them successfully or devote enough class time to their implementation. Conversely, our implementation may have been fine but an aspect of the modules’ design could have been confusing. Since the ideas we addressed in the modules were also discussed at other times during the semester, we can’t distinguish between the effects of the modules and the effects of the other instruction; it could be the case that students did learn from doing the modules but then formed competing conceptions from other activities. Conversely, the modules were only implemented in one class period and students may have compartmentalized the class’ discussion of the modules as distinct from the rest of the course. It also could be that our students were in some way unprepared to successfully reflect on their activities.
In addition, there are several methodological issues with our study. As previously mentioned, we could not separate the modules from the rest of the course, making it difficult to determine how the modules specifically affected student learning. Since we relied on written assessments and students wrote few explanations, we feel that we did not get a clear picture of our students’ reasoning. It could be the case that many students could provide justifications for their answers that reflected a degree of understanding, or that students who provided correct answers were simply making educated guesses without really understanding the concepts.
Although our sample size was relatively small, it is unlikely that increasing the sample size would have shown significant overall improvement in student performance. Of the 76 comparisons for which we performed a McNemar test, students’ performance decreased in 19 and did not change in 18. In addition, fewer than half of the tests that suggested an improvement in performance had p-values below .2. Consequently, a post-hoc analysis of power would not be particularly illuminating.
Our study design did not use a control group of students who took either an identical class (without using the modules) or, at least, a very similar class. Such a group was not necessary to reach our conclusion that the activities failed to substantially increase our students’ understanding. However, when we conduct follow-up studies it will be important to have such a group to determine if the activities themselves are effective and if they are more effective than traditional instruction or CSMs.
An additional open question is the role that additional reflection and reinforcement might play in how students learn from the modules. We did not assign any out-of-class work directly related to the modules. The questions in our assessments did not explicitly resemble the activities in the modules, and students were never directly tested on the content in the modules. As a result, we assume that the students spent minimal time outside of class reviewing and reflecting on the activities. Would this additional reflection help students develop and retain an understanding of the ideas addressed by the modules, and, if so, how much is needed?
Given these open questions, we can’t conclude that the modules themselves were inadequate. However, this study has led us to rethink the design of the modules and helped us identify ways they might be improved. For example, when we use the modules in the future, we plan on giving students follow-up activities that have them spend more time describing key aspects of the concepts (such as the relationship between various confidence intervals and the associated parameter). Further, we plan to change and expand the way we assess our students’ learning. For example, the way we actually used confidence intervals in class was to check whether or not the value of the null hypothesis was within the interval and, if it wasn’t, to reject the null hypothesis; we plan on augmenting these tasks with writing assignments in which students explain their reasoning and the underlying statistical concepts. It is through this process of goal-setting, evaluation, assessment, and incremental improvement that we hope to not only help our students develop an understanding of statistics, but turn a reflective, critical eye on our own teaching to help us improve as educators.
Each student gets a suit of (13) cards. We assign to each card the following value:
Here is our system for assigning values to each card:
You will be drawing 30 cards (with replacement) and computing several means and standard deviations.
Before you begin, predict:
In this activity, students will try to determine if a bag contains equal numbers of two different colors of bingo chips. Before beginning the experiment, each student must successfully complete the first 5 questions on the worksheet. This ensures that the students understand what hypothesis they are testing before they begin and forces them to use formal notation and language to describe the situation.
Directions: Each student will be given a bag that contains two different colors of bingo chips (blue and purple). We will take a simple random sample of n = 45 chips (with replacement) to try to determine if there are equal numbers of blue and purple chips.
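One way to carry out the test described in these directions is an exact two-sided binomial test of H0: the proportion of blue chips is 0.5. The sketch below is an illustration under stated assumptions, not the worksheet's own procedure (which is not reproduced here); the function name and the example count of 30 blue chips are hypothetical.

```python
from math import comb


def two_sided_binomial_p_value(blue, n=45, p0=0.5):
    """Exact two-sided p-value for H0: proportion of blue chips equals p0.

    Sums the probabilities of every outcome that is no more likely under H0
    than the observed count of blue chips.
    """
    probs = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = probs[blue]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical result: 30 blue chips in 45 draws with replacement.
p_value = two_sided_binomial_p_value(30)
print(f"p-value = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0")
```

An exact test is used here only because it needs nothing beyond the standard library; a normal-approximation test of a proportion would lead to a similar decision for n = 45.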
While students used a single bag of chips in the previous activity, here they will use two bags of chips to determine if the bags have identical proportions. As before, each student must successfully complete the first 5 questions on the worksheet prior to starting the experiment.
Directions: Each student (or pair of students) will be given two bags (we will call bag two the bag with an X on the outside) that each contain two different colors of bingo chips. We will take a simple random sample of n = 45 chips (with replacement) from each bag to try to determine if the ratio of blue chips is the same in each bag.
In this activity, students will try to determine if the sum of the means of three dice is the mean of the sum of the dice. Before beginning the experiment, each student must successfully complete the first 5 questions on the worksheet. This ensures that the students understand what hypothesis they are testing before they begin and forces them to use formal notation and language to describe the situation.
Directions: Each student gets three dice: a six-sided, an eight-sided, and a 12-sided die. We want to decide if the mean of the sum of the three dice is the same as the mean we obtain by summing the means of the individual dice. We will take a simple random sample of n = 30 rolls of the three dice.
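For reference, the two quantities compared in this module are equal in theory, so any difference students observe in their 30 rolls comes from sampling variability. The sketch below checks this by exact enumeration; it assumes fair dice with faces numbered 1 through the number of sides, and the function names are ours.

```python
from fractions import Fraction
from itertools import product


def die_mean(sides):
    """Exact mean of one fair die with faces 1..sides."""
    return Fraction(sum(range(1, sides + 1)), sides)


def mean_of_sum(dice):
    """Exact mean of the total, found by listing every equally likely roll."""
    faces = [range(1, sides + 1) for sides in dice]
    outcomes = list(product(*faces))
    return Fraction(sum(sum(roll) for roll in outcomes), len(outcomes))


dice = (6, 8, 12)
sum_of_means = sum(die_mean(sides) for sides in dice)  # 7/2 + 9/2 + 13/2
print(sum_of_means, mean_of_sum(dice))  # both print 29/2, i.e. 14.5
```

Exact fractions are used so that the equality is shown without rounding error.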
Name__________________________________________ Date _______________________________
1. Which of the following distributions shows MORE variability?
A has more variability__________ B has more variability__________
Circle the statement (or statements) that led you to select your answer above.
(a) Because it’s bumpier
(b) Because it’s more spread out
(c) Because it has a larger number of different scores
(d) Because the values differ more from the center
(e) Other (please explain)
2. Figure A represents a sample of 26 weights and Figure B represents a sampling distribution of mean weights for samples of size 3. One value is circled in each distribution.
3. A sample of 50 data measurements is selected from a population of temperatures. A sample mean of 20 degrees is obtained. What would be your best estimate of µ, the population mean?
(a) It would be exactly 20 degrees
(b) It would be close to 20 degrees
(c) I wouldn’t be able to make an estimate. I know nothing about µ. It’s an unknown parameter and this is just one sample.
(d) Other:
4. For the quantities listed below, circle the ones that vary from sample to sample and explain why you chose these:
• Population standard deviation
• Sample standard deviation
• Population mean
• Sample mean
5. The distribution for a population of test scores is displayed below on the left. Each of the other five graphs labeled A to E represent possible distributions of sample means for random samples drawn from the population.
Please read each question carefully.
(a) Which graph represents a distribution of sample means for 500 samples of size 4?
(circle one) A B C D E
Answer each of the following questions regarding the sampling distribution you chose for the above question:
(b) What do you expect for the shape of the sampling distribution? (check only one)
___ Shaped more like a NORMAL DISTRIBUTION
___ Shaped more like the POPULATION
___ Shaped like some OTHER DISTRIBUTION
(c) Circle the word between the two vertical lines that comes closest to completing the following sentence.
I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(d) Which graph do you think represents a distribution of sample means for 500 samples of size 16?
(circle one) A B C D E
Answer each of the following questions regarding the sampling distribution you chose for the above question
(e) What do you expect for the shape of the sampling distribution? (check only one)
___ Shaped more like a NORMAL DISTRIBUTION
___ Shaped more like the POPULATION
___ Shaped like some OTHER DISTRIBUTION
Circle the word between the two vertical lines that comes closest to completing each of the following sentences.
(f) I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(g) I expect the sampling distribution I chose for the | less | the same | more | VARIABILITY than/as the sampling distribution I chose for the first question.
Please explain your reasoning:
6. A 95% confidence interval indicates that:
7. Researchers ask a random sample of apartment dwellers in a large city their ideal air temperatures. They find the sample mean (X̄) is 72 degrees. Using a two-tailed test, they reject H_{0} : µ = 68 at the 5% significance level. Which of the following could be a 95% confidence interval for µ, the average ideal temperature for all apartment dwellers in the city?
Two different pollsters, A and B, are trying to decide if a senator’s favorability ratings are above 50%. They each do their own random sample of 1000 people in the senator’s state to perform a hypothesis test with.
8. Which of the following scenarios is most likely?
9. Assume the alternative hypothesis is true and that pollster A gets a p-value of 0.055 and that B gets a 0.032. The difference in the p-values is explained by
10. Which of the following values will always be within the upper and lower limits of a confidence interval?
Name__________________________________________ Date _______________________________
The distribution for a population of test scores is displayed below on the left. Each of the other five graphs labeled A to E represent possible distributions of sample means for random samples drawn from the population.
1. Please read each question carefully.
(a) Which graph represents a distribution of sample means for 500 samples of size 4?
(circle one) A B C D E
Please explain your reasoning:
Answer each of the following questions regarding the sampling distribution you chose for the above question
(b) What do you expect for the shape of the sampling distribution? (check only one)
___ Shaped more like a NORMAL DISTRIBUTION
___ Shaped more like the POPULATION
___ Shaped like some OTHER DISTRIBUTION
Please explain your reasoning:
(c) Circle the word between the two vertical lines that comes closest to completing the following sentence.
I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(d) Which graph do you think represents a distribution of sample means for 500 samples of size 16?
(circle one) A B C D E
Please explain your reasoning:
Answer each of the following questions regarding the sampling distribution you chose for the above question.
(e) What do you expect for the shape of the sampling distribution? (check only one)
___ Shaped more like a NORMAL DISTRIBUTION
___ Shaped more like the POPULATION
___ Shaped like some OTHER DISTRIBUTION
Please explain your reasoning:
Circle the word between the two vertical lines that comes closest to completing each of the following sentences.
(f) I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(g) I expect the sampling distribution I chose for the | less | the same | more | VARIABILITY than/as the sampling distribution I chose for the first question.
Please explain your reasoning:
2. Which of the following statements is NOT true according to the Central Limit Theorem? Select all that apply.
___ An increase in sample size from n = 16 to n = 25 will produce a sampling distribution with a smaller standard deviation.
___ The mean of a sampling distribution of sample means is equal to the population mean divided by the square root of the sample size.
___ The larger the sample size, the more the sampling distribution of sample means resembles the shape of the population.
___ The mean of the sampling distribution of sample means for samples of size n = 15 will be the same as the mean of the sampling distribution for samples of size n = 100.
___ The larger the sample size, the more the sampling distribution of sample means will resemble a normal distribution.
Explain your reasoning:
3. If sampling distributions of sample means are examined for samples of size 1, 5, 10, 16 and 50, you will notice that as n increases in size, the shape of the sampling distribution appears more like that of the:
Explain your reasoning:
4. The amount of money college students spend each semester on textbooks is normally distributed with a mean of $195 and a standard deviation of $20. Suppose you take a random sample of 100 college students from this population. There would be a 68% chance that the sample mean (X̄) amount spent on textbooks would be between:
Explain your reasoning:
Name__________________________________________ Date _______________________________
1. Two researchers are going to take a sample of data from the same population of chemistry students. Researcher A’s sample will consist only of the students in her class. Researcher B will select a random sample of students from among all students taking chemistry. Both researchers will construct a 95% confidence interval for the mean score on the chemistry final exam using their own sample data. Which researcher’s method has a 95% chance of capturing the true mean of the population of all students taking chemistry?
Please explain your reasoning:
2. A 95% confidence interval is calculated for a set of weights and the resulting confidence interval is 22 to 28 pounds. Indicate whether EACH of the following statements is True or False.
__________ 95% of the individual weights are between 22 and 28 pounds.
__________ Most of the individual weights are between 22 and 28 pounds.
__________ The probability that the interval includes the population mean (µ) is 95%.
__________ The probability that the interval includes the sample mean (X̄) is 95%.
__________ If 200 confidence intervals were generated using the same process, about 10 of the confidence intervals would not include the population mean (µ).
Please explain your reasoning:
3. A 95% confidence interval indicates that:
Please explain your reasoning:
4. Which of the following is true?
Please explain your reasoning:
5. Researchers ask a random sample of apartment dwellers in a large city their ideal air temperatures. They find the sample mean (X̄) is 72 degrees. Using a two-tailed test, they reject H_{0} : µ = 68 at the 5% significance level. Which of the following could be a 95% confidence interval for µ, the average ideal temperature for all apartment dwellers in the city?
Please explain your reasoning:
The following situation is for problems 6 and 7: The average number of fruit candies in a large bag is estimated. The .95 confidence interval is [40-48].
6. Based on this information, you know that the best estimate of the population mean is
Please explain your reasoning:
7. Based on this information, you know that you can reject H_{0} : µ = 38 at p =
Please explain your reasoning:
8. Which of the following values will always be within the upper and lower limits of a confidence interval for the mean?
Please explain your reasoning:
The following situation is used for problems 9 and 10: Two different pollsters, A and B, are trying to decide if a mayor’s favorability ratings are above 50%. They each do their own random sample of 1000 people in the senator’s state to perform a hypothesis test with.
9. Which of the following scenarios is most likely?
Please explain your reasoning:
10. Assume the alternative hypothesis is true and that pollster A gets a p-value of 0.032 and that B gets a 0.055. The difference in the p-values is explained by
Please explain your reasoning:
Name__________________________________________ Date _______________________________
1. Two researchers are going to take a sample of data from the same population of physics students. Researcher A will select a random sample of students from among all students taking physics. Researcher B’s sample will consist only of the students in her class. Both researchers will construct a 95% confidence interval for the proportion of scores on the physics final exam above 80% using their own sample data. Which researcher’s method has a 95% chance of capturing the true mean of the population of all students taking physics?
Please explain your reasoning:
2. A 95% confidence interval is calculated for a set of weights and the resulting confidence interval is 42 to 48 pounds. Indicate whether EACH of the following statements is True or False.
__________ 95% of the individual weights are between 42 and 48 pounds.
__________ Most of the individual weights are between 42 and 48 pounds
__________ The probability that the interval includes the population mean (µ) is 95%.
__________ The probability that the interval includes the sample mean (X̄) is 95%.
__________ If 200 confidence intervals were generated using the same process, about 10 of the confidence intervals would not include the population mean (µ).
Please explain your reasoning:
3. A 95% confidence interval indicates that:
Please explain your reasoning:
4. Which of the following is true?
Please explain your reasoning:
5. Researchers ask a random sample of apartment dwellers in a large city their ideal air temperatures. They find the sample proportion (p̂) of apartment dwellers that prefer the temperature above 70 degrees is 0.65. Using a two-tailed test, they reject H_{0} : p = 0.50 at the 5% significance level. Which of the following could be a 95% confidence interval for p, the average ideal temperature for all apartment dwellers in the city?
Please explain your reasoning:
The following situation is for problems 6 and 7: The proportion of red fruit candies in a large bag is estimated. The .95 confidence interval is [0.30-0.50].
6. Based on this information, you know that the best estimate of the population proportion is
Please explain your reasoning:
7. Based on this information, you know that you can reject H_{0} : p = .025 at a p-value =
Please explain your reasoning:
8. Which of the following values will always be within the upper and lower limits of a confidence interval?
Please explain your reasoning:
The following situation is used for problems 9 and 10: Two different pollsters, A and B, are trying to decide if a senator’s favorability ratings are above 33%. They each do their own random sample of 1000 people in the senator’s state to perform a hypothesis test with.
9. Which of the following scenarios is most likely?
Please explain your reasoning:
10. Assume the alternative hypothesis is true and that pollster A gets a p-value of 0.055 and that B gets a 0.032. The difference in the p-values is explained by
Please explain your reasoning:
Name__________________________________________ Date _______________________________
1. Which of the following distributions shows MORE variability?
A has more variability __________ B has more variability__________
Circle the statement (or statements) that led you to select your answer above.
(a) Because it’s bumpier
(b) Because it’s more spread out
(c) Because it has a larger number of different scores
(d) Because the values differ more from the center
(e) Other (please explain)
2. Which of the following is true?
Please explain your reasoning:
3. Please read each question carefully.
(a) Which graph represents a distribution of sample means for 500 samples of size 4?
(circle one) A B C D E
Please explain your reasoning:
Answer each of the following questions regarding the sampling distribution you chose for the above question
(b) What do you expect for the shape of the sampling distribution? (check only one)
_____ Shaped more like a NORMAL DISTRIBUTION
_____ Shaped more like the POPULATION
_____ Shaped like some OTHER DISTRIBUTION
Please explain your reasoning:
(c) Circle the word between the two vertical lines that comes closest to completing the following sentence.
I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(d) Which graph do you think represents a distribution of sample means for 500 samples of size 16?
(circle one) A B C D E
Please explain your reasoning:
Answer each of the following questions regarding the sampling distribution you chose for the above question
(e) What do you expect for the shape of the sampling distribution? (check only one)
_____ Shaped more like a NORMAL DISTRIBUTION
_____ Shaped more like the POPULATION
_____ Shaped like some OTHER DISTRIBUTION
Please explain your reasoning:
Circle the word between the two vertical lines that comes closest to completing each of the following sentences.
(f) I expect the sampling distribution to have | less | the same | more | VARIABILITY than/as the population.
Please explain your reasoning:
(g) I expect the sampling distribution I chose for the | less | the same | more | VARIABILITY than/as the sampling distribution I chose for the first question.
Please explain your reasoning:
The following situation is for problems 4 and 5: Two different pollsters, A and B, are trying to decide if a governor’s favorability ratings are above 60%. They each do their own random sample of 1000 people in the senator’s state to perform a hypothesis test with.
4. Which of the following scenarios is most likely?
Please explain your reasoning:
5. Assume the alternative hypothesis is true and that pollster A gets a p-value of 0.055 and that B gets a 0.032. The difference in the p-values is explained by
Please explain your reasoning:
6. Which of the following values will always be within the upper and lower limits of a confidence interval?
Please explain your reasoning:
7. Which of the following values will always be within the upper and lower limits of a confidence interval?
Please explain your reasoning:
8. A 95% confidence interval indicates that:
Please explain your reasoning:
9. Two researchers are going to take a sample of data from the same population of biology students. Researcher A’s sample will consist only of the students in her class. Researcher B will select a random sample of students from among all students taking biology. Both researchers will construct a 95% confidence interval for the mean score on the biology final exam using their own sample data. Which researcher’s method has a 95% chance of capturing the true mean of the population of all students taking biology?
Please explain your reasoning:
10. A 95% confidence interval is calculated for a set of weights and the resulting confidence interval is 52 to 58 pounds. Indicate whether EACH of the following statements is True or False.
__________ 95% of the individual weights are between 52 and 58 pounds.
__________ Most of the individual weights are between 52 and 58 pounds.
__________ The probability that the interval includes the population mean (µ) is 95%.
__________ The probability that the interval includes the sample mean (X̄) is 95%.
__________ If 200 confidence intervals were generated using the same process, about 10 of the confidence intervals would not include the population mean (µ).
Please explain your reasoning:
NOTE: All p-values are for a one-sided McNemar’s test unless stated otherwise. Items are denoted by the assessment on which they appeared followed by a dash and the item number on that particular assessment. For example, question 1 on Assessment 1 is denoted A1-1.
This problem looks at two histograms and asks which has more variability and why. Twenty of the 27 (74%) students got the question of which graph has more variability correct on the pretest and 22 (81%) did on Assessment 5, with two missing responses. All students who got the correct answer on A1 also responded correctly on A5. Of the seven who incorrectly answered the question on A1, four got the correct answer on A5. A one-sided McNemar test gives a p-value of 0.0625.
On the "why" question there were two correct responses. On the pretest, 11 of the 27 (41%) correctly identified one of the two responses and one student correctly identified both. On Assessment 5, 17 (63%) correctly identified one of the correct answers and again only one identified both, with one missing response. Of the 11 correct responses, three gave incorrect responses on A5; of the 16 incorrect responses, nine gave a correct response on A5. A one-sided McNemar test gives a p-value of 0.073.
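Both p-values reported above are consistent with an exact one-sided McNemar test, which uses only the students whose answers changed between the two assessments. The sketch below shows that calculation; the source does not state which variant of the test was run, so the exact binomial form and the function name are assumptions.

```python
from math import comb


def mcnemar_exact_one_sided(improved, worsened):
    """Exact one-sided McNemar p-value for improvement.

    Under H0 a student who changes answer is equally likely to improve or
    worsen, so the number who improve is Binomial(improved + worsened, 1/2).
    Returns P(X >= improved).
    """
    n = improved + worsened
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    return sum(comb(n, k) for k in range(improved, n + 1)) / 2**n


# "Which graph" question: 4 students improved, 0 worsened.
print(mcnemar_exact_one_sided(4, 0))            # 0.0625
# "Why" question: 9 students improved, 3 worsened.
print(round(mcnemar_exact_one_sided(9, 3), 3))  # 0.073
```

Students whose answers were correct on both assessments, or incorrect on both, do not enter the calculation.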
This set of problems involves estimating a parameter based on sample information. Problem A1-3 is somewhat different from the other two in that it just gives the results of a sample, whereas the other two provide the reader with just a confidence interval.
This set of problems pertains to understanding of the central limit theorem. All problems are exactly the same except for a different graph. Partial credit was converted to full credit for parts (a) and (d).
This problem evaluates the students’ understanding of the meaning of a 95% confidence interval. All four problems are exactly the same. For the purpose of this analysis partial credit is coded as correct.
This problem relates hypothesis testing to confidence intervals. All three problems are very similar. For the purpose of this analysis partial credit is coded as correct.
This problem involves interpreting the p-values from two samples from the same population. All four problems are very similar.
This problem involves interpreting the p-values from two samples from the same population. All four problems are very similar.
These questions ask what is always in a confidence interval. All four problems are exactly the same. We used A5-7 instead of A5-6 because its answer order is the same as the other three questions whereas A5-6 changes the order.
This problem relates to understanding bias and random samples. The three problems are essentially the same.
This is a set of true/false questions checking the understanding of confidence intervals.
This question asks if it is possible to prove the null hypothesis. All three questions are identical.
This question gives a confidence interval and asks if a particular null hypothesis can be rejected. Both questions are similar.
Comparing Assessment 3 to Assessment 4, including partial credit given for explanations, using a paired t-test yields a one-sided p-value of 0.059, with means of 11.4 and 12.0, respectively.
Comparing Assessment 3 to Assessment 4 without counting credit given for explanations (just the multiple choice problems) yields a one-sided p-value of 0.007, with means of 7.7 and 8.4, respectively.
Comparing questions 6, 7, 8, 9, and 10 on Assessment 1 with questions 3, 5, 8, 9, and 10 on Assessment 3 (all questions similar) yields a one-sided p-value of 0.093, with means of 2.3 and 2.6, respectively. Comparing questions 6, 7, 8, 9, and 10 on Assessment 1 with questions 3, 5, 8, 9, and 10 on Assessment 4 (all questions similar) yields a one-sided p-value of 0.037, with means of 2.3 and 2.9, respectively.
We get a one-sided p-value of 0.019 with means of 5.2 and 6.3, respectively.
Comparing questions 1, 2, 3, 4, 8, 9, and 10 on Assessment 3 with questions 2, 4, 5, 7, 8, 9, and 10 on Assessment 5 yields a p-value of 1 - 0.395 with means of 6.02 and 5.92, respectively. Comparing questions 1, 2, 3, 4, 8, 9, and 10 on Assessment 4 with questions 2, 4, 5, 7, 8, 9, and 10 on Assessment 5 yields a p-value of 1 - 0.151 with means of 6.35 and 5.92, respectively.
On A2-2, 7 out of 26 students (27%) received at least partial credit with only 1 receiving full credit. On A2-3, 9 out of 26 students were correct (35%) while 21 out of 26 (81%) were correct on A2-4.
Note: Students’ responses were coded into two categories for each item: the response of interest and the "other" category, which included all other responses. The number-letter pairs refer to the question number and the response of interest (e.g. 6a/c refers to question 6, choices a and c). All p-values are for a χ² test resulting from a 2×2 contingency table.
^{1}Problems that received partial credit of 0.5 are Assessment 1 problem 5a answer d (A1-5a d), A1-5d c, A1-6 d, A1-7 a, A2-1a a, A3-3 c, A3-5 a, A4-3 d, A4-5 a, and A5-3a b, while problems A3-4 a, A4-4 c, and A5-2 c received a 0.25 partial credit. In addition, on A1-1 and A5-1 full credit was only given to students who answered b and d, while partial credit was given to those who answered either b or d (but not both).
^{2}The sampling distributions questions were problem 5 on Assessment 1, problem 1 on Assessment 2, and problem 3 on Assessment 5.
^{3}delMas, Garfield, and Chance (1999) describe "good reasoning" as "When a student chose a histogram for the larger sample size that was shaped like a normal distribution and that had less variability than the histogram chosen for the smaller sample size." They describe "larger to smaller reasoning" as when "students chose a histogram with less variability for the larger sample size."
^{4}The results of the statistical tests can be found in Appendix C.
Bradley, D. R., Hemstreet, R. L., and Ziegenhagen, S. T. (1992), "A Simulation Laboratory for Statistics," Behavior Research Methods, Instruments, and Computers, 24, 190-204.
Bradstreet, T. E. (1996), "Teaching Introductory Statistics Courses So That Nonstatisticians Experience Statistical Reasoning," The American Statistician, 50, 69-78.
Chance, B. L. (1997), "Experiences with Authentic Assessment Techniques in an Introductory Statistics Course," Journal of Statistics Education, [Online], 5(3). (ww2.amstat.org/publications/jse/v5n3/chance.html)
Dambolena, I. G. (1986), "Using Simulation in Statistics Courses," Collegiate Microcomputer, 4, 339-344.
delMas, R. C., Garfield, J., & Chance, B. (1999), "A Model of Classroom Research in Action: Developing Simulation Activities to Improve Students’ Statistical Reasoning," Journal of Statistics Education 7(3). (ww2.amstat.org/publications/jse/secure/v7n3/delmas.cfm)
Dietz, E. J. (1993), "A Cooperative Learning Activity on Methods of Selecting a Sample," The American Statistician, 47, 104-108.
Fillebrown, S. (1994), "Using Projects in an Elementary Statistics Course for Non-Science Majors," Journal of Statistics Education, [Online], 2(2). (ww2.amstat.org/publications/jse/v2n2/fillebrown.html)
Giesbrecht, N. (1996), "Strategies for Developing and Delivering Effective Introductory-Level Statistics and Methodology Courses," ERIC Document Reproduction Service, No. 393668, Alberta, BC.
Gnanadesikan, M., Scheaffer, R., Watkins, A. & Witmer, J. (1997), "An Activity-Based Statistics Course," Journal of Statistics Education, [Online], 5(2). (ww2.amstat.org/publications/jse/v5n2/gnanadesikan.html)
Goodman, T. A. (1986), "Using the Microcomputer to Teach Statistics," Mathematics Teacher, 79, 210-215.
Gordon, F. (1987), "Computer Graphics Simulation of the Central Limit Theorem," Mathematics and Computer Education, 2, 48-55.
Gordon, F. S., and Gordon, S. P. (1989), "Computer Graphics Simulations of Sampling Distributions," Collegiate Microcomputer, 7, 185-189.
Gratz, Z. S., Volpe, G. D., and Kind, B. M. (1993), "Attitudes and Achievement in Introductory Psychological Statistics Classes: Traditional versus Computer Supported Instruction," ERIC Document Reproduction Service No. 365 405, Ellenville, NY.
Halley, F. S. (1991), "Teaching Social Statistics with Simulated Data," Teaching Sociology, 19, 518-525.
Hesterberg, T. C. (1998), "Simulation and Bootstrapping for Teaching Statistics," American Statistical Association Proceedings of the Section on Statistical Education, Alexandria, VA: American Statistical Association, 44-52.
Hodgson, T. R. (1996), "The Effects of Hands-On Activities on Students' Understanding of Selected Statistical Concepts," in Proceedings of the Eighteenth Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education, eds. E. Jakubowski, D. Watkins, and H. Biske, Columbus, OH: ERIC Clearinghouse for Science, Mathematics, and Environmental Education, pp. 241-246.
Hogg, R. V. (1991), "Statistical Education: Improvements are Badly Needed," The American Statistician, 45, 342-343.
Hubbard, R. (1992), "Teaching Statistics With MINITAB," Australian Mathematics Teacher, 48, 8-10.
Hunter, W. G. (1977), "Some Ideas about Teaching Design of Experiments, with 25 Examples of Experiments Conducted by Students," The American Statistician, 31, 12-20.
Karley, L. M. (1990), "Using Computer Graphics in Statistics," Mathematics and Computer Education, 24, 232-239.
Ledolter, J. (1995), "Projects in Introductory Statistics Courses," The American Statistician, 49, 364-367.
Lunsford, M. L., Rowell, G. H., & Goodson-Espy, T. (2006), "Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes," Journal of Statistics Education, [Online], 14(3). (ww2.amstat.org/publications/jse/v14n3/lunsford.html)
Mackisack, M. (1994), "What is the Use of Experiments Conducted by Statistics Students?" Journal of Statistics Education [Online], 2(1). (http://ww2.amstat.org/publications/jse/v2n1/mackisack.html)
Marasinghe, M. G., Meeker, W. Q., Cook, D., and Shin, T. (1996), "Using Graphics and Simulation to Teach Statistical Concepts," The American Statistician, 50, 342-351.
Maxwell, N. (1994), "A Coin-Flipping Exercise to Introduce the p-value," Journal of Statistics Education, [Online], 2(1). (ww2.amstat.org/publications/jse/v2n1/maxwell.html)
McBride, A. B. (1996), "Creating a Critical Thinking Learning Environment: Teaching Statistics to Social Science Undergraduates," Political Science and Politics, 29, 517-521.
Mills, J. D. (2002), "Using Computer Simulation Methods to Teach Statistics: A Review of the Literature," Journal of Statistics Education, [Online], 10(1). (ww2.amstat.org/publications/jse/v10n1/mills.html)
Mittag, K. C. (1992), "Using Computers to Teach the Concepts of the Central Limit Theorem," ERIC Document Reproduction Service No. 349 947, San Francisco, CA.
National Council of Teachers of Mathematics (2000), Principles and standards for school mathematics, Reston, VA: Author.
National Council of Teachers of Mathematics (2006), Curriculum focal points for prekindergarten through grade 8 mathematics: A quest for coherence, Reston, VA: Author.
Packard, A. L., Holmes, G. A., and Fortune, J. C. (1993), "A Comparison of Three Presentation Methods of Teaching Statistics," ERIC Document Reproduction Service No. 365 696, Chicago, IL.
Pulley, L. B., and Dolbear, F. T. (1984), "Computer Simulation Exercises for Economics Statistics," Journal of Economics Education, 3, 77-87.
Schwartz, D. L., Goldman, S. R., Vye, N. J., Barron, B. J., and The Cognition Technology Group at Vanderbilt (1997), "Aligning Everyday and Mathematical Reasoning: The Case of Sampling Assumptions," in Reflections on Statistics: Agendas for Learning, Teaching and Assessment in K-12, ed. S. Lajoie, Hillsdale, NJ: Erlbaum.
Snee, R. D. (1993), "What’s Missing in Statistical Education?", The American Statistician, 47, 149-154.
Sullivan, M. M. (1993), "Students Learn Statistics When They Assume a Statistician’s Role," ERIC Document Reproduction Service No. 368 547, Boston, MA.
Velleman, P. F., and Moore, D. S. (1996), "Multimedia for Teaching Statistics: Promises and Pitfalls," The American Statistician, 50, 217-225.
Von Glasersfeld, E. (1987), "Learning as a Constructive Activity," in Problems of Representation in the Teaching and Learning of Mathematics, Hillsdale, NJ: Lawrence Erlbaum Associates, 3-17.
Wood, M. (2005), "The Role of Simulation Approaches in Statistics," Journal of Statistics Education, [Online], 13(3). (ww2.amstat.org/publications/jse/v13n3/wood.html)
Thomas J. Pfaff
Ithaca College
953 Danby Rd.
Ithaca, NY, 14850
tpfaff@ithaca.edu
(607) 274-7066
Aaron Weinberg
Ithaca College
953 Danby Rd.
Ithaca, NY, 14850
aweinberg@ithaca.edu
(607) 274-7081