NAME: Barry Bonds' 2001 Plate Appearances TYPE: Census SIZE: 648 observations, 15 variables DESCRIPTIVE ABSTRACT: Data are provided for Barry Bonds' plate appearances in the 2001 baseball season. Variables include characteristics of the innings before the first pitch to Bonds (e.g., the number of outs, the number of runners on each base, the score, the opposing pitcher's earned run average) and after the first pitch to Bonds (e.g., the outcome of the appearance, how many runs scored in the inning after Bonds hits). SOURCES: The data were obtained from CBS Sportsline at http://www.cbs.sportsline.com. This site has pitch-by-pitch summaries of every baseball game in 2001. Pitchers' ERAs were obtained from ESPN at http://www.espn.com. These data were analyzed in Reiter, J. P. (2002) "Should teams walk or pitch to Barry Bonds?" By the Numbers: The Newsletter of the SABR Statistical Analysis Committee, 12 (November 2002), pp. 7-11. VARIABLE DESCRIPTIONS: Each plate appearance is on a single line of a text file with line breaks. Values are delimited by spaces. Columns Description 1-3 Plate appearance number. 5-7 Number of the game in the season. 9 Number of the plate appearance within the game. 11 Equals one for games in San Francisco and equals zero otherwise. 13 Equals one when there is a runner on first base when Bonds appears and equals zero otherwise. 15 Equals one when there is a runner on second base when Bonds appears and equals zero otherwise. 17 Equals one when there is a runner on third base when Bonds appears and equals zero otherwise. 19 Number of outs in inning when Bonds appears. 21-22 Inning of plate appearance. 24 Number of runs scored by Giants in the inning after first pitch to Bonds. 26 Equals one if Bonds walks and equals zero otherwise. 28 Equals one if Bonds walks intentionally and equals zero otherwise. 30 Equals zero if Bonds does not reach base. Equals one if Bonds reaches first base on a single or error. Equals two if Bonds reaches second base on a double or error. Equals three if Bonds reaches third base on a triple or error. Equals four if Bonds hits a home run. Equals five if Bonds walks or is hit by a pitch. 32-35 Opposing pitchers' career earned run average as of the end of the 2000 season. 37-38 Giants score just before first pitch to Bonds 40-41 Opposing team's score just before first pitch to Bonds. NOTES: There are a few games for which data were not available, due to invalid web links. There are two pitchers for whom I could not locate their earned run averages. These missing data should not bias analyses, since they are missing completely at random. For rookie pitchers, I used their 2001 earned run average. STORY BEHIND THE DATA: Barry Bonds is probably the most well-known current baseball player. His batting statistics in 2001 reflect arguably the greatest individual season of all time. One common strategy employed by opposing managers in 2001 was to walk Bonds rather than pitch to him. This avoids the risk of Bonds hitting a home run, but it puts an extra runner on base. Hence, we are confronted with an interesting question in baseball strategy: does walking rather than pitching to Bonds reduce the chance that the Giants will score runs? These data were collected to examine this question. Analyses of the data suggest that differences in the two strategies are small; we cannot rule out the possibility that walking and pitching to Bonds are equally effective. PEDAGOGICAL NOTES: The 2001 data are in fact a census, so that quantities for 2001 (e.g., average runs) are known exactly, except for errors due to the missing data. Inferential methods like hypothesis tests and confidence intervals are not needed to estimate 2001 quantities. However, the 2001 quantities themselves are not of primary interest; rather, interest centers on a hypothetical population of Barry Bonds' plate appearances under conditions similar to those in the league in 2001. We treat the data from 2001 as a random sample from such a hypothetical population. This framework is commonly employed when analyzing sociological data, such as country-wide or state-wide data. Highlighting this conceptualization helps students understand the differences between samples and censuses. There are several outcome variables that can be used to compare walking versus pitching to Bonds. One outcome is the percentage of innings in which the Giants score at least one run. This is relevant for situations in which the opposing manager does not want to give up any runs. Another outcome is the number of runs the Giants score. Students can manipulate the runs variable to examine either outcome. Bonds is not randomly assigned to be walked or pitched to, so that the data are observational rather than experimental. When comparing two treatments (walk Bonds and pitch to Bonds) with observational data, it is crucial to compare the distributions of causally-relevant background characteristics in the two treatment groups. In these data, game situation (number of outs and runners on base) and pitcher's ERA are the two most important causally-relevant characteristics. To control for game situation, students can split the data set by game situations before comparing walk-innings versus hit-innings. For example, innings in which Bonds appears with no one on and no outs can be grouped together, and then outcomes when Bonds walks or hits can be compared within those innings. Comparisons of ERA show that it is similarly distributed in walk-innings and hit-innings in all scenarios, so that it does not affect the causal conclusions. SUBMITTED BY: Jerome P. Reiter Institute of Statistics and Decision Sciences Duke University Box 90251 Durham, NC 27708 jerry@stat.duke.edu