NAME: playbyplay2008.dat
TYPE: Population
SIZE: 193492 observations on 36 variables

DESCRIPTIVE ABSTRACT:
Play-by-play data for every play of every game during the 2008 Major
League Baseball season.

SOURCE:
The Regular Season Event Files from the Retrosheet organization at
www.retrosheet.org/game.htm. The program bevent was used to extract
variables from thirty files corresponding to play-by-play information
at each of the thirty baseball stadiums. All of the data was combined
into a single file.

FORMATS:
The play-by-play dataset is available both as a data file
"playbyplay2008.dat" and as an R workspace "playbyplay2008.Rdata".
Tab characters separate the variables in the data file.

READING INTO R:
The play-by-play data file can be read into R with the command

    retrosheet=read.delim("http://bayes.bgsu.edu/baseball/playbyplay2008.dat")

VARIABLE DESCRIPTIONS:
Each row represents information about a particular baseball play
(either a batting event or a base running event) during a particular
game.

game.id        game id
v_team         code for visiting team
inning         inning of game
team_at_bat    id of team at bat
outs           number of current outs
balls          number of balls
strikes        number of strikes
v_score        current visitor score
h_score        current home score
batter         id code of batter
batter_hand    batter side (L or R)
pitcher        id code of pitcher
pitcher_hand   pitcher side (L or R)
b1_runner      code of runner on first
b2_runner      code of runner on second
b3_runner      code of runner on third
event          code description of event
leadoff_flag   leadoff hitter? (TRUE or FALSE)
pitchhit_flag  pinch hitter? (TRUE or FALSE)
def_pos        defensive position of batter (1 through 9 correspond to
               numerical fielding locations, 10 corresponds to DH, and
               11 corresponds to pinch-hitter)
batting_pos    position in batting order
event_code     event code
                  0  Unknown event
                  1  No event
                  2  Generic out
                  3  Strikeout
                  4  Stolen base
                  5  Defensive indifference
                  6  Caught stealing
                  7  Pickoff error
                  8  Pickoff
                  9  Wild pitch
                 10  Passed ball
                 11  Balk
                 12  Other advance
                 13  Foul error
                 14  Walk
                 15  Intentional walk
                 16  Hit by pitch
                 17  Interference
                 18  Error
                 19  Fielder's choice
                 20  Single
                 21  Double
                 22  Triple
                 23  Home run
                 24  Missing play
bevent_flag    end of batting appearance (TRUE or FALSE)
ab_flag        indicator of at-bat (TRUE or FALSE)
hit_value      value of hit (0, 1, 2, 3, 4)
sh_flag        sacrifice hit? (TRUE or FALSE)
sf_flag        sacrifice fly? (TRUE or FALSE)
outs_play      number of outs recorded
rbi_play       number of RBIs credited
wp_play        wild pitch? (TRUE or FALSE)
pb_flag        passed ball? (TRUE or FALSE)
nerrors        number of errors on play
batter_dest    base reached by batter (0, 1, 2, 3, or 4)
b1_runner_d    code of new runner on first
b2_runner_d    code of new runner on second
b3_runner_d    code of new runner on third
date           date of game
h_team         code for home team

Missing values are denoted with NA.

PEDAGOGICAL NOTES:
Using the Retrosheet dataset, one can investigate how batters and
pitchers perform in different situations. For example, one can explore
a particular batter's performance in home and away games, in different
innings and base situations during a game, and against different
pitchers. To understand so-called situational effects, it is helpful
to look at a particular effect, say home versus away, for all hitters.
Then one can understand the general situational bias (players
typically perform better during home games) and detect the players who
deviate from the general pattern.
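The home-versus-away comparison described above can be sketched in R
as follows. This is only an illustration under stated assumptions: the
rows in "plays" are made up, the helper name home_away_avg is
hypothetical, and the coding of team_at_bat as 0 for the visiting team
and 1 for the home team is an assumption that should be checked
against the actual file before use.

```r
# Toy stand-in for a few rows of the play-by-play data (values are made up).
# Column names follow the variable list above. ASSUMPTION: team_at_bat is
# coded 0 = visiting team, 1 = home team; verify this in the real file.
plays <- data.frame(
  batter      = c("aaaa001", "aaaa001", "aaaa001", "aaaa001",
                  "bbbb002", "bbbb002", "bbbb002", "bbbb002"),
  team_at_bat = c(1, 1, 0, 0, 1, 1, 0, 0),
  ab_flag     = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  hit_value   = c(1, 0, 0, 0, 4, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: batting average for each batter, split by site.
home_away_avg <- function(d) {
  ab <- d[d$ab_flag, ]                        # keep official at-bats only
  ab$site <- ifelse(ab$team_at_bat == 1, "home", "away")
  ab$hit <- as.numeric(ab$hit_value > 0)      # 1 if the at-bat was a hit
  # batter-by-site table of mean(hit), i.e. batting averages
  avg <- tapply(ab$hit, list(ab$batter, ab$site), mean)
  data.frame(batter = rownames(avg),
             home   = as.vector(avg[, "home"]),
             away   = as.vector(avg[, "away"]),
             diff   = as.vector(avg[, "home"] - avg[, "away"]),
             row.names = NULL)
}

result <- home_away_avg(plays)
print(result)
```

With the real data, "plays" would be replaced by the data frame
returned by the read.delim command given under READING INTO R. The
typical value of the diff column then estimates the general home
effect, and batters with unusually large or small differences are
candidates for closer study, keeping in mind that averages based on
few at-bats are noisy.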

REFERENCES:
Albert, Jim, Baseball Data at Season, Play-by-Play, and Pitch-by-Pitch
Levels

SUBMITTED BY:
Jim Albert
Department of Mathematics and Statistics
Bowling Green State University
Bowling Green, OH 43403
albert@bgnet.bgsu.edu