NAME: mlb_batting.dat
TYPE: Population
SIZE: 85988 observations, 21 variables

DESCRIPTIVE ABSTRACT:
Seasonal batting data for all players in Major League Baseball from 1871
(the first year of professional baseball) through 2009.

SOURCE:
Sean Lahman's Baseball Archive website at www.baseball1.com. The dataset
was formed from the Batting.csv and Master.csv data files on that website.

FORMATS:
The batting dataset is available both as a data file "mlb_batting.dat" and
as an R workspace "mlb_batting.Rdata". Tab characters separate the
variables in the data file.

READING INTO R:
The data file can be read into R with the command

    batting=read.delim("http://bayes.bgsu.edu/baseball/mlb_batting.dat")

VARIABLE DESCRIPTIONS:
Each row represents the hitting statistics for one player in a particular
season.

    first.name   player's first name
    last.name    player's last name
    name         player's id code
    year         season
    game         games played
    ab           at-bats
    r            runs scored
    h            hits
    x2b          doubles
    x3b          triples
    hr           home runs
    rbi          runs batted in
    sb           stolen bases
    cs           caught stealing
    bb           bases on balls
    so           strikeouts
    ibb          intentional bases on balls
    hbp          times hit by pitch
    sh           sacrifice hits
    sf           sacrifice flies
    gdp          grounded into double plays
    age          age of player
    obp          on-base percentage
    slg          slugging percentage
    ops          OPS measure (obp + slg)
    pa           plate appearances

Missing values are denoted with NA.

PEDAGOGICAL NOTES:
This dataset contains the fundamental season hitting data for all players
in baseball history. One can look at the collection of batting averages or
on-base percentages for all players in a particular season. One can also
compare hitting statistics across seasons, for example by exploring the
changes in home run rates and triples rates over time. One can look at the
batting averages or home run rates of a particular player over the years
of his career. The student can learn much about the history of baseball by
exploring the changes in various hitting statistics over time.

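The season-to-season comparison suggested in the notes above could be started with an R sketch such as the following. It totals at-bats, home runs, and triples within each season and divides to get a rate per at-bat. The function name season_rates and the choice of at-bats as the denominator are assumptions made for illustration; they are not part of the dataset or its source. Only the column names year, ab, hr, and x3b come from the variable list.

```r
# Sketch: home run and triples rates by season.
# Assumes a data frame with the columns year, ab, hr and x3b
# described under VARIABLE DESCRIPTIONS. The function name and the
# per-at-bat definition of "rate" are illustrative choices.
season_rates <- function(batting) {
  cols <- c("year", "ab", "hr", "x3b")
  # Drop rows with a missing value (NA) in any column we need
  d <- batting[complete.cases(batting[, cols]), cols]
  # Sum at-bats, home runs and triples within each season
  totals <- aggregate(cbind(ab, hr, x3b) ~ year, data = d, FUN = sum)
  # Rates per at-bat for each season
  totals$hr.rate <- totals$hr / totals$ab
  totals$x3b.rate <- totals$x3b / totals$ab
  # Return the seasons in chronological order
  totals[order(totals$year), ]
}

# Possible usage once the data file has been read as described above:
# batting <- read.delim("http://bayes.bgsu.edu/baseball/mlb_batting.dat")
# rates <- season_rates(batting)
# plot(rates$year, rates$hr.rate, type = "l",
#      xlab = "Season", ylab = "Home runs per at-bat")
```

Pooling the counts within a season before dividing weights each player by his at-bats, so players with very few at-bats do not distort the season rate. Averaging individual player rates would be a different, and noisier, summary.
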
REFERENCES:
Albert, Jim, Baseball Data at Season, Play-by-Play, and Pitch-by-Pitch
Levels

SUBMITTED BY:
Jim Albert
Department of Mathematics and Statistics
Bowling Green State University
Bowling Green, OH 43403
albert@bgnet.bgsu.edu