NAME: Modeling Rare Baseball Events -- Are They Memoryless?

TYPE: Census

SIZE:
no hitters file – 206 observations, 7 variables
cycles file – 225 observations, 6 variables
triple plays file – 511 observations, 16 variables

DESCRIPTIVE ABSTRACT:
Three sets of rare baseball events (pitching a no-hit game, hitting for the cycle, and turning a triple play) offer excellent examples of events whose occurrence may be modeled as Poisson processes. That is, the time at which one of these events occurs does not affect when the next one occurs. We modeled occurrences of these three events in Major League Baseball using data from 1901 through 2004, including a refinement for six commonly accepted baseball eras within this time period. Model assessment was done primarily through goodness-of-fit analyses of the inter-arrival data.

DATA SOURCES:
www.Baseball-Almanac.com/feats/feats16d.shtml
www.Retrosheet.org
tripleplays.sabr.org/tp_sum.htm

DATASETS LAYOUT: Spreadsheet in MS Excel

no hitters.xls:

Variable             Description
Year                 Major League Baseball season
Date                 Day and Month
Pitcher(s)           Pitcher(s) who pitched the no-hitter
Result               Teams and score; ‘(P)’ indicates a perfect game
Game                 Game, that Year, in which no-hitter was pitched
Inter-arrival Time   Games since previous no-hitter (possibly in a prior Year)
Games in Season      Total number of games in that Year

cycles.xls:

Variable             Description
Year                 Major League Baseball season
Date                 Day and Month
Player               Player to hit for cycle
Game                 Game, that Year, in which cycle was hit
Inter-arrival Time   Games since previous cycle (possibly in a prior Year)
Games in Season      Total number of games in that Year

triple plays.xls:

Variable             Description
Year                 Major League Baseball season
Date                 Day and Month (with (1) or (2) indicating a game of a double-header)
Game                 Game, that Year, in which triple play occurred
IAT                  Games since previous triple play (possibly in a prior Year)
Team                 Team to defensively execute a triple play
LG                   League of Team
H/A                  Home (‘vs’) or Away (‘at’)
Opposing Team        Offensive team to suffer a triple play
OLG                  League of Opposing Team
Players              Defensive players involved in the execution of the triple play
INN                  Inning in which the triple play occurred
SCRVH                Final score of the game, visitors listed first, then home
Batter               Batter to hit into the triple play
Runner1              Runner on first, if any
Runner2              Runner on second, if any
Runner3              Runner on third, if any

PEDAGOGICAL NOTES:
For counting purposes, each game in which one of the events occurred was treated as the first game of the day, unless it was the second game of a double-header, in which case it was treated as the last game of the day. The inter-arrival data are modeled with an exponential distribution, and goodness-of-fit tests (chi-squared and Anderson-Darling) are conducted. The data are examined for exponentiality both as a whole and in subsets.

REFERENCES:
D’Agostino, R. B. and Stephens, M. A. (1986), Goodness-of-Fit Techniques, New York, NY: Marcel Dekker, Inc.

Devore, J. L. (1995), Probability and Statistics for Engineering and the Sciences, Fourth Edition, New York, NY: Duxbury Press.

Potthoff, R. F. and Whittinghill, M. (1966), “Testing for Homogeneity: II. The Poisson Distribution”, Biometrika, Volume 53(1/2), pp. 183-190.

Siwoff, S. (2004), The Book of Baseball Records, 2004 Edition, New York, NY: Seymour Siwoff, Elias Sports Bureau, Inc.

Internet site for no-hit game definition: mlb.mlb.com/NASApp/mlb/mlb/official_info/official_rules/foreword.jsp, accessed 11 Feb 2006.

Internet site for baseball eras: www.netshrine.com/era.html, accessed 10 Feb 2006.

SUBMITTED BY:
Michael Huber
Department of Mathematical Sciences
Muhlenberg College
Allentown, PA 18104
U.S.A.
huber@muhlenberg.edu

Andrew Glen
Department of Mathematical Sciences
United States Military Academy
West Point, NY 10996
U.S.A.
andrew.glen@usma.edu
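
ILLUSTRATIVE SKETCH:
The chi-squared goodness-of-fit check mentioned in the Pedagogical Notes can be sketched as follows. This is a minimal illustration, not the submitters' own procedure: the choice of Python, the function name exponential_chi_square, the use of equal-probability bins, and the simulated inter-arrival times (with an arbitrary mean gap of 800 games) are all assumptions made for the example. In practice the values would come from the Inter-arrival Time or IAT column of one of the spreadsheets.

```python
import math
import random


def exponential_chi_square(inter_arrivals, n_bins=5):
    """Chi-squared goodness-of-fit statistic for a fitted exponential model.

    Hypothetical helper for illustration only.
    Returns (statistic, degrees_of_freedom, rate_estimate).
    """
    n = len(inter_arrivals)
    mean_gap = sum(inter_arrivals) / n
    rate = 1.0 / mean_gap  # maximum-likelihood estimate of the exponential rate

    # Interior bin edges chosen so that each bin has probability 1/n_bins
    # under the fitted model: F(x) = 1 - exp(-rate * x).
    edges = [-math.log(1.0 - k / n_bins) / rate for k in range(1, n_bins)]

    # Count how many observations fall in each bin.
    observed = [0] * n_bins
    for x in inter_arrivals:
        bin_index = sum(1 for edge in edges if x > edge)
        observed[bin_index] += 1

    expected = n / n_bins  # same expected count in every bin
    statistic = sum((o - expected) ** 2 / expected for o in observed)

    # One degree of freedom is lost because the rate was estimated.
    dof = n_bins - 1 - 1
    return statistic, dof, rate


if __name__ == "__main__":
    # Simulated data (an assumption, NOT values from the spreadsheets).
    rng = random.Random(0)
    simulated = [rng.expovariate(1 / 800.0) for _ in range(200)]
    stat, dof, rate = exponential_chi_square(simulated)
    print(f"chi-squared = {stat:.2f} on {dof} df; "
          f"estimated mean gap = {1 / rate:.0f} games")
```

The statistic would then be compared with a chi-squared critical value on the returned degrees of freedom; a large value suggests the exponential (memoryless) model fits poorly. Because inter-arrival times counted in games are integers, the exponential model is a continuous approximation here.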