Online Program

Friday, May 18
Sports and Game Analytics
Fri, May 18, 5:15 PM - 6:15 PM
Lake Fairfax B

Baseball Pitching and Swing Contact Modeling (304591)

*Andrew Chen, University of San Francisco 
Mason Chen, Stanford OHS 
Tony S. Liao, CoachMe LLC 

Keywords: Baseball, Sports, Logistic Regression, JMP, Modeling

Statistics and probability are widely used in baseball to help pitchers design their pitching patterns against each hitter, lower their earned-run average (ERA), and win more games. ERA, calculated as earned runs times 9 divided by innings pitched, is the most important index for pitchers. However, predicting ERA is complicated. The objective of this study is to design a Swing Contact model to optimize pitching performance. The model is intended to predict the result of each at-bat and the pitcher's accumulated ERA. Swing contact performance was recorded in five categories: (1) No Swing, (2) No Contact, (3) Weak Contact, (4) Medium Contact, and (5) Strong Contact. The raw data were recorded for one rookie Major League Baseball (MLB) pitcher playing against several MLB teams in the 2017 regular season. The input variables of the Swing Contact model were hitter position (right or left), pitching location (3x3 grid zone, horizontal by vertical), ball or strike, and pitching velocity (miles per hour). The type of pitch (fastball, sinker, slider, change-up, knuckleball, etc.) can be indicated by pitching velocity. Data were analyzed with JMP Pro Distributions, Graph Builder, Multivariate Chart, and Bootstrap Random Forest. The graphical analyses showed the pitcher's relative strengths and weaknesses against hitters. The interactive and conditional graphical techniques could also be used to study pitching sequence patterns so that pitchers can optimize their pitching flow. Pitching velocity and pitching location are the most decisive factors in the Swing Contact result. A Nominal Logistic regression model was built for predicting Swing Contact through the Bootstrap Random Forest method. The JMP prediction profiler can break down the probability of each swing contact category in any particular pitching situation. The model has been validated with an R-Square of about 75%. This statistical methodology can help pitchers to minimize the swing