Author Archive: Jim Albert

In-Play Home Run Hitting – Season and Month Effects

Introduction

It is May 1 and we have completed a month of 2024 baseball — roughly 1/6 of the season. Looking at the season to season stats from Baseball Reference, it appears that home run hitting is down — currently we see 1.02 home runs hit per team per game compared with 1.21 home runs hit in the previous 2023 season. But this is a biased comparison since the 2024 data is based only on baseball played during the cold April weather. That raises several questions about patterns of home run hitting over seasons and months.

  • We know that the chance of a home run is strongly dependent on the launch angle and exit velocity of the batted ball. What are the season to season patterns in home run hitting if we adjust for the launch variables?
  • Similarly, if we adjust in-play home run rates for the launch variables, what are the month to month effects, and how have these month effects varied over seasons?

We will address these questions by use of predictions from a model that includes launch variable, season and month effects.

Data

I collected all of the Statcast in-play data for the Statcast seasons 2015 through 2023. Since we are exploring effects from April through October, I omitted the shortened 2020 season from the data. For each batted ball, I have the launch angle, launch speed, date and home run indicator.
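The data preparation described above might look roughly like the following sketch. It uses a tiny made-up data frame in place of the real download; the column names game_date and events are assumptions about the raw Statcast export, and the way months are coded into cMonth is a guess at the author's setup rather than something stated in the post.

```r
# Toy stand-in for the raw Statcast download (one row per batted ball).
# game_date and events are assumed column names; the values are made up.
sc_raw <- data.frame(
  game_date    = as.Date(c("2019-04-03", "2019-07-14", "2020-08-01", "2023-10-01")),
  launch_angle = c(27, 12, 30, 29),
  launch_speed = c(103.2, 95.1, 101.4, 99.8),
  events       = c("home_run", "single", "home_run", "field_out")
)

sc_a <- within(sc_raw, {
  HR     <- as.integer(events == "home_run")    # 1 for a home run, 0 otherwise
  Season <- as.integer(format(game_date, "%Y"))
  Month  <- as.integer(format(game_date, "%m"))
})

# Drop the shortened 2020 season, then code season and month as factors
sc_a <- subset(sc_a, Season != 2020)
sc_a$Season <- factor(sc_a$Season)
sc_a$cMonth <- droplevels(factor(month.abb[sc_a$Month], levels = month.abb))

sc_a[, c("launch_angle", "launch_speed", "HR", "Season", "cMonth")]
```

Coding Season and cMonth as factors matters later: it is what makes them enter the model as categorical effects rather than linear trends.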

GAM Model

The intent is to explore season and month effects for home run hitting beyond what is explained by the launch velocity and launch angle measurements. Let y = 1 if we observe a home run (y = 0 otherwise) and let p denote the probability of a home run on a ball put in play. We fit the generalized additive model

log(p / (1 - p)) = s(LA, LS) + Season + Month + Season * Month

where s() is a smooth function of the launch variables LA (launch angle) and LS (launch speed), Season and Month are categorical predictors, and the Season * Month interaction term allows the month effect to vary across seasons. This logistic GAM can be fit by the gam() function from the mgcv package:

library(mgcv)
# Season and cMonth must be factors so they enter as categorical effects
fit <- gam(HR ~ s(launch_angle, launch_speed) + Season * cMonth,
           data = sc_a,
           family = binomial)
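One small point about the formula: in R, Season * cMonth is shorthand for both main effects plus their interaction, which is how the single term in the code matches the three terms in the model equation. A quick way to check this, using only base R and no data:

```r
# Expand the factor part of the model formula into its individual terms
f <- HR ~ Season * cMonth
attr(terms(f), "term.labels")
# "Season" "cMonth" "Season:cMonth"
```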

In previous work (such as the Home Run Hitting chapter of our 3rd edition of Analyzing Baseball Data with R, or ABDWR), we focused on how the probability of a home run depends on the launch variables for one season. Here we take a different approach: we fix values of launch angle and exit velocity and see how predictions from the GAM vary across seasons and months.
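Here is a minimal sketch of that prediction step. It runs on simulated data, not the Statcast data: the data frame sim, the seasons and months it contains, and every effect size are invented for illustration. Only the model formula and the idea of fixing the launch variables come from the post.

```r
library(mgcv)

set.seed(1)
n <- 6000
# Simulated stand-in for sc_a; all effect sizes below are invented
sim <- data.frame(
  launch_angle = runif(n, 10, 45),
  launch_speed = runif(n, 85, 112),
  Season = factor(sample(c(2021, 2022, 2023), n, replace = TRUE)),
  cMonth = factor(sample(c("Apr", "Jun", "Aug"), n, replace = TRUE),
                  levels = c("Apr", "Jun", "Aug"))
)
eta <- 0.35 * (sim$launch_speed - 100) -
  0.02 * (sim$launch_angle - 28)^2 -
  0.5 * (sim$cMonth == "Apr")          # made-up "cold April" penalty
sim$HR <- rbinom(n, 1, plogis(eta))

fit <- gam(HR ~ s(launch_angle, launch_speed) + Season * cMonth,
           data = sim, family = binomial)

# One row per season-month cell, with the launch variables held fixed
grid <- expand.grid(Season = levels(sim$Season), cMonth = levels(sim$cMonth))
grid$launch_angle <- 28
grid$launch_speed <- 100

# type = "response" returns probabilities rather than log-odds
grid$p_hr <- as.numeric(predict(fit, newdata = grid, type = "response"))
grid
```

Because the launch variables are the same in every row of grid, any differences in p_hr come only from the Season and cMonth terms, which is the sense in which the predictions isolate carry effects.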

Plotting Predictions Against Season

Here I set the launch angle to 28 degrees and the exit velocity to 100 mph. I chose these values because a ball hit this way has close to a 50% chance of being a home run. I use the model fit to predict the probability of a home run for every Statcast season and month in the data. The following graph displays the probability estimates as a function of season, with colors distinguishing the months. Since the model includes the launch variables, these patterns represent carry effects of the ball for different seasons and months. Several observations:

  • Although the greatest home run total occurred in the 2019 season, the carry effect was slightly larger in 2017 than in 2019. The higher home run total in 2019 reflects the carry effect together with a change in launch variables (higher exit velocities and more suitable launch angles).
  • In recent seasons, MLB has been using a relatively dead baseball and the carry effects are significantly lower in 2021-2023 compared to earlier seasons.
  • The month effects are significant. April (red in this figure) stands out with low carry effects. The effects for June through September are similar, especially in recent seasons.

Plotting Predictions Against Month

If we plot the probability estimates against month, we get a clearer picture of the pattern of month effects in the figure below. Now the colors indicate different seasons.

  • This graph reinforces my earlier comment that the carry effect for home run hitting is low in the early months (April and May) and similar across the warmer months, June through September.
  • Generally the curves for different seasons are parallel indicating similar month effects for the different seasons.
  • Something is weird about the 2015 season. For this season, the probability estimates rise steadily from April through June, then decrease in July, August and September. This suggests that something else, perhaps a change in the composition of the ball in the middle of the 2015 season, caused this strange pattern.
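The link between parallel curves and similar month effects can be made explicit. The notation below is my own shorthand and does not appear in the post: alpha_j is the effect of season j, beta_k the effect of month k, and gamma_jk the interaction.

```latex
\operatorname{logit} p_{jk} = s(LA, LS) + \alpha_j + \beta_k + \gamma_{jk}

% Difference between months k and k' within season j, at fixed LA and LS:
\operatorname{logit} p_{jk} - \operatorname{logit} p_{jk'}
  = (\beta_k - \beta_{k'}) + (\gamma_{jk} - \gamma_{jk'})
```

If the interaction terms were all zero, this difference would not depend on the season j, and the season curves would be exactly parallel on the log-odds scale. The figure plots probabilities rather than log-odds, so the parallelism there is only approximate, though near a 50% probability the two scales behave almost linearly. A season whose curve has a different shape, as 2015 does, corresponds to interaction terms that are not small.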

Takeaways

  • This brief exploration tells us that home run hitting is clearly impacted by the weather. As temperatures warm up, we should see the 2024 home run rate increase. For that reason, I am reluctant to draw any conclusions about ball effects for the 2024 season until we get into the summer months.
  • In our Home Run Hitting chapter, we briefly show how home run hitting is impacted by the game time temperature. One way to improve this exploration would be to include game time temperature in the model and focus on games played in outdoor stadiums.
  • One nice feature of this model fitting is that one can separate out the player effects (launch variables) from the ball effects (carry properties). It is a straightforward way to understand the roles of both the launch variables and the ball in home run hitting.

Added May 2

Tom Tango suggested that I add a point corresponding to April of the current 2024 season. I separately fit a GAM of the form logit(p) = s(LA, LS) to the April 2024 data to get a prediction of the HR probability for a launch angle of 28 degrees and an exit velocity of 100 mph. Here’s the graph with the added point:
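For readers who want to try the same kind of single-month fit, here is a rough sketch on simulated data. The data frame apr and the coefficients used to generate it are invented; only the model form logit(p) = s(LA, LS) and the prediction point come from the text.

```r
library(mgcv)

set.seed(2)
n <- 3000
# Simulated stand-in for one month of in-play data; coefficients are invented
apr <- data.frame(launch_angle = runif(n, 10, 45),
                  launch_speed = runif(n, 85, 112))
apr$HR <- rbinom(n, 1, plogis(0.35 * (apr$launch_speed - 100) -
                              0.02 * (apr$launch_angle - 28)^2 - 0.3))

# Launch variables only: no Season or Month terms, since the data cover one month
fit_apr <- gam(HR ~ s(launch_angle, launch_speed),
               data = apr, family = binomial)

p_apr <- predict(fit_apr,
                 newdata = data.frame(launch_angle = 28, launch_speed = 100),
                 type = "response")
as.numeric(p_apr)
```

One thing to keep in mind with this approach: a separate fit estimates its own smooth surface from one month of data, so the resulting point is probably not on exactly the same footing as the predictions from the pooled model.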

Note that the April 2024 value falls between the April probability predictions for 2022 and 2023 (2021). It is a bit early to draw conclusions about the ball effect in 2024.