In last week’s post, I explored home run rates for the current 2020 season. There were three main takeaways from that work. First, in-play home run rates are currently pretty high, close to 2019 rates. Second, there appears to be less carry (more drag) in the ball compared to the 2017 or 2019 seasons. Last, the rate of hard-hit balls has shown a steady increase in the Statcast era.
Several readers commented on my analysis, so some clarification is needed. First, I didn’t really contrast the Statcast barrel definition with the probability of a home run, and I’ll try to make that comparison here. Second, saying that home run rates are high in 2020 is a bit unfair, since temperature plays an important role in hitting home runs and the 2020 home runs have been hit primarily in warm weather. So I will discuss the role of temperature in hitting home runs. This motivates a revision to my prediction model, where I predict the 2020 home run count using a 2019 ball model.
Barrels and Probability of a Home Run
Last week, I indicated that I wasn’t familiar with the exact formula for a barrel. The MLB definition says that a barrel corresponds to values of launch angle and exit velocity that give a minimum .500 batting average and 1.500 slugging percentage. Also, Tom Tango was kind enough to share the code for the exact barrel formula, based on data from a few seasons ago. Since the formula has likely changed a little for the current season, an easy way to compare barrels with home run probabilities is to draw the contour graph of the probability of a home run as a function of launch angle and exit velocity from 2020 data and overlay the points where the Statcast barrel variable is equal to one. What we see in the following graph is that barrels are roughly the values of these launch variables where the probability of a home run exceeds 0.2, although there are barrels on the right (large values of the launch angle) where the chance of a home run is small.
By the way, since a barrel is a variable in the Statcast data, one can compute barrel rates (on balls in play) across all seasons. We see barrel rates are rising consistently during the Statcast era.
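The barrel-rate calculation is just a grouped proportion. Here is a minimal sketch in Python, assuming each ball in play is a record with a `season` field and a 0/1 `barrel` field; those field names and the sample records are hypothetical, not the actual Statcast column layout or data.

```python
from collections import defaultdict

def barrel_rates(batted_balls):
    """Return {season: barrels / balls in play} for an iterable of records."""
    totals = defaultdict(int)   # season -> number of balls in play
    barrels = defaultdict(int)  # season -> number of barrels
    for ball in batted_balls:
        totals[ball["season"]] += 1
        barrels[ball["season"]] += ball["barrel"]
    return {season: barrels[season] / totals[season] for season in sorted(totals)}

# Made-up records for illustration only; not real Statcast data.
sample = [
    {"season": 2019, "barrel": 1},
    {"season": 2019, "barrel": 0},
    {"season": 2019, "barrel": 0},
    {"season": 2019, "barrel": 0},
    {"season": 2020, "barrel": 1},
    {"season": 2020, "barrel": 0},
]
rates = barrel_rates(sample)
```

With real data, the same function would be applied to all balls in play from each Statcast season.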
Home Runs and Temperature
Several people pointed out that my comparison of home run rates from, say, 2019 to 2020 was a bit off since I didn’t account for temperature. We did show in our 2018 MLB Home Run Report that home runs are more likely on warmer days. Thanks to Bill Pettit, I became aware that the Statcast data available through the baseballr package also includes the game-time outside temperature and an indicator of whether the stadium roof was closed. Using 2019 data, I binned the balls in play by game-time temperature, and this graph shows the in-play home run rate as a function of temperature together with a best-fitting line. It shows that for each 10-degree increase in temperature, the in-play home run rate increases by about 0.5 percent. This is a significant effect that should be accounted for in any model for home runs.
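The binning and line fit described above can be sketched as follows. This is a minimal Python illustration, assuming each ball in play is a `(temperature, is_home_run)` pair and using 10-degree bins; the bin width, the unweighted least-squares fit, and the sample counts are all assumptions made for the example, not the settings or data behind the graph.

```python
from collections import defaultdict

def hr_rate_by_temp_bin(balls, width=10):
    """Return sorted (bin midpoint, in-play HR rate) pairs."""
    counts = defaultdict(lambda: [0, 0])  # bin lower edge -> [home runs, balls in play]
    for temp, is_hr in balls:
        lower = int(temp // width) * width
        counts[lower][0] += is_hr
        counts[lower][1] += 1
    return [(lower + width / 2, hr / n) for lower, (hr, n) in sorted(counts.items())]

def fit_line(points):
    """Ordinary least squares fit y = intercept + slope * x (bins unweighted)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up counts: 200 balls in play per bin with 8, 9, and 10 home runs.
sample = []
for temp, homers in [(65, 8), (75, 9), (85, 10)]:
    sample += [(temp, 1)] * homers + [(temp, 0)] * (200 - homers)

binned = hr_rate_by_temp_bin(sample)
intercept, slope = fit_line(binned)
per_10_degrees = 10 * slope  # change in HR rate per 10-degree increase
```

Because the fit treats every bin equally, sparse bins at the temperature extremes can pull the line around; weighting each bin by its number of balls in play would be a natural refinement.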
2020 Home Runs Predictions Using a Revised 2019 Ball Model
Last week, I considered the following prediction problem. We learn about the relationship between the launch variables and home runs by fitting a GAM to 2019 data; I called this the 2019 ball model. Using this model, we can use the values of the 2020 launch variables to predict the number of 2020 home runs through games of September 4. We predicted 1577 home runs, which was 104 more than the actual count (1473) of home runs in 2020.
But this analysis could be improved since the model didn’t account for temperature — 2020 games have been played primarily in warm weather. So I consider a new model using 2019 data where the logit of the home run probability is given by
logit(P(HR)) = log(P(HR) / (1 - P(HR))) = s(LA, EV) + TEMP
where s() is a smooth function of the launch variables (launch angle LA and exit velocity EV) and TEMP is the game-time temperature. Using this model, I predict 1659 home runs, which is 186 more than the 2020 count of 1473 (through September 4). By the way, it makes sense that the home run prediction would increase, since we are now adjusting the prediction for the warm-weather games in 2020.
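To see why a temperature term pushes the prediction up, here is a toy sketch in Python. It assumes the predicted count is the sum of the fitted home run probabilities over the 2020 balls in play, which is one common way to turn such a model into a count. The function `toy_smooth`, the temperature coefficient, the reference temperature, and the sample balls are all hypothetical stand-ins, not the fitted 2019 ball model. Centering temperature at a reference value is only a reparametrization; in a fitted model the shift is absorbed by the intercept.

```python
import math

def inv_logit(x):
    """Map a logit-scale value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def toy_smooth(la, ev):
    """Hypothetical stand-in for s(LA, EV); not a fitted surface."""
    return -2.0 + 0.2 * (ev - 100.0) - 0.02 * (la - 28.0) ** 2

def predicted_home_runs(balls, temp_coef, ref_temp=70.0):
    """Sum P(HR) over balls given as (launch angle, exit velocity, temperature)."""
    return sum(
        inv_logit(toy_smooth(la, ev) + temp_coef * (temp - ref_temp))
        for la, ev, temp in balls
    )

# Made-up balls in play, all hit in weather warmer than the reference temperature.
balls_2020 = [(28, 105, 85), (25, 100, 80), (35, 98, 90), (10, 95, 75)]

without_temp = predicted_home_runs(balls_2020, temp_coef=0.0)
with_temp = predicted_home_runs(balls_2020, temp_coef=0.02)
```

Since every sample ball is hit above the reference temperature and the coefficient is positive, each probability rises and so does the predicted total, which is the same direction as the move from 1577 to 1659.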
The conclusion remains the same. There appears to be more drag (less carry) in the baseball in 2020 compared to 2019, but the difference between the two seasons is greater using a model that adjusts for game-time temperature.
While we are on the topic of home runs, let me suggest some other directions for future exploration.
- Spray angles. Certainly the spray angle is an important factor in hitting home runs — it is much easier to hit a home run down the lines where the distances to the fences are smaller. My basic feeling from previous work is that the distribution of home run spray angles has been pretty consistent through the Statcast era, although this can be checked in different ways.
- Ballpark effects. Since every ballpark is unique with respect to weather and size, there are clear ballpark effects — it is easier to hit home runs in some ballparks than others. A thorough study that helps one to better understand the ballpark home run effects would be interesting.
- Pitcher effects. Certainly home runs are easier to hit for pitches in the zone (especially in the batter’s hot area) and it might be easier to hit home runs on specific pitch types. An exploration of how home runs are impacted by pitch variables would be worthwhile.
Tom Tango was puzzled by my graph above since some of the barrel points seemed to be wrong. It appears that there was an error in the coding of the barrel variable defined in the baseballr package. I redid this graph using Tango’s code for the exact formula for a barrel. I think this is a more reasonable-looking display. (Thanks to Tom for pointing out this error.)