At this All-Star break, there has been some discussion about the “unusually” high home run rate during the first half of the 2016 season. People are quick to try to explain this high rate; for example, the baseball commissioner has been asked to explain the home run surge.
Let’s back up to a simpler question:
Is this first half home run rate of 1.16 (per team per game) really unusual in the recent history of baseball?
Collect the Relevant Data
Using the retrosheet package, it is easy to download the Retrosheet game logs for all games in the past 50 seasons (1966-2015). Each row of the data frame corresponds to one game and contains the number of home runs hit by the home and visiting teams. For each season, I collect the number of home runs and the number of games played in the first half and in the second half of the season.
First Half Home Run Rates
I first plot the first-half home run rate (home runs divided by games divided by 2) against season and show the “unusual” value of 1.16 in red.
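As a rough illustration of the rate calculation described above, here is a minimal Python sketch. It is not the code used for this post: the function name `half_season_rates`, the tuple layout `(game_date, home_hr, visitor_hr)`, and the toy games are all assumptions made for the example, and the break date is just an illustrative cutoff.

```python
from datetime import date


def half_season_rates(games, break_date):
    """Return (first_half_rate, second_half_rate) in home runs per team per game.

    `games` is an iterable of (game_date, home_hr, visitor_hr) tuples.
    This layout is hypothetical and is not the Retrosheet column naming.
    """
    totals = {"first": [0, 0], "second": [0, 0]}  # [home runs, games]
    for game_date, home_hr, visitor_hr in games:
        half = "first" if game_date < break_date else "second"
        totals[half][0] += home_hr + visitor_hr
        totals[half][1] += 1

    def rate(home_runs, n_games):
        # Home runs divided by games, divided by 2 (two teams per game).
        return home_runs / n_games / 2 if n_games else float("nan")

    return rate(*totals["first"]), rate(*totals["second"])


# Toy data only: four made-up games, not real game logs.
toy_games = [
    (date(2015, 4, 6), 1, 2),
    (date(2015, 5, 10), 0, 1),
    (date(2015, 8, 1), 2, 2),
    (date(2015, 9, 15), 1, 0),
]
first_rate, second_rate = half_season_rates(toy_games, date(2015, 7, 14))
print(first_rate, second_rate)
```

Applying the same function to each season's game logs would give one pair of rates per season, which is what the plots in this post are built from.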
What do we see? Generally, the home run rates during the seasons 1966-1993 range between 0.6 and 0.8, the rates for the seasons 1994-2006 (the so-called steroids era) tend to be between 1.0 and 1.2, and the rates have tended to be between 0.9 and 1.0 in recent years. The 2016 value of 1.16 seems similar to the rates during the steroids era. But the variability of rates from season to season is remarkable, so the value of 1.16 might just reflect the season-to-season variability in the first-half home run rates. To answer my question, 1.16 is a bit high for a home run rate, but this might be explained by the high variability of these rates.
Predicting the Second Half Rates
Of course, fans are wondering about the home run rate in the second half of the season. History can be helpful in making this prediction. In the following graph I plot the difference SECOND HALF RATE – FIRST HALF RATE for the past 50 seasons.
There is an interesting pattern here. For the early seasons, the difference tends to be negative; in these seasons, the home run rate dropped in the second half. In recent seasons, say after 2000, the difference tends to be positive, which means that home run hitting increased in the second half. But it is interesting to note the variability in the differences in rates: they appear to be roughly uniformly distributed between -0.1 and 0.1.
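The difference plotted above is simple to compute once the two half-season rates are in hand. The sketch below is only an illustration: the helper name `half_differences`, the season labels, and the rates are made up for the example and are not the values behind the graph.

```python
def half_differences(rates_by_season):
    """Map each season to SECOND HALF RATE minus FIRST HALF RATE."""
    return {
        season: second - first
        for season, (first, second) in rates_by_season.items()
    }


# Made-up (first_half_rate, second_half_rate) pairs, for illustration only.
toy_rates = {
    "season_a": (0.75, 0.5),
    "season_b": (0.5, 0.5),
    "season_c": (1.0, 1.25),
}

diffs = half_differences(toy_rates)

# Share of seasons in which home run hitting rose in the second half.
share_positive = sum(d > 0 for d in diffs.values()) / len(diffs)

# Range of the differences, a crude look at their spread.
spread = (min(diffs.values()), max(diffs.values()))

print(diffs, share_positive, spread)
```

A negative value means the rate dropped in the second half, and a positive value means it rose.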
I predict that the second half home run rate will drop significantly in 2016. Why? I am not aware of any reasonable explanation for the rise in home run rates, so my prior is that the true 2016 home run rate is similar to what it has been in recent seasons. Also, I note from the graph of differences that a large difference between the first and second half rates is pretty likely. So I’d expect the second half rate to be a good amount smaller than 1.16. By the way, this prediction is consistent with the regression effect: extreme performances (like our 1.16 home run rate) tend to move back toward the average.
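One way to make the regression effect concrete is to shrink the observed first-half rate toward a prior guess at the true rate. This is a sketch under stated assumptions, not a fitted model: the prior mean of 0.95 is simply the midpoint of the 0.9 to 1.0 range seen in recent seasons, and the weight of 0.5 is an arbitrary illustrative choice rather than something estimated from the data.

```python
def shrink_toward_prior(observed_rate, prior_mean, weight):
    """Move an observed rate part of the way back toward a prior mean.

    weight = 1 keeps the observed rate; weight = 0 returns the prior mean.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return prior_mean + weight * (observed_rate - prior_mean)


observed_first_half = 1.16   # 2016 first-half rate from the post
assumed_prior_mean = 0.95    # assumption: midpoint of the recent 0.9-1.0 range
assumed_weight = 0.5         # assumption: arbitrary illustrative weight

predicted_second_half = shrink_toward_prior(
    observed_first_half, assumed_prior_mean, assumed_weight
)
print(round(predicted_second_half, 3))
```

Any weight below 1 gives a predicted second half rate under 1.16, which is the direction of the prediction above; how far it drops depends entirely on the assumed weight.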