Most readers of this blog are aware of the volatile changes in home run hitting during the Statcast era between 2015 and 2021. From 2015 to 2017, the home run total rose sharply from 4909 to 6105, followed by a decrease to 5585 home runs in the 2018 season. Then the home run total jumped to 6776 in the 2019 season before falling to 5944 in the 2021 season.
Although the changes in home run totals are clear, the reasons for the change are not as clear. There has been a substantive change in the hitting behavior of players — players are hitting the ball harder at “home run friendly” launch angles between 20 and 40 degrees. Also there have been changes to the composition of the ball which affect its tendency to carry through the air, or more formally its drag coefficient.
In this blog post, we focus on ball behavior and explore the changes in in-play home run rates between two seasons for specific regions of the launch variables. These rate changes are challenging to summarize if one looks at changes in home run percentages. We demonstrate that we can gain a better understanding of these rate changes by reexpressing the percentages as logits. By summarizing these season-to-season rate changes with logits, we learn about the changes in the carrying properties of the baseball through the Statcast era.
Home Run Fractions for 2019 and 2021
The Statcast dataset contains the launch speed and launch angle for all batted balls in the 2019 and 2021 seasons, along with an indicator variable recording whether each batted ball was a home run. Suppose we focus on launch angles between 20 and 40 degrees and launch speeds between 95 and 110 mph, since most home runs occur for these launch variable values. I divide this (launch speed, launch angle) region into 12 subregions: three 5-mph intervals of launch speed (the rows, with 105-110 mph at the top) and four 5-degree intervals of launch angle (the columns). For each season (2019 and 2021) and each subregion, we record the number of batted balls in that subregion and the count of home runs. These values are represented as “home run fractions” in the following display. For example, look at the fraction “712 / 877” in the top-left subregion under 2019. This means that there were 877 batted balls where the launch angle was between 20 and 25 degrees and the launch speed was between 105 and 110 mph. Of these batted balls, 712 were home runs.
There were 3.7% fewer batted balls in the 2021 season compared to 2019, so one might expect the counts of batted balls (the denominators) in these regions to be smaller in 2021. That is somewhat true in the middle and bottom rows, but the denominators are substantially larger in the top row. That means that batters are hitting more high-velocity balls with these launch angles in the 2021 season.
Now compare the home run counts (the numerators) in the 2019 and 2021 tables. There are significant drops in the 2021 home run counts in the middle and bottom rows, but there are increases in the home run counts in the top row.
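The counting step behind the home run fractions can be sketched in a few lines. This is a minimal sketch, not the code used for this post: it assumes each batted ball is a hypothetical `(launch_speed, launch_angle, is_hr)` tuple, uses half-open 5-unit intervals (so a ball hit at exactly 110 mph is excluded), and the helper names `bin_index` and `hr_fractions` are illustrative.

```python
from collections import Counter

# Assumed bin edges matching the text: launch speed 95-110 mph in 5-mph steps
# (3 rows; bin 2 is the 105-110 mph top row) and launch angle 20-40 degrees in
# 5-degree steps (4 columns).
SPEED_EDGES = [95, 100, 105, 110]
ANGLE_EDGES = [20, 25, 30, 35, 40]

def bin_index(value, edges):
    """Index i of the half-open interval [edges[i], edges[i+1]) holding value, else None."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def hr_fractions(batted_balls):
    """Map (speed_bin, angle_bin) -> (home_runs, batted_balls) for one season.

    batted_balls: iterable of hypothetical (launch_speed, launch_angle, is_hr) tuples.
    """
    n = Counter()
    hr = Counter()
    for speed, angle, is_hr in batted_balls:
        s = bin_index(speed, SPEED_EDGES)
        a = bin_index(angle, ANGLE_EDGES)
        if s is None or a is None:
            continue  # outside the 95-110 mph, 20-40 degree window
        n[(s, a)] += 1
        hr[(s, a)] += int(is_hr)
    return {key: (hr[key], n[key]) for key in n}

# Made-up batted balls, for illustration only.
example = [(107.2, 22.5, True), (106.0, 24.0, False), (98.3, 31.0, False), (120.0, 25.0, True)]
print(hr_fractions(example))  # {(2, 0): (1, 2), (0, 2): (0, 1)}
```

Running the function separately on each season's batted balls gives the two tables of fractions being compared.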
Comparing HR Percentages
To compare the carry characteristics of the ball in the two seasons, one can compute home run percentages as shown below. For example, the 81.2 in the upper-left corner indicates that 81.2% of the batted balls hit with launch angles in (20, 25) degrees and exit velocities in (105, 110) mph were home runs. If you look at the percentages for a single season, note that the home run percentages (as expected) increase for larger launch speeds. Also the balls hit between launch angles of 25 and 35 degrees (for a given interval of launch speed) are most likely to be home runs.
To compare the home run percentages for the two seasons, it is natural to look at the differences
HR percentage (2021) – HR percentage (2019)
which are displayed in the following figure.
These differences in percentages are hard to interpret and summarize since they show a lot of variability. For one subregion (second row and first column) there is a decrease of 14.6 percentage points, and for the subregion directly below there is a decrease of 5.4 points. It is hard to compare changes of 14.6 and 5.4 points, since in the first case one is comparing two percentages in the 30-45% range, and in the second case one is comparing two percentages in the 5-11% range. (We will see shortly, using our reexpression, that the 5.4-point decrease is in some sense more significant than the 14.6-point decrease.)
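The percentages and their differences are simple arithmetic on the home run fractions. The sketch below is illustrative only: it reproduces the 81.2% figure from the 712 / 877 fraction quoted above, while the second pair of counts is hypothetical and not taken from the Statcast data. The function names are assumptions made for this example.

```python
def hr_percentage(home_runs, batted_balls):
    """Home run percentage for one subregion."""
    return 100.0 * home_runs / batted_balls

def percentage_point_change(frac_2021, frac_2019):
    """HR percentage (2021) - HR percentage (2019), in percentage points.

    Each argument is a (home_runs, batted_balls) fraction for the same subregion.
    """
    return hr_percentage(*frac_2021) - hr_percentage(*frac_2019)

# The upper-left 2019 cell from the display: 712 home runs in 877 batted balls.
print(round(hr_percentage(712, 877), 1))  # 81.2

# Hypothetical counts for a single subregion (not real Statcast values).
print(percentage_point_change((300, 1000), (450, 1000)))  # -15.0
```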
We Have a Variability Issue
It is difficult to compare percentages since they have a variability problem. Percentages near 50%, like the ones in the middle row of the table, have high variation. In contrast, percentages near 0% and 100%, like the ones in the bottom and top rows of the table, have small variation. For example, a collection of percentages close to 0% will have small variation since they are all bunched up at the limit of 0. One has a similar issue with a collection of percentages bunched up towards 100%. One can see this problem when one computes the standard error of a proportion. If one computes a proportion p based on a sample of size n, then the standard error of this estimate is

$$SE(p) = \sqrt{\frac{p(1-p)}{n}}$$
This standard error is largest when p = 0.5 and smallest when p is close to 0 or 1.
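A quick numerical check of this formula shows the pattern. This is a minimal sketch with an arbitrary, illustrative sample size of n = 500; the function name `se_proportion` is an assumption, not part of the original analysis.

```python
import math

def se_proportion(p, n):
    """Standard error sqrt(p * (1 - p) / n) of a sample proportion p based on n trials."""
    return math.sqrt(p * (1 - p) / n)

# With a fixed, illustrative sample size, the standard error peaks at p = 0.5
# and shrinks as p moves toward 0 or 1.
n = 500
for p in (0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95):
    print(f"p = {p:.2f}  SE = {se_proportion(p, n):.4f}")
```

Because p(1 - p) is symmetric about 0.5, a proportion of 0.3 and a proportion of 0.7 have the same standard error.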
Logits to the Rescue
Proportions (or percentages) are hard to compare due to the differences in variability for small, moderate, and large proportion values. One can correct this issue by reexpressing the proportion p as a logit, defined by

$$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$$

where log denotes the natural logarithm.
Unlike proportions, logits have approximately the same variability for different proportion values. So comparisons of proportions are easier when they are made on the logit scale.
In the figure, I have replaced the percentages by the corresponding logits. Most of us are not familiar with the logit scale, but here are some basic properties:
- a logit value of 0 corresponds to a percentage of 50%
- positive logits correspond to percentages over 50%, negative logits correspond to percentages under 50%
- logits have a nice symmetry property: if a percentage of, say, 30% corresponds to a logit of -0.85, then a percentage of 100 – 30 = 70% corresponds to a logit of +0.85.
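These properties are easy to verify directly. The sketch below uses the natural-log logit and its inverse; note that it works with proportions, so a percentage must first be divided by 100. The names `logit` and `inv_logit` are chosen for this example.

```python
import math

def logit(p):
    """Natural-log logit: log(p / (1 - p)) for a proportion 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps any real number x back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(round(logit(0.50), 2))  # 0.0   (50% sits at logit 0)
print(round(logit(0.30), 2))  # -0.85 (under 50% gives a negative logit)
print(round(logit(0.70), 2))  # 0.85  (symmetry: 70% mirrors 30%)
```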
Comparing HR Logits
To compare the two seasons, we take the difference in logits:
logit HR percentage (2021) – logit HR percentage (2019).
Here are the differences in logits for our home run rate example.
Note that we now see some clustering of similar values, which makes our comparison easier. For the five circled subregions, the logit home run percentage decreases by approximately 0.7. For three other subregions, the logit home run percentage decreases by 0.25 to 0.4, and for two subregions, the logit home run percentage increases. By reexpressing the percentages as logits, we have largely removed the variability issue (between percentages of different values), and we have a simplified comparison between the carry characteristics of the ball for the two seasons.
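One way to see why the logits cluster while the percentage differences do not: if a change in the ball moves every subregion by roughly the same amount on the logit scale, the resulting percentage-point changes still differ a lot depending on the starting percentage. The sketch below applies an illustrative shift of -0.7 (the size of the circled decreases) to three hypothetical 2019 proportions; these starting values are made up and are not the actual subregion rates.

```python
import math

def logit(p):
    """Natural-log logit of a proportion 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Map a logit back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))

def apply_logit_shift(p_before, shift):
    """Proportion obtained by moving p_before by `shift` units on the logit scale."""
    return inv_logit(logit(p_before) + shift)

# Hypothetical 2019 HR proportions for a low, a moderate, and a high subregion,
# each moved by the same illustrative logit shift.
SHIFT = -0.7
for p_2019 in (0.10, 0.40, 0.85):
    p_2021 = apply_logit_shift(p_2019, SHIFT)
    points = 100 * (p_2021 - p_2019)
    print(f"{100 * p_2019:4.0f}% -> {100 * p_2021:4.1f}%  ({points:+.1f} points)")
```

The same logit change shows up as a much larger percentage-point drop near 50% than near 0% or 100%, which is the pattern that made the raw percentage differences hard to summarize.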
What Have We Learned?
Suppose we focus on the batted balls where the launch angle is between 20 and 30 degrees and the exit velocity is between 100 and 110 mph. We break this region into four subregions as we did above and summarize the change between consecutive full seasons by the median change in the logit home run rate across the four subregions. The following table summarizes what we learned. Between the 2015 and 2016 seasons there was little change in the home run rate. But we see an increase in ball carry in 2017, followed by a deadening of the ball in 2018, more carry in 2019, and more deadening in the next full season, 2021. (This pattern of change in the ball behavior is crazy, isn’t it!) One nice feature of logit comparisons is that you can add the individual differences to compare other seasons. For example, if you want to compare the 2018 and 2021 seasons, the logit change would be 0.625 (the change between 2018 and 2019) plus -0.66 (the change between 2019 and 2021), which equals -0.035. Since this change is close to zero, the ball carry in 2021 was similar to that in 2018.
Home Run Rates of Batted Balls with Launch Angle between 20 and 30 Degrees and Exit Velocities between 100 and 110 mph

| Comparison | Logit Change | Explanation |
|---|---|---|
| 2015-16 | -0.015 | Little Change in Carry Behavior |
| 2016-17 | 0.23 | More Carry in 2017 season |
| 2017-18 | -0.48 | More Deadening in 2018 season |
| 2018-19 | 0.625 | More Carry in 2019 season |
| 2019-21 | -0.66 | More Deadening in 2021 season |
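The two steps behind this table, taking a median of per-subregion logit differences and then chaining season-to-season changes, can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout and the function names `median_logit_change` and `chained_change` are hypothetical, the logit changes are copied from the table, and `median_logit_change` assumes every subregion has 0 < home runs < batted balls so each logit is finite. For a single subregion the logit differences add exactly; once medians are involved, adding them is an approximation.

```python
import math
from statistics import median

def logit(p):
    """Natural-log logit of a proportion 0 < p < 1."""
    return math.log(p / (1 - p))

def median_logit_change(before, after):
    """Median over shared subregions of logit(p_after) - logit(p_before).

    `before` and `after` map a subregion key to a (home_runs, batted_balls) pair.
    """
    diffs = [
        logit(after[k][0] / after[k][1]) - logit(before[k][0] / before[k][1])
        for k in before
        if k in after
    ]
    return median(diffs)

# Median logit changes between consecutive full seasons, copied from the table.
LOGIT_CHANGE = {
    ("2015", "2016"): -0.015,
    ("2016", "2017"): 0.23,
    ("2017", "2018"): -0.48,
    ("2018", "2019"): 0.625,
    ("2019", "2021"): -0.66,
}
SEASONS = ["2015", "2016", "2017", "2018", "2019", "2021"]

def chained_change(start, end):
    """Add the consecutive-season logit changes from `start` to `end`."""
    i, j = SEASONS.index(start), SEASONS.index(end)
    return sum(LOGIT_CHANGE[(SEASONS[k], SEASONS[k + 1])] for k in range(i, j))

print(round(chained_change("2018", "2021"), 3))  # -0.035
```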
- Looking Ahead to the 2022 Season. What can we predict about home run hitting in the 2022 season? Recall that home run hitting is affected by both player behavior and the properties of the baseball. I believe that players will continue to hit balls at higher exit velocities and home-run-friendly launch angles, and that could contribute to an increase in home run hitting. But the characteristics of the 2022 ball are somewhat unknown. MLB did announce changes to the baseball before the 2021 season that contributed to the decrease in home run rates, and my understanding (from a recent MLB memo) is that the 2021 baseball will again be used in the 2022 season. If this is true, then any increase in the home run rate from 2021 to 2022 would be due solely to changes in player behavior. The statistical approach described here will be helpful in exploring these changes in home run rates.
- John Tukey and Flogs. I taught a course on Exploratory Data Analysis for many years at BGSU. This course was based on ideas for exploring data developed by the great statistician John Tukey. One of Tukey’s ideas was that proportions are easier to explore when expressed on a folded log (flog) scale; this scale is essentially the same thing as taking a logit. Here is a recent post on the benefits of working on a flog scale for proportion data.