Comparing Home Run Rates – Part 2
Neil Paine of fivethirtyeight.com wrote an article this week on the decline in home run hitting in the 2022 season. It is an interesting article, but I want to suggest an improvement on his comparison of home run rates. We focus on the following figure copied from the article that breaks down batted balls hit at different exit velocities. For each bin of exit velocities, the figure displays the percentage of home run balls in the 2022 season, the HR percentage for the 2017-21 seasons, and the difference in percentages.
From this figure, how would you summarize the difference in home run rates between 2022 and 2017-21? It is challenging to summarize the change in rates since the difference in percentages is not constant: the difference is greatest (between 10 and 12 percentage points) for launch speeds from 100 to 106 mph, and smallest for high and low launch speeds.
I illustrate this issue with the following mean-difference plot, where I graph the mean percentage against the difference in percentages. We see that the difference in percentages is largest (that is, most negative) when the mean percentage is close to 50%, and smallest for mean percentages close to 0 and 100%.
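The points in a mean-difference plot are easy to compute by hand. Here is a minimal Python sketch; the function name is hypothetical and the percentages are made-up values chosen only to mimic the shape of the pattern, not the numbers from the article.

```python
def mean_difference(new, old):
    """Pair each bin's mean of the two values with their difference (new minus old)."""
    if len(new) != len(old):
        raise ValueError("both seasons need one value per bin")
    return [((a + b) / 2, a - b) for a, b in zip(new, old)]

# Made-up HR percentages for five exit-velocity bins (NOT the article's figures).
old_pct = [1.0, 20.0, 50.0, 80.0, 97.0]
new_pct = [0.6, 13.0, 38.0, 71.0, 95.0]

for mean, diff in mean_difference(new_pct, old_pct):
    print(f"mean = {mean:5.1f}   difference = {diff:6.1f}")
```

Plotting the first coordinate on the horizontal axis and the second on the vertical axis gives the mean-difference graph.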
Logits to the Rescue
The issue here is that percentages have different variability across the range of values 0 to 100. Percentages close to 50% have high variation and percentages close to the endpoints (0 and 100) have small variation. So the pattern that we see in the above mean-difference graph (high to low to high) is not meaningful. It is hard to summarize the change in percentages since the size of the change is confounded with this variability issue.
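To put rough numbers on this variability, here is a minimal Python sketch. It assumes each batted ball in a bin is an independent trial with the same home run chance (a binomial model, which is a simplifying assumption and not something taken from the article), and the bin size of 1000 batted balls is hypothetical.

```python
import math

def percentage_sd(pct, n):
    """Binomial standard deviation, in percentage points, of a percentage observed on n trials."""
    p = pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical number of batted balls in a bin
for pct in (2, 10, 50, 90, 98):
    print(f"{pct:>2}%   sd = {percentage_sd(pct, n):.2f} percentage points")
```

Under this model the standard deviation peaks at 50% and shrinks toward 0 and 100%, which is the same shape as the pattern in the mean-difference graph.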
There is a simple fix — instead of percentages, we reexpress these rates on the logit scale where
logit(percentage) = log(percentage) - log(100 - percentage)
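In code, the transformation is a one-liner. This Python sketch assumes "log" means the natural logarithm (the usual convention for logits); the helper names are hypothetical, and the inverse function is added only so that a logit can be converted back to a percentage.

```python
import math

def logit(pct):
    """Logit of a percentage strictly between 0 and 100."""
    if not 0 < pct < 100:
        raise ValueError("the logit is undefined at 0 and 100 percent")
    return math.log(pct) - math.log(100 - pct)

def inv_logit(x):
    """Convert a logit back to a percentage."""
    return 100 / (1 + math.exp(-x))

for pct in (5, 25, 50, 75, 95):
    print(f"{pct:>2}%   logit = {logit(pct):6.3f}")
```

A percentage of 50 maps to a logit of 0, and percentages that are equally far from 50 (such as 25 and 75) map to logits of equal size and opposite sign.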
For each of the bins of exit velocity, I reexpress both the 2022 percentage and the 2017-2021 percentage as logits. The following graph is a mean-difference graph of the logits: I plot the mean logit against the difference in logits.
Now we can get an easy comparison of the change in home run rates between 2017-2021 and 2022. On the logit scale, the 2022 home run rate has decreased by about 0.5. This single number gives a simple comparison of the home run rates between the two sets of seasons.
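One way to read a drop of 0.5 on the logit scale is through odds: it multiplies the odds of a home run by exp(-0.5), which is about 0.61. The Python sketch below applies that same shift to a few hypothetical baseline percentages (not values from the article) and converts the result back to a percentage; the function name is made up for illustration.

```python
import math

def shift_on_logit_scale(pct, delta):
    """Return the percentage obtained by adding delta to logit(pct)."""
    odds = pct / (100 - pct)
    new_odds = odds * math.exp(delta)  # adding delta to the logit multiplies the odds by exp(delta)
    return 100 * new_odds / (1 + new_odds)

for pct in (5, 25, 50, 75, 95):  # hypothetical baseline percentages
    new_pct = shift_on_logit_scale(pct, -0.5)
    print(f"{pct:>2}% -> {new_pct:5.1f}%   change = {new_pct - pct:6.1f} points")
```

The same logit shift costs roughly 12 percentage points at a 50% baseline but only about 2 to 3 points near the extremes, so a constant change on the logit scale produces exactly the kind of uneven percentage differences seen in the original figure.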
Change in Home Run Rates Across Launch Angle and Exit Velocity
In my ShinyBaseball package, I have a Shiny app LogitHomeRunRates() that allows one to compare home run rates for two seasons across bins defined by both launch angle and exit velocity. Here’s a snapshot of the app comparing the 2021 and 2022 seasons (I’m using data through games of May 24):
Let’s focus on the second row of this table, where the exit velocity is between 100 and 105 mph. Note that for the four launch angle bins, the 2022 home run rate is 0.50 to 0.57 smaller on the logit scale. The logit scale has removed the variability issue of percentages, and this change in logits is consistent with the comparison in logits using the data from the fivethirtyeight.com article.
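For readers who want to try this kind of table outside the app, here is a rough Python sketch of the binning idea. It is not the code behind LogitHomeRunRates(); the record layout, function names, bin edges and the tiny synthetic data set are all hypothetical.

```python
import math
from collections import defaultdict

def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if outside the edges."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def logit_differences(batted_balls, season_a, season_b, angle_edges, speed_edges):
    """Map (angle bin, speed bin) to logit(HR rate in season_b) - logit(HR rate in season_a)."""
    counts = defaultdict(lambda: [0, 0])  # (season, angle bin, speed bin) -> [home runs, batted balls]
    for season, angle, speed, is_hr in batted_balls:
        i, j = bin_index(angle, angle_edges), bin_index(speed, speed_edges)
        if i is None or j is None or season not in (season_a, season_b):
            continue
        cell = counts[(season, i, j)]
        cell[0] += int(is_hr)
        cell[1] += 1

    def logit_rate(key):
        hr, n = counts.get(key, (0, 0))
        if hr == 0 or hr == n:  # logit is undefined for empty bins and for rates of 0 or 100%
            return None
        return math.log(hr) - math.log(n - hr)

    result = {}
    for i in range(len(angle_edges) - 1):
        for j in range(len(speed_edges) - 1):
            la, lb = logit_rate((season_a, i, j)), logit_rate((season_b, i, j))
            if la is not None and lb is not None:
                result[(i, j)] = lb - la
    return result

# Synthetic records: (season, launch angle in degrees, exit velocity in mph, home run?)
balls = (
    [("earlier", 27, 102, True)] * 4 + [("earlier", 27, 102, False)] * 4
    + [("later", 27, 102, True)] * 2 + [("later", 27, 102, False)] * 6
)
diffs = logit_differences(balls, "earlier", "later",
                          angle_edges=[20, 25, 30, 35, 40],
                          speed_edges=[95, 100, 105, 110])
print(diffs)
```

Bins where either season has no batted balls, or a rate of exactly 0 or 100%, are skipped because the logit is undefined there; a real analysis would need to decide how to handle such bins.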
Why Not Use Logits?
There is a good reason why Neil Paine in the fivethirtyeight.com article didn’t convert the home run percentages to logits. Readers of the article may not be familiar with logits, since the world generally talks about percentages on the 0 to 100 scale. But when you want to compare rates across different groups (like different bins of exit velocity), you will run into this variability issue. By the way, regression of rates is typically performed using a logistic model where the logit of a probability is expressed as a linear function of covariates.
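To make the connection to logistic regression concrete, here is a small Python sketch that fits logit(p) = a + b x by Newton's method, where x is 0 for the earlier season and 1 for the later one. The counts are hypothetical and the fitting routine is a bare-bones illustration, not a substitute for a statistics package.

```python
import math

def fit_logistic(groups, iterations=25):
    """Fit logit(p) = a + b * x to grouped data [(x, home_runs, batted_balls), ...]."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in groups:
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = n * p * (1 - p)      # binomial variance weight
            g0 += y - n * p          # gradient with respect to a
            g1 += (y - n * p) * x    # gradient with respect to b
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system using the explicit matrix inverse.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical counts for one bin: (season indicator, home runs, batted balls)
groups = [(0, 300, 1000), (1, 200, 1000)]
intercept, slope = fit_logistic(groups)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With a single 0/1 covariate, the fitted slope is exactly the difference between the two observed logits, so the logit comparison used in this post is the simplest case of a logistic regression.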
I recently wrote a post comparing home run rates for two seasons that justified the need to change to a logit scale. There are many cases where people want to compare rates across games, and this comparison would usually benefit from expressing the rates on a logit scale.