As you may have noticed, I have recently taken a break from blogging on this site. I’m currently teaching an overload and I’m keeping busy blogging on the topics for my online classes (Exploratory Data Analysis and Statistical Graphics). But I’ll try to occasionally post baseball material. The 2nd edition of our Analyzing Baseball with R book will be coming out by the end of the year, and my blog posting will likely pick up once the book is available.
A batting average, by itself, is not that meaningful, but it can be written in terms of three component rates:
BA = (1 – SO.Rate) × (HR.Rate + (1 – HR.Rate) × BABIP)
where
- SO.Rate = SO / AB (the strikeout rate)
- HR.Rate = HR / (AB – SO) (the proportion of home runs among all at-bats that are not strikeouts)
- BABIP = (H – HR) / (AB – SO – HR) (the batting average on balls hit in play)
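To check that the three rates really do multiply back to the batting average, here is a minimal Python sketch (the post's own analysis is in R). The season line is made up for illustration, and the function names are my own.

```python
def component_rates(ab, h, hr, so):
    """Return (SO.Rate, HR.Rate, BABIP) as defined in the list above."""
    so_rate = so / ab                  # strikeouts per at-bat
    hr_rate = hr / (ab - so)           # home runs per non-strikeout at-bat
    babip = (h - hr) / (ab - so - hr)  # hits per ball put in play
    return so_rate, hr_rate, babip


def batting_average_from_rates(so_rate, hr_rate, babip):
    """Rebuild BA from the three component rates."""
    return (1 - so_rate) * (hr_rate + (1 - hr_rate) * babip)


# Hypothetical season line (not a real player).
ab, h, hr, so = 550, 150, 30, 140

so_rate, hr_rate, babip = component_rates(ab, h, hr, so)
ba_direct = h / ab
ba_rebuilt = batting_average_from_rates(so_rate, hr_rate, babip)

print(f"SO.Rate={so_rate:.3f}  HR.Rate={hr_rate:.3f}  BABIP={babip:.3f}")
print(f"H/AB={ba_direct:.4f}  rebuilt={ba_rebuilt:.4f}")
```

The two batting averages agree up to floating-point rounding, because the (AB – SO) and (AB – SO – HR) terms cancel when the product is expanded.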
Historical View of Three Rates
How have these rates changed over the history of baseball? If we graph the rates below, we see a dramatic change in the strikeout rates, but modest changes in the home run and BABIP rates.
But this graph is deceptive. Why? Well, there is a problem with proportions from the viewpoint of comparison. Small proportions like home run rates tend to have small variation and larger proportions like strikeout rates have larger variation. (What I am saying is that proportions close to 0 or 1 have small variation and proportions close to 0.5 have larger variation.) It is difficult to compare groups of proportions that have different spreads.
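One standard way to see this is the binomial standard deviation of a sample proportion, sqrt(p(1 – p)/n). The sketch below is in Python and assumes a hypothetical 500 opportunities per rate. It shows only the general pattern and is not a calculation from the post.

```python
import math


def proportion_sd(p, n):
    """Binomial standard deviation of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


n = 500  # hypothetical number of opportunities
for p in (0.03, 0.20, 0.50):
    print(f"p={p:.2f}  sd={proportion_sd(p, n):.4f}")
```

The spread grows as p moves from near 0 toward 0.5, so a rate near 0.03 and a rate near 0.20 do not sit on comparable scales.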
Look at Logits
What we want to do is to stretch the scale of the home run rates, so that the home run rates and the strikeout rates have similar spreads. One reexpression that accomplishes this is the logit defined by logit(p) = log(p) – log(1-p). Let’s redraw our graph where we graph logit rates instead of rates. We get a different picture — the change in home run rates over the history of baseball is actually similar to the change in strikeout rates over time (on the logit scale).
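Here is a small Python sketch of the logit reexpression, using the natural log. The four rates are illustrative values I chose, not numbers from the graph.

```python
import math


def logit(p):
    """logit(p) = log(p) - log(1 - p), with natural logarithms."""
    return math.log(p) - math.log(1 - p)


# Illustrative values only: a small rate that doubles, and a larger
# rate that rises by ten percentage points.
hr_old, hr_new = 0.02, 0.04
so_old, so_new = 0.15, 0.25

hr_raw_change = hr_new - hr_old
so_raw_change = so_new - so_old
hr_logit_change = logit(hr_new) - logit(hr_old)
so_logit_change = logit(so_new) - logit(so_old)

print(f"raw change:   HR {hr_raw_change:.3f}   SO {so_raw_change:.3f}")
print(f"logit change: HR {hr_logit_change:.3f}   SO {so_logit_change:.3f}")
```

On the raw scale the second change is five times the first, but on the logit scale the two changes are about the same size. This is the stretching of small proportions described above.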
Looking at Recent Seasons
Let’s zoom into the seasons since 1975, a period of a little over 40 years. For each set of rates, I fit a line; the slope of the increase in home run rates looks similar to the slope of the increase in strikeout rates.
To make this even more clear, I look at the change in rate (on the logit scale) from the 1975 season. The average change across this period is very similar for home run rates and strikeout rates. (There is a difference however — the variation about the line is much greater for home run rates than for strikeout rates. There is more to be said if we explore the pattern of variation about the line.)
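The line fit can be sketched as follows: take logits of the season rates, subtract the 1975 value, and fit a least-squares line against years since 1975. This is a Python sketch and not the post's R code. The season rates are synthetic, generated from a slope and starting rate that I made up, so the fit simply recovers that slope.

```python
import math


def logit(p):
    return math.log(p) - math.log(1 - p)


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data: both numbers below are assumptions for illustration.
assumed_slope = 0.014
assumed_start_rate = 0.13

seasons = list(range(1975, 2019))
years_since = [s - 1975 for s in seasons]
rates = [inv_logit(logit(assumed_start_rate) + assumed_slope * t)
         for t in years_since]

# Change from the 1975 season on the logit scale.
logits = [logit(r) for r in rates]
change = [v - logits[0] for v in logits]

intercept, slope = fit_line(years_since, change)
print(f"intercept={intercept:.4f}  slope={slope:.4f}")
```

With real season rates the points would scatter about the line, and that scatter is the variation mentioned above.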
By the way, the slopes of the lines are 0.0137 (home run rate) and 0.0145 (strikeout rate). Since a slope on the logit scale is a change in log odds, the odds of a home run and the odds of a strikeout are both increasing by about 1.4% each season. In that sense there is a strong connection between strikeout rates and home run rates: they are increasing at similar rates on the logit scale. In contrast, there has been only a small growth in BABIP rates.
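A slope on the logit scale is a change in log odds per season, so exponentiating it gives the multiplicative change in the odds. A short Python sketch, using the two slopes quoted above:

```python
import math

# Slopes from the fitted lines, on the logit (log-odds) scale.
slopes = {"home run": 0.0137, "strikeout": 0.0145}

# exp(slope) is the factor by which the odds are multiplied each season.
pct_change_in_odds = {name: (math.exp(b) - 1) * 100
                      for name, b in slopes.items()}

for name, pct in pct_change_in_odds.items():
    print(f"{name}: odds change by about {pct:.2f}% per season")
```

For slopes this small, exp(b) – 1 is very close to b itself, which is why a slope near 0.014 reads as roughly 1.4% per season.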
2018 Player Rates
Since the 2018 regular season just ended, it is interesting to explore the home run and strikeout rates for all regulars. I’ve highlighted extreme players who either (1) had a home run rate over 11% or (2) had a strikeout rate exceeding 35%. This highlights some well known players:
- Mike Trout had a high home run rate and a relatively small strikeout rate.
- The “.247-.247-.247-.247” Khris Davis also had a high home run rate.
- Chris Davis and Yoan Moncada were notable for high strikeout rates and modest home run rates.
- Leaving the best until last, Joey Gallo was remarkable, both for his high home run rate and high strikeout rate.
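The highlighting rule can be sketched in Python as below. The four players and their counts are invented placeholders, not 2018 data, and I am assuming the rates are defined as at the top of the post (HR.Rate = HR / (AB – SO), SO.Rate = SO / AB).

```python
# Invented stat lines for illustration only.
players = [
    {"name": "Player A", "ab": 480, "hr": 45, "so": 110},
    {"name": "Player B", "ab": 450, "hr": 15, "so": 170},
    {"name": "Player C", "ab": 520, "hr": 20, "so": 100},
    {"name": "Player D", "ab": 500, "hr": 40, "so": 190},
]

HR_CUTOFF = 0.11  # home run rate over 11%
SO_CUTOFF = 0.35  # strikeout rate exceeding 35%

flagged = []
for p in players:
    so_rate = p["so"] / p["ab"]
    hr_rate = p["hr"] / (p["ab"] - p["so"])
    if hr_rate > HR_CUTOFF or so_rate > SO_CUTOFF:
        flagged.append(p["name"])
        print(f'{p["name"]}: HR.Rate={hr_rate:.3f}  SO.Rate={so_rate:.3f}')
```

A player who clears both cutoffs, like the hypothetical Player D, is flagged once and corresponds to the Joey Gallo corner of the plot.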
As I said before, a batting average is not that meaningful by itself since it confounds three aspects of hitting: not striking out, hitting home runs, and getting hits on balls put in play. To make sense of Khris Davis’s .247 (pick a season) average, you need to look carefully at the component rates. For example, look at Davis’s batting stats for 2017 and 2018.
Yes, Davis had a 0.247 AVG in both seasons, but he achieved them in different ways. In 2017, Davis had a high SO rate (0.345), but compensated for it with a high BABIP of 0.296. In 2018, Davis decreased his SO rate to 0.303 (good), but his BABIP dropped to 0.267 (bad). His home run rate was 12% in both seasons. So what does it really mean for Davis to be a “0.247 hitter”? Going further, what does it mean for Davis to have 0.247 AVGs for four consecutive seasons? I’m sure some writer will attempt to explain this, but I have my doubts about any meaningful insight.
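As a rough check, the quoted rates can be plugged back into the batting average formula from the top of the post. This is a Python sketch. The home run rate is the rounded 12% from the text, so the rebuilt averages only approximate 0.247.

```python
def batting_average_from_rates(so_rate, hr_rate, babip):
    return (1 - so_rate) * (hr_rate + (1 - hr_rate) * babip)


# Rates as quoted in the text (the 12% home run rate is rounded).
davis = {
    2017: {"so_rate": 0.345, "hr_rate": 0.12, "babip": 0.296},
    2018: {"so_rate": 0.303, "hr_rate": 0.12, "babip": 0.267},
}

rebuilt = {season: batting_average_from_rates(**r)
           for season, r in davis.items()}

for season, ba in rebuilt.items():
    print(f"{season}: rebuilt AVG = {ba:.3f}")
```

Both seasons land within a few points of 0.247 even though the strikeout and BABIP inputs differ, which is the sense in which the same average was reached in different ways.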
For more insight on these hitting rates, see my post “Who is Going to Win the Batting Crown?”, which looks at these rates from an inferential perspective.
I use the Lahman package together with some 2018 data for this post. Here is my R code for this work.