Although sabermetricians dismiss the batting average H / AB as a poor measure of batting performance, the media and the public still talk about batting average a lot. So I think that instead of dismissing this measure, we should look more carefully at what can be learned from a player’s batting average.
I have a new paper that focuses on a breakdown of the batting average. One can represent the batting average BA = H / AB as follows:
BA = (1 – SO.Rate) × [HR.Rate + (1 – HR.Rate) × BABIP]
where SO.Rate = SO / AB, HR.Rate = HR / (AB – SO), and the batting average on balls in play is BABIP = (H – HR) / (AB – SO – HR). What is interesting is that some of these “component rates” such as HR.Rate and SO.Rate are less affected by chance variation than other rates such as BABIP. In Bill James’ terminology, SO.Rate is more persistent than, say, BABIP: a player’s SO.Rate in one season is more predictive of his SO.Rate in the following season than his BABIP is of his next-season BABIP.
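The breakdown of BA into these three rates can be checked numerically. Here is a minimal R sketch; the season totals are made-up numbers chosen only for illustration, not any real player's line.

```r
# Hypothetical season totals (illustrative only, not a real player)
AB <- 600; H <- 180; HR <- 30; SO <- 100

SO.Rate <- SO / AB                     # strikeouts per at-bat
HR.Rate <- HR / (AB - SO)              # home runs per at-bat not ending in a strikeout
BABIP   <- (H - HR) / (AB - SO - HR)   # hits per ball in play (home runs excluded)

# Recombine the three component rates into a batting average
BA_from_rates <- (1 - SO.Rate) * (HR.Rate + (1 - HR.Rate) * BABIP)

# Should agree with H / AB up to floating-point error
all.equal(BA_from_rates, H / AB)
```

The identity holds for any totals with AB > SO + HR, since each factor cancels the denominator of the next.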
This observation suggests that we might learn more about the career performance of a player (or compare career performances for several “similar” players) by looking at the career trajectories of these different rates graphed against age. Based on my work with these different rates, I think one would see stronger patterns in a player’s strikeout rates than in a player’s BABIP rates graphed against age.
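To make the idea of graphing rates against age concrete, here is a rough sketch that computes the three component rates season by season and draws each one with a smooth. It uses base R graphics and lowess() rather than whatever the gist code does, and the data frame seasons and the helper component_rates are hypothetical names with made-up numbers.

```r
# Toy data for one hypothetical player: season totals by age (made-up numbers)
seasons <- data.frame(
  age = 24:33,
  AB  = c(450, 520, 580, 600, 590, 610, 570, 540, 500, 430),
  H   = c(120, 150, 172, 185, 178, 180, 165, 150, 135, 110),
  HR  = c(8, 14, 20, 25, 27, 24, 22, 18, 15, 10),
  SO  = c(70, 75, 80, 82, 85, 90, 92, 95, 98, 95)
)

# Add the three component rates for each season (hypothetical helper)
component_rates <- function(d) {
  transform(d,
            SO.Rate = SO / AB,
            HR.Rate = HR / (AB - SO),
            BABIP   = (H - HR) / (AB - SO - HR))
}

rates <- component_rates(seasons)

# One panel per rate: seasonal values plus a lowess smooth against age
op <- par(mfrow = c(1, 3))
for (r in c("SO.Rate", "HR.Rate", "BABIP")) {
  plot(rates$age, rates[[r]], xlab = "Age", ylab = r, main = r)
  lines(lowess(rates$age, rates[[r]]))
}
par(op)
```

With real season totals in the same columns, the scatter around each smooth gives a visual sense of how persistent each rate is.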
This thinking motivated me to write an R function to compare players who have similar career batting averages during the same era. I won’t go into the details of the R code here, but the gist site contains some preliminary R work, and a special function compare_rates does the graphing.
We are going to compare component rates for all players whose career batting average is within AVG_eps of a target value AVG_target, who have at least Career_target career AB, and whose mid-career year is within Career_eps of the target year (variable target_year). In this way, we are comparing players with long careers who had approximately the same career batting average during the same baseball era. By default, I set AVG_eps = .002, AVG_target = .300, Career_target = 3000, and Career_eps = 4, although one can easily change these inputs to get different groups of similar players.
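The selection rule described above can be sketched in a few lines of base R. This is my own illustration of the logic, not the code from the gist: the function name find_similar is hypothetical, and defining the mid-career year as the midpoint of a player's first and last seasons is an assumption. The toy data frame is made up, with the same column names as the Batting table in the Lahman package.

```r
# Sketch of the player-selection step (assumed logic, not the gist's code).
# `batting` needs columns playerID, yearID, AB, H.
find_similar <- function(batting, target_year,
                         AVG_target = .300, AVG_eps = .002,
                         Career_target = 3000, Career_eps = 4) {
  # Career totals and first/last seasons, one entry per player
  AB    <- tapply(batting$AB, batting$playerID, sum)
  H     <- tapply(batting$H, batting$playerID, sum)
  first <- tapply(batting$yearID, batting$playerID, min)
  last  <- tapply(batting$yearID, batting$playerID, max)

  career <- data.frame(
    playerID = names(AB),
    AB       = as.vector(AB),
    H        = as.vector(H),
    # Assumption: mid-career year = midpoint of first and last season
    mid_year = as.vector((first + last) / 2)
  )
  career$AVG <- career$H / career$AB

  keep <- career$AB >= Career_target &
    abs(career$AVG - AVG_target) <= AVG_eps &
    abs(career$mid_year - target_year) <= Career_eps
  career[which(keep), ]   # which() drops players with missing totals
}

# Made-up data: only player "aaa01" meets all three conditions
toy <- data.frame(
  playerID = rep(c("aaa01", "bbb01", "ccc01"), each = 2),
  yearID   = c(1995, 2005, 1996, 2004, 1970, 1980),
  AB       = c(2000, 2000, 1500, 1000, 2000, 2000),
  H        = c(600, 600, 450, 300, 600, 600)
)
similar <- find_similar(toy, target_year = 2000)
similar
```

With the Lahman package installed, one could try the same function on Lahman::Batting, though I have not checked that it returns the same groups of players as compare_rates.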
Assuming you have the Lahman, dplyr, and ggplot2 packages installed, the following code will source in the code from the gist site and read in the function compare_rates.
Okay, we’re ready to try out the function. I’ll start by finding and comparing players with a lifetime batting average of .300 who played in the vicinity of the 2000 season. As the plots show, the four similar players are A-Rod, Frank Thomas, Kenny Lofton, and Roberto Alomar. These were very different types of hitters, but they all had a career AVG close to .300 during the same baseball era.
What do we learn from these graphs?
- Generally, A-Rod and Thomas had higher strikeout rates than Lofton and Alomar, although Thomas had a relatively low strikeout rate in the early part of his career. As one might expect, strikeout rates tend to increase towards the end of a player’s career.
- Obviously, A-Rod and Thomas had higher home run rates. A-Rod tended to peak at a young age, and Thomas had an interesting bimodal trajectory with two peaks. Lofton and Alomar actually had larger home run rates in their early 30s.
- Looking at the hit-in-play rates, one notices more season-to-season variation (less persistence, using James’ terminology) in these rates. This is especially obvious for Lofton and Thomas, although the smooths indicate some general career patterns in these graphs.
For a second example, I am looking at the list of 2016 Hall of Fame nominees. I notice Alan Trammell, who had a .285 career batting average and a mid-career year of about 1987. This motivates me to use the function with a target year of 1987 and a target AVG of .285, adjusting the value of AVG_eps to get a small group of similar players. As the graph shows, the players similar to Trammell are Lenny Dykstra, Robin Yount, Ryne Sandberg, and Willie Wilson.
compare_rates(1987, AVG_target=.285, AVG_eps=.001)
- I think the strong persistence of strikeout rates for each player is clear, although each player tends to strike out more towards the end of his career.
- Home run rates for three players show peaks in the 25-30 age range. Clearly Wilson did not hit many home runs in his career.
- Comparing with the strikeout rate trajectories, note that the BABIP trajectories show much more noise or up-and-down variation. For example, although Trammell’s career BABIP was about .300, it varied from .260 to .340 across seasons.
One message from this exploration is that players with similar career batting averages can actually be very different in their strikeout, home run, and hit-in-play tendencies. Also, one can learn more about peak batting performance by looking at these component rates than by looking at career trajectories of batting average. Hitters have many dimensions, and a single graph is insufficient for learning about their career performances. Last, these career trajectory graphs illustrate the notable chance variability of BABIP; one learns more about a player’s talent from looking at his home run rate or strikeout rate.