What does it mean for a baseball hitter to hit “for average”? Here’s one way to dissect a batting average, graph its different components, and then use traditional graphics in R to construct this display for a particular player and season of interest.
How does a player get a base hit? He gets a hit by
- not striking out, and then
- either hitting a home run, typically over the fence,
- or otherwise getting a hit on a ball put into play.
This motivates the following decomposition of all outcomes of an at-bat (AB). We divide all at-bats first by SO/not-SO, then by HR/not-HR, and then by HIT (in play)/OUT (in play).
If we define the following rates:
- the strikeout rate SO.RATE = SO / AB
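- the home run rate HR.RATE = HR / (AB – SO)
- the balls-in-play batting average BABIP = (H – HR) / (AB – SO – HR)
Then one can show that a batting average AVG = H / AB can be written as

$$\text{AVG} = (1 - \text{SO.RATE}) \times \text{HR.RATE} + (1 - \text{SO.RATE}) \times (1 - \text{HR.RATE}) \times \text{BABIP}$$

since (1 – SO.RATE) × HR.RATE = HR / AB and (1 – SO.RATE) × (1 – HR.RATE) × BABIP = (H – HR) / AB, and these two pieces add up to H / AB.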
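As a quick numerical check of this identity, here is a minimal R sketch that computes the three rates and the two components from season totals, using the McGwire 1998 numbers quoted below (509 AB, 152 H, 70 HR, 155 SO). The helper avg_components() is hypothetical; it is not part of the Lahman package or the author's gist.

```r
# Hypothetical helper: split a batting average into its home-run
# and in-play components, starting from season totals.
avg_components <- function(AB, H, HR, SO) {
  so_rate <- SO / AB                    # strikeouts per at-bat
  hr_rate <- HR / (AB - SO)             # home runs per non-strikeout AB
  babip   <- (H - HR) / (AB - SO - HR)  # hits per ball in play
  hr_part     <- (1 - so_rate) * hr_rate
  inplay_part <- (1 - so_rate) * (1 - hr_rate) * babip
  list(so_rate = so_rate, hr_rate = hr_rate, babip = babip,
       hr_part = hr_part, inplay_part = inplay_part,
       avg = hr_part + inplay_part)
}

mcgwire98 <- avg_components(AB = 509, H = 152, HR = 70, SO = 155)
round(unlist(mcgwire98), 3)

# The two components should reproduce H / AB up to floating-point error
all.equal(mcgwire98$avg, 152 / 509)
```

Keeping the rates and the components in one list makes it easy to compare both levels of the decomposition across players.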
Bill James, in one of his Baseball Abstracts, used the area of a rectangle to represent the runs created for a particular player, where the sides of the rectangle represented on-base and slugging abilities. Likewise, we can use areas of shaded rectangles to represent the different components of a batting average.
Here is an outline of this construction. I’ll illustrate it using Mark McGwire’s famous 1998 season, in which he had 509 at-bats, 152 hits, 70 home runs, and 155 strikeouts. (We’ll shortly hear about the number of Hall of Fame votes McGwire will get.)
* Start with a unit square where the horizontal side corresponds to the strikeout rate (0 to 1).
* Draw a vertical line at Mark’s strikeout rate of 155 / 509 = 0.305. The area of the rectangle to the left of this line is equal to his strikeout rate.
* In the “no strikeout” region, draw a horizontal line at Mark’s home run rate 70 / (509 – 155) = 0.198. The area of this “HR” rectangle, (1 – SO.RATE) × HR.RATE = 70 / 509 = 0.138, represents the first component of the batting average.
* The area of the upper-right rectangle is the proportion of AB where he did not get a SO or a HR. We mark off the balls-in-play hit rate of (152 – 70) / (509 – 155 – 70) = 0.289 with a vertical line. The area of the “H” rectangle, (1 – SO.RATE) × (1 – HR.RATE) × BABIP = (152 – 70) / 509 = 0.161, is the proportion of AB where Mark got an in-play hit; it represents the second component of the batting average.
* Last, we shade the two types of hits (home runs and in-play hits) in red and blue, respectively. The sum of the two shaded areas is the batting average: 70 / 509 + 82 / 509 = 152 / 509 = 0.299.
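The steps above can be sketched with base R graphics. This is a minimal illustration under stated assumptions, not the author's plot.batting.average(): the hypothetical helper draw_avg_square() takes season totals directly instead of looking them up, places the “H” rectangle at the left of the in-play region, and returns the two shaded areas so they can be checked against H / AB.

```r
# Hypothetical helper: draw the unit-square display from season totals.
draw_avg_square <- function(AB, H, HR, SO, main = "") {
  so_rate <- SO / AB                          # width of the SO strip
  hr_rate <- HR / (AB - SO)                   # height of the HR strip
  babip   <- (H - HR) / (AB - SO - HR)        # share of balls in play that are hits
  x_hit   <- so_rate + (1 - so_rate) * babip  # right edge of the "H" rectangle

  # Empty unit square with no padding around [0, 1]
  plot(c(0, 1), c(0, 1), type = "n", xaxs = "i", yaxs = "i",
       xlab = "Strikeout rate", ylab = "", main = main)

  # Shade the two kinds of hits first so the dividing lines stay visible
  rect(so_rate, 0, 1, hr_rate, col = "red")       # home runs
  rect(so_rate, hr_rate, x_hit, 1, col = "blue")  # in-play hits

  # Dividing segments: SO | not SO, then HR | not HR, then H | OUT
  lines(c(so_rate, so_rate), c(0, 1))
  lines(c(so_rate, 1), c(hr_rate, hr_rate))
  lines(c(x_hit, x_hit), c(hr_rate, 1))

  # Label each region at its center
  text(so_rate / 2, 0.5, "SO")
  text((so_rate + 1) / 2, hr_rate / 2, "HR")
  text((so_rate + x_hit) / 2, (hr_rate + 1) / 2, "H")
  text((x_hit + 1) / 2, (hr_rate + 1) / 2, "OUT")

  # Return the two shaded areas; together they equal H / AB
  invisible(c(hr_area  = (1 - so_rate) * hr_rate,
              hit_area = (1 - so_rate) * (1 - hr_rate) * babip))
}

m98 <- draw_avg_square(AB = 509, H = 152, HR = 70, SO = 155,
                       main = "Mark McGwire, 1998")
round(m98, 3)
```

Drawing the filled rectangles before the segments is a deliberate ordering choice: rect() fills would otherwise paint over the dividing lines.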
The function plot.batting.average constructs this graph using traditional R graphics. All that is needed is the Lahman package, which contains the season batting data for all players, and the dplyr package. The inputs to the function are the player’s name in quotes and the season. (If you inspect the function, you’ll see that I use the plot() function to set up the square, the lines() function to draw the segments, and rect() to draw the shaded rectangles.) The graph displays the three rates and gives the two areas that make up the batting average.
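Before anything is drawn, the player’s season totals have to be pulled from the Lahman data. The gist does this with dplyr; as a dependency-free illustration of that lookup step, here is a base R sketch. The helper season_totals() is hypothetical, and it assumes Lahman-style column names (playerID, yearID, AB, H, HR, SO in Batting; playerID, nameFirst, nameLast in People). It sums over stints, because a player traded during a season has one row per team.

```r
# Hypothetical helper: one player's season totals, summed over stints.
# `batting` and `people` are assumed to use Lahman-style column names.
season_totals <- function(name, year, batting, people) {
  full_names <- paste(people$nameFirst, people$nameLast)
  ids <- unique(people$playerID[which(full_names == name)])
  if (length(ids) != 1) {
    stop("name must match exactly one player: ", name)
  }
  rows <- batting[batting$playerID == ids & batting$yearID == year, ]
  if (nrow(rows) == 0) {
    stop("no batting record for ", name, " in ", year)
  }
  # A traded player has several rows (stints) in one season; add them up
  colSums(rows[, c("AB", "H", "HR", "SO")])
}
```

With the Lahman package installed, a call such as season_totals("Mark McGwire", 1998, Lahman::Batting, Lahman::People) should return the 1998 totals used above; in older versions of Lahman the People table was named Master. Requiring an exact, unique name match is a simple guard against two players sharing a name.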
We illustrate this batting average decomposition for several “interesting batters”.
The 1998 Mark McGwire: this is the year when McGwire hit 70 home runs. He had a high strikeout rate of 30%, but he had a remarkable home run rate of 20% and a below-average BABIP of 29%. (Note that this function is available on my GitHub Gist site.)
```r
library(devtools)                    # provides source_gist()
source_gist("6afd88ed3e48fd62b7b6")  # loads plot.batting.average()
plot.batting.average("Mark McGwire", 1998)
```
The 2001 Mark McGwire: this was the last season of McGwire’s career. His strikeout rate rose to 40%, his home run rate dropped to 16%, and his BABIP was a measly 18%. This season is distinctive because the component of his AVG due to home runs was actually higher than the component due to hits in play.
```r
plot.batting.average("Mark McGwire", 2001)
```
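To see why the 2001 season stands out, one can plug the rounded rates quoted above into the decomposition. Because the rates are rounded, this is only an approximation, but it gives a home run component of about 0.096 against an in-play component of about 0.091.

```r
# Rounded 2001 rates quoted in the text
so_rate <- 0.40
hr_rate <- 0.16
babip   <- 0.18

hr_part     <- (1 - so_rate) * hr_rate                # 0.6 * 0.16
inplay_part <- (1 - so_rate) * (1 - hr_rate) * babip  # 0.6 * 0.84 * 0.18

round(c(hr_part = hr_part, inplay_part = inplay_part,
        avg = hr_part + inplay_part), 3)
```

For most hitters the in-play component dominates; here the home run piece is the larger one, which is what the red rectangle in the 2001 display shows.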
The same function produces the display for the other batters of interest:

```r
plot.batting.average("Adam Dunn", 2010)
plot.batting.average("Dan Uggla", 2013)
plot.batting.average("Ichiro Suzuki", 2004)
plot.batting.average("George Brett", 1980)
plot.batting.average("Ted Williams", 1941)
```
These graphs show how players “hit for average” and might be a useful way to compare batters. At the very least, they might discourage the imprecise habit of simply saying that a player “hits for average”. As these graphs indicate, players with high batting averages typically have low strikeout rates, and it is possible to boost one’s batting average by hitting home runs (think Mark McGwire).