Suppose a player has a 300 (or higher) batting average midway through a season — what is the chance that he finishes the season with a 300+ batting average?
We’ll answer this question empirically and use the answer to motivate a decomposition of a batting average.
I collect all Retrosheet play-by-play data for each of the 1960-2015 seasons. I focused on the players who had at least 100 AB and batting average .300 (or higher) on July 1– there were 2619 players in this group (about 47 players each season).
Of these 2619 players, 1401 of them (53%) finished with a .300+ batting average. So one could say that the probability a 300+ midseason batter finishes with a 300+ average is .53. Given what we know at midseason, we would predict that it is slightly more probable than not that he finishes 300 or higher.
But one can make a more accurate prediction by looking at two components of a batting average. One can write
where SO.RATE is the strikeout rate and BABIP is the proportion of hits (including HR) in balls put in play. For each of these players who achieved a 300+ midseason average, I construct a scatterplot of “One Minus the SO.RATE” against the BABIP. The contour lines correspond to constant values of AVG. Of course, all of the points for our selected players fall on or above the contour line corresponding to .300. Note that there is a lot of variability in the strikeout rates, although they all hit well “for average”.
The following graph compares the midseason and final averages of these players. As expected the cloud of points above .300 at midseason moves download to a cloud of points at the end of season where the points straddle the .300 line.
Look carefully at the bottom graph. Look at the points corresponding to players with a low value of “One Minus Strikeout Rate” (high strikeout rate) on the left side of the graph. Most of these players had a final season batting average under .300. In contrast, if you look at the points on the right (low strikeout rate), they appear to be more likely to have a final AVG over .300.
We can make this clearer by the use of a graph. For many values of One Minus Strikeout Rate (plus and minus a small neighborhood), I compute the proportion of hitters with a final batting average exceeding .300. (Remember overall that 53% of these hitters finish with a 300+ AVG.)
We see a strong relationship between One Minus Strikeout Rate and the chance he has a final season AVG over .300. If a player has, say a 25% strikeout rate (One Minus SO.RATE = .75), then his chance of a final season 300+ AVG is about .43. If instead, his strikeout rate is 10%, his chance of a final 300+ average is .575. Another way of saying this is that the high BABIP guys tend not to get a final 300+ AVG.
The message here is that all 300 batting averages are not the same. From a skill / luck perspective, there is more skill in a strikeout rate than in a BABIP rate and that translates into different future batting averages for players with different SO.RATE / BABIP characteristics. One can also reach this same conclusion from a modeling perspective and I’ll give a talk on this issue at the SABR Analytics meeting.