Statcast and Batted Ball Averages
Recently, I have written posts exploring different variables in the Statcast hitting data. Specifically, we have looked at how exit velocity and launch angle relate to hit rates on balls put in play. This week, I thought it might be helpful to use the Retrosheet data to look a bit more deeply at how batting average and its components have changed in the recent history of MLB.
A player's batting average, AVG = H / AB, is not that meaningful by itself, and I would argue that the phrase “hitting for average” doesn’t say much. One can write a batting average as
AVG = (1 – SO.Rate) BABIP
where SO.Rate = SO / AB is the strikeout rate and BABIP = H / (AB – SO) is the batting average on balls in play. We start with the graph below, which shows how these three rates (AVG, SO.Rate, and BABIP) have changed from 2000 through 2017. What I see is that:
- the overall AVG has stayed pretty constant through this period (although it is lower in the period 2010 through 2017)
- the strikeout rate has risen substantially from 2005 to 2017
- the BABIP rate has stayed fairly consistent, but it is currently at its highest value over this period
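To make the identity AVG = (1 – SO.Rate) BABIP concrete, here is a minimal Python sketch that computes the three rates from season totals and checks that they fit together. The function name `batting_components` and the totals are made up for illustration; they are not Retrosheet values.

```python
def batting_components(ab, h, so):
    """Return (AVG, SO.Rate, BABIP) from at-bats, hits, and strikeouts."""
    avg = h / ab
    so_rate = so / ab
    # BABIP as defined in this post: hits per at-bat that did not end in a strikeout.
    babip = h / (ab - so)
    return avg, so_rate, babip


# Hypothetical season totals, for illustration only.
ab, h, so = 165_000, 42_000, 38_000
avg, so_rate, babip = batting_components(ab, h, so)

# The identity AVG = (1 - SO.Rate) * BABIP holds up to floating-point rounding.
assert abs(avg - (1 - so_rate) * babip) < 1e-12

print(f"AVG = {avg:.3f}, SO.Rate = {so_rate:.3f}, BABIP = {babip:.3f}")
```

The identity is just algebra: (AB – SO) / AB times H / (AB – SO) equals H / AB. That is why a rising strikeout rate has to pull AVG down unless BABIP rises to offset it.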
Break Down BABIP by Batted Ball Type
If you work with the Statcast data, you are aware of the importance of launch angle in getting a hit. So it is reasonable to decompose the BABIP rate as
BABIP = Σ P(type) × P(Hit | type)
where the sum is over batted ball types, P(type) is the proportion of batted balls of a specific type (ground ball, fly ball, line drive, or pop up), and P(Hit | type) is the probability of a hit given that particular type. Let’s see how these components have changed over time. First, the graph below plots the proportions of batted balls of different types over time. What do I see?
- The proportions of ground balls and pop ups have stayed pretty consistent over this period.
- Fly balls were more common than line drives until about 2013, and then line drives became more common. Huh? Personally, I can’t imagine why the fraction of fly balls would suddenly drop off. Remember, we are using Retrosheet data, and the classification of batted balls is likely a visual judgment. For some reason, there appears to have been a change around 2012 in how line drives and fly balls were distinguished. Currently, we have Statcast, where line drives and fly balls are precisely determined through the launch angle measurement.
Next we want to see how the probability of a hit given each type of batted ball has changed over seasons (see the figure below). In the earlier seasons, the chance of a hit on a fly ball was about 0.27, the chance of a hit on a line drive was about 0.725, and pop ups were unlikely to be hits. There has been a sudden increase in ground ball hit probabilities in recent seasons. The unusual values for recent seasons, for example the BABIP on fly balls and line drives, are likely a by-product of the inexact way of recording batted ball type. Clearer patterns will likely emerge from the more precise Statcast data.
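The decomposition of BABIP by batted ball type can be sketched in a few lines of Python. The counts below are hypothetical, chosen only so that the fly ball and line drive hit rates land near the values quoted above; they are not Retrosheet totals, and the dictionary layout is just one convenient choice.

```python
# Hypothetical balls in play ("bip") and hits by batted ball type.
counts = {
    "ground ball": {"bip": 5800, "hits": 1400},
    "fly ball": {"bip": 3500, "hits": 950},
    "line drive": {"bip": 2700, "hits": 1950},
    "pop up": {"bip": 1000, "hits": 20},
}

total_bip = sum(c["bip"] for c in counts.values())
total_hits = sum(c["hits"] for c in counts.values())

babip_from_parts = 0.0
for kind, c in counts.items():
    p_type = c["bip"] / total_bip            # P(type)
    p_hit_given_type = c["hits"] / c["bip"]  # P(Hit | type)
    babip_from_parts += p_type * p_hit_given_type
    print(f"{kind:12s} P(type) = {p_type:.3f}  P(Hit | type) = {p_hit_given_type:.3f}")

# The weighted sum recovers the overall rate of hits per ball in play.
overall_babip = total_hits / total_bip
assert abs(babip_from_parts - overall_babip) < 1e-12
```

Because the overall rate is a weighted average, a change in how balls are labeled (say, fly balls relabeled as line drives) can shift both the proportions and the conditional hit rates without changing the overall BABIP at all.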
Batted Ball Rates of Home Runs and Other Hits
My confidence in the Retrosheet batted ball type variable has been shaken by the above two graphs. So let’s return to something that I am more sure about. Given the big increase in home run hitting, it seems interesting to look at the historical pattern of the rate of fly balls that are home runs, and the rate of fly balls that are non-HR hits (singles, doubles, triples). We see …
- the home run rate per fly ball was about 12% for many seasons, but the increase from 2015 through 2017 is remarkable: the current rate of 18% is 50% higher than it was just a few years earlier.
- the non-HR (single, double, or triple) hit rate per fly ball was steady at about 15%, dropped dramatically starting in 2013, and then made a comeback in the 2017 season.
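The two fly ball rates, and the "50% higher" comparison, come down to simple arithmetic. Here is a small Python sketch; the helper names `fly_ball_rates` and `relative_change` and the counts are hypothetical, while the 12% and 18% figures are the ones quoted in the list above.

```python
def fly_ball_rates(fly_balls, home_runs, other_hits):
    """Return (HR per fly ball, non-HR hits per fly ball)."""
    return home_runs / fly_balls, other_hits / fly_balls


def relative_change(old, new):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Hypothetical single-season counts, for illustration only.
hr_rate, other_rate = fly_ball_rates(fly_balls=30_000, home_runs=5_400, other_hits=4_200)
print(f"HR per fly ball: {hr_rate:.1%}, non-HR hits per fly ball: {other_rate:.1%}")

# Moving from a 12% to an 18% HR rate is a 50% relative increase.
increase = relative_change(0.12, 0.18)
assert abs(increase - 0.5) < 1e-9
print(f"Relative increase in HR rate: {increase:.0%}")
```

Note the difference between the two ways of describing the change: the rate rose by 6 percentage points, which is a 50% increase relative to the earlier 12% level.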
Closing Comments
- When I started thinking about this post, I was going to describe some useful measures of player performance based on batted ball type and BABIP. But I thought it would be helpful to first get some insight about the relevant rates from the Retrosheet data.
- It is interesting that although batting averages haven’t, on average, changed much in recent seasons, the components of batting average have gone through significant changes. Strikeout rates are on the rise, which would deflate batting averages. But this is compensated in part by the increase in BABIP, and the increase in home run rates has helped to increase BABIP.
- Older Retrosheet data can have errors, especially with details like type of batted ball that might be hard for someone to record. For example, there seemed to be confusion between line drives and fly balls as indicated in one of our graphs.
- I look forward to having more seasons of Statcast data that can provide better insight about the relationship between launch angle and hit probability, and how this relationship will change over time.