In last week’s post, we were exploring distances traveled on balls put in play (BIP) for the 2017 and 2018 seasons. Since the home run count had significantly dropped in 2018, one would think that there would be a corresponding change in the distances of the BIP between the two seasons. I concluded last week’s post with some evidence that the median distance on BIP (adjusting for exit velocity and launch angle) had actually increased in 2018, which didn’t make sense. Well, I published the result anyway last Monday, since I wanted to mention the Cyber Monday sale of our book.
In this week’s post, I will clarify this result by exploring the entire distribution of distances of BIP, specifically how the distances of BIP have changed for individual players between the two seasons. We’ll see that my comments last week were a bit misleading since I did not explore the entire distance distribution.
Histograms of Distances of BIP
I begin by focusing on four players, Mike Trout, Khris Davis, J.D. Martinez, and Joey Votto, and collecting the distances (in feet) of BIP for all balls hit in the air (launch angle is positive) for the 2017 and 2018 seasons using Statcast data. I construct histograms and density estimates of the distances below. Honestly, it is hard to compare the histograms. For example, if you look at Mike Trout, the histograms of distances for the two seasons look a little different — the 2018 distances appear to be more clustered towards large values. But it is hard to make fine comparisons, especially on the large distance values that lead to home runs.
QQ Plots of Distances
One can take a closer look at the differences between two distributions by means of a quantile-quantile or QQ plot. To construct this graph for Mike Trout, we start with a list of probabilities, here values of p from 0.01 to 0.99 in steps of 0.02, then find the quantiles of the 2017 distances and the 2018 distances corresponding to these probabilities. Then we construct a line graph of the quantiles for the 2017 and 2018 data. (So one plotted point corresponds to the quantiles of the 2017 and 2018 distances for a single probability value.) I add a comparison line y = x; where the plotted line falls above this comparison line, the 2018 quantile is larger than the 2017 quantile. I have colored the points in red where the probability is larger than 0.75.
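The construction just described can be sketched in a few lines. This is a minimal Python illustration, not the R code behind the plots in this post. The function names `quantile` and `qq_points` are hypothetical, the distances are made-up numbers rather than Statcast data, and the interpolation rule is an assumption (it matches R's default quantile type).

```python
def quantile(values, p):
    """Linear-interpolation quantile (the rule R uses by default, type 7)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    h = (len(xs) - 1) * p          # fractional position in the sorted data
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def qq_points(dist_2017, dist_2018):
    """Return (p, 2017 quantile, 2018 quantile) for p = 0.01, 0.03, ..., 0.99."""
    probs = [round(0.01 + 0.02 * i, 2) for i in range(50)]
    return [(p, quantile(dist_2017, p), quantile(dist_2018, p)) for p in probs]


# Made-up distances in feet, for illustration only (not Statcast data).
d17 = [150, 210, 250, 280, 300, 320, 350, 380, 400, 420]
d18 = [170, 230, 270, 295, 315, 330, 355, 382, 400, 420]

points = qq_points(d17, d18)

# A point lies above the line y = x when the 2018 quantile exceeds the 2017 one.
above = [p for p, q17, q18 in points if q18 > q17]

# Points in the upper quarter of the distribution (the ones colored red).
right_tail = [(p, q17, q18) for p, q17, q18 in points if p > 0.75]
```

Plotting the second element of each tuple on the horizontal axis and the third on the vertical axis, together with the line y = x, gives the QQ plot.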
The patterns of these QQ plots are interesting. Look at the QQ plot for Mike Trout. For probabilities between 0.01 and 0.73, the blue line is above the reference line, which means that the 2018 distances are larger than the 2017 distances in the lower and middle parts of the distribution. But the red points are right on top of the reference line for high probability values, which means that the right-tail of the BIP distances is very similar between the two seasons. (The right-tail is relevant since those distances are often home runs.) If you look at J.D. Martinez, note that the 2017 and 2018 distances are similar for small probability values, but his 2018 distances were noticeably shorter in the right-tail.
In each of the four cases, I am seeing the same thing. The 2018 advantage in distance appears to be smaller in the right-tail than it is in the middle and lower parts of the distribution.
Comparing Quantiles of Distance
To see if this effect is true for all players, I collect the distances of BIP for all players who had at least 200 BIP for both 2017 and 2018 seasons. For each of the probabilities 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, I compute for each player
Difference = Quantile of 2018 Distances MINUS
Quantile of 2017 Distances
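As a rough sketch of this calculation (in Python rather than the R used for this post), the function below takes two dictionaries mapping a player name to his list of BIP distances, keeps players with at least 200 BIP in both seasons, and returns the quantile differences. The names `quantile_diffs`, `PROBS` and `MIN_BIP` are hypothetical, the column labels other than P50_diff are assumed by analogy, and the data are made up.

```python
PROBS = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
MIN_BIP = 200  # minimum balls in play required in each season


def quantile(values, p):
    """Linear-interpolation quantile (the rule R uses by default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_diffs(by_player_2017, by_player_2018, min_bip=MIN_BIP):
    """Map player -> {label: 2018 quantile minus 2017 quantile}."""
    out = {}
    # Only players who appear in both seasons can be compared.
    for player in by_player_2017.keys() & by_player_2018.keys():
        d17, d18 = by_player_2017[player], by_player_2018[player]
        if len(d17) < min_bip or len(d18) < min_bip:
            continue  # too few BIP in at least one season
        out[player] = {
            f"P{round(p * 100):02d}_diff": quantile(d18, p) - quantile(d17, p)
            for p in PROBS
        }
    return out


# Made-up data: player "A" gains exactly 2 feet everywhere,
# player "B" has too few BIP in 2018 and is dropped.
season_2017 = {"A": list(range(100, 400)), "B": list(range(100, 400))}
season_2018 = {"A": [d + 2 for d in range(100, 400)], "B": list(range(100, 150))}

diffs = quantile_diffs(season_2017, season_2018)
```

For the toy player "A", every difference comes out at 2 feet (up to floating-point rounding), which is the constant-shift case. Real players will generally show different values at different quantiles.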
Below I have constructed histograms of these quantile differences for all players. If you look at the difference in medians (labeled by P50_diff), it appears that the histogram is pretty symmetric about the reference line of 0. On average, the distances in BIP were similar between the two seasons. In contrast, if one moves towards the right tail of the distribution (looking down the graph for larger probability values), we come to a different conclusion. The histograms are centered about negative values. That is, as we move further towards the right tail, the quantiles appear to be smaller for the 2018 season.
I can confirm these observations by computing a few summaries. For each collection of difference in quantiles, I compute the median and the fraction of differences that are positive. The differences in the medians (labeled P50_diff in the table) have a median of 2 feet, and 56.9% of the players had positive differences. So, on average, the players did hit BIP a bit farther in 2018 compared to 2017 (this is what I concluded last week). But one reaches a different conclusion in the right tail. If one compares the 95th percentiles (corresponding to home run land), then the median difference in quantiles is MINUS 5 feet and only 29.2% of the players had a positive difference in quantiles.
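The two summaries are simple to compute once the per-player differences are in hand. Here is a small Python sketch (the helper name `summarize` is hypothetical, and the numbers are toy values, not the results reported above):

```python
from statistics import median


def summarize(diffs):
    """Median difference and share of players with a positive difference."""
    return {
        "median": median(diffs),
        "frac_positive": sum(d > 0 for d in diffs) / len(diffs),
    }


# Toy per-player differences in feet, for illustration only.
p50_diff = [4.0, 1.5, -1.0, 3.0, 0.5]
p95_diff = [-6.0, -4.0, 1.0, -8.0, -2.0]

summary_p50 = summarize(p50_diff)
summary_p95 = summarize(p95_diff)
```

Running `summarize` once per quantile column gives one row of the summary table for each probability.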
- Comparing batches can be tricky. Suppose that a particular player takes a vitamin that makes him stronger in the 2018 season. Then one might think that the player’s 2017 distribution of distances would shift upward by a particular amount, say 2 feet, in the 2018 season. This is a simple comparison that says that the 2018 distribution differs from the 2017 distribution by a constant amount. What is happening here is not that simple. In the left and middle parts of the distance distribution, hitters tend to hit farther in the 2018 season. In contrast, in the right tail of the distribution, hitters tend to hit shorter in the 2018 season, which might explain the decrease in the home run count.
- Launch angle and exit velocity? This work ignores the effect of a change in launch angle or exit velocity, both of which affect the distances of batted balls. It is possible that changes in these off-the-bat measures might account for the decreases in distances in the right tail that we are observing above. As a next step, I plan to compare the 2017 and 2018 distances, adjusting for launch angle and exit velocity. My gut feeling is that the basic conclusions won’t change: in 2018 there appears to be a decrease in distance traveled in the right tail of the distribution that corresponds to the home runs.
- Got code? As usual, all of the R code for this work is available on my GitHub Gist site.