I’m currently reading the recently released The MVP Machine by Lindbergh and Sawchik. The book is an interesting description of how data is currently being used to build better baseball players. (I recommended it to the current president of our statistics association as a great illustration of making decisions from data.) As one might expect, the book highlights particular players who have dramatically changed their hitting or pitching, and statistics are used to substantiate their improvement. In particular, Chapter 13 focuses on two Indians players, Francisco Lindor and Jose Ramirez, who recently became power hitters. For Lindor, the authors state that his average launch angles for the 2015 through 2018 seasons were 3.8, 7.6, 13.6, and 14.5 degrees, respectively. Ramirez similarly had a big increase in his average launch angle from 2015 through 2018. These changes in average launch angle supposedly helped both players increase their home run productivity over this period.
Anyway, this reading motivated several questions that I will try to address in this post.
- I suppose Lindor and Ramirez were chosen for their large increases in average launch angle over this Statcast period. What about other players? Can we say that most players are hitting at higher launch angles?
- To get a home run, one also needs a high exit velocity. Is there also a change in the average launch velocity of players over the Statcast period from 2015 to 2018?
- What is the relationship between increase in launch angle, increase in launch speed, and rate of home run hitting in this Statcast period?
We currently have Statcast batted-ball data for the complete 2015 through 2018 seasons and also for the first half of the 2019 season. I am going to focus on changes between the 2015 and 2019 seasons.
One issue, discussed in a recent post, is that roughly 10% of the Statcast data have missing values of launch angle and exit velocity and the missing values are replaced by some “average” values. (The process for doing this is explained on Tom Tango’s site.) Since these imputed values may affect some of the summaries I compute, I identify and remove these cases from the batted ball data for each season.
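Here is a minimal Python sketch of one way to screen out likely imputed values. It assumes that imputed batted balls show up as the same (launch angle, launch speed) pair repeated many times, while measured pairs rarely repeat exactly. The field names and the `max_repeats` cutoff are illustrative assumptions, not the code behind this post.

```python
from collections import Counter


def remove_suspected_imputed(batted_balls, max_repeats=3):
    """Drop rows whose (launch_angle, launch_speed) pair repeats too often.

    batted_balls: list of dicts with 'launch_angle' and 'launch_speed' keys
                  (hypothetical field names).
    max_repeats:  hypothetical cutoff; a pair seen more often than this
                  is treated as a likely imputed value.
    """
    pair_counts = Counter(
        (b["launch_angle"], b["launch_speed"]) for b in batted_balls
    )
    return [
        b
        for b in batted_balls
        if pair_counts[(b["launch_angle"], b["launch_speed"])] <= max_repeats
    ]


# Made-up data: one pair repeated five times plus three distinct pairs.
sample = [{"launch_angle": 10.0, "launch_speed": 90.0} for _ in range(5)]
sample += [
    {"launch_angle": 12.3, "launch_speed": 101.4},
    {"launch_angle": -4.1, "launch_speed": 88.0},
    {"launch_angle": 27.8, "launch_speed": 95.6},
]
cleaned = remove_suspected_imputed(sample)
```

A fixed cutoff is the crudest version of this idea. A more careful version would compare each pair's count against a smooth fit to the neighboring counts.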
I’d like to compare the 2015 average launch angle with the 2019 average launch angle in a pairwise fashion — that is, look at the specific players who had a sufficient number of batted balls for both the 2015 and 2019 seasons. I decided to focus on the players who had at least 150 batted balls in 2015 and at least 75 batted balls in the 2019 half-season. The choice of these cutoffs is a bit arbitrary but it won’t change the general conclusions. By the way, there were 147 players in this study and so my work will relate to these “regular” players.
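The selection of “regular” players can be sketched in a few lines of Python. This assumes each season is available as a list with one batter identifier per batted ball. The function name and input layout are hypothetical. The cutoffs of 150 and 75 are the ones stated above.

```python
from collections import Counter


def regular_players(batters_2015, batters_2019, min_2015=150, min_2019=75):
    """Return the set of batters meeting both batted-ball cutoffs.

    batters_2015, batters_2019: one batter id per batted ball in that season.
    """
    counts_2015 = Counter(batters_2015)
    counts_2019 = Counter(batters_2019)
    return {
        player
        for player, n in counts_2015.items()
        if n >= min_2015 and counts_2019.get(player, 0) >= min_2019
    }


# Made-up example: only player "a" clears both cutoffs.
season_2015 = ["a"] * 150 + ["b"] * 149 + ["c"] * 200
season_2019 = ["a"] * 75 + ["b"] * 100 + ["c"] * 74
regulars = regular_players(season_2015, season_2019)
```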
Change in Average Launch Angle
For each of these regular players, I looked at
Mean Launch Angle in 2019 MINUS Mean Launch Angle in 2015
Here’s a histogram of these values with a red vertical line at the value 0. On average, a player’s mean launch angle increased by 1.2 degrees between 2015 and 2019, and 60.5% of the players had an increase in mean launch angle over this period.
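The two summaries quoted here, the average paired difference and the share of players with a positive difference, can be computed as in the following Python sketch. It assumes per-player season means are stored in dictionaries keyed by player. The names and the numbers in the example are made up for illustration.

```python
from statistics import mean


def paired_change_summary(mean_2015, mean_2019):
    """Summarize 2019 minus 2015 differences for players present in both years.

    mean_2015, mean_2019: dicts mapping player -> season mean of some measure.
    """
    players = sorted(set(mean_2015) & set(mean_2019))
    diffs = [mean_2019[p] - mean_2015[p] for p in players]
    return {
        "n": len(diffs),
        "mean_change": mean(diffs),
        "prop_increase": sum(d > 0 for d in diffs) / len(diffs),
    }


# Made-up launch angle means; players "d" and "e" appear in only one season.
la_2015 = {"a": 10.0, "b": 12.0, "c": 8.0, "d": 5.0}
la_2019 = {"a": 13.0, "b": 11.0, "c": 10.0, "e": 9.0}
summary = paired_change_summary(la_2015, la_2019)
```

The same function applies unchanged to mean launch speed or home run rate, since it only needs one number per player per season.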
Change in Average Launch Velocity
Similarly, I looked at the Mean Launch Speed in 2019 MINUS the Mean Launch Speed in 2015 for each player, and a histogram of these differences is below. Here the average difference is 0.27 mph, and 57% of the players had a positive difference in mean launch speed over this period.
Change in In-Play Home Run Rate
Of course we know that the overall home run rate has increased dramatically in this period, but we’re looking here at the change at an individual level. Below I graph the increases in home run rate for these 147 players. The average increase in home run rate is 0.009 (almost one percentage point), and 64% of the players showed an increase.
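As a small worked example, here is how a player’s change in in-play home run rate could be computed in Python. This assumes the rate is home runs divided by batted balls and that each batted ball carries an outcome label such as `"home_run"`. Both the definition and the label are assumptions made for illustration.

```python
def in_play_hr_rate(events):
    """Fraction of a player's batted balls in one season that were home runs.

    events: list of outcome labels, one per batted ball (hypothetical labels).
    """
    if not events:
        raise ValueError("need at least one batted ball")
    return sum(e == "home_run" for e in events) / len(events)


# Made-up player: 5 HR in 200 batted balls (2015), 4 HR in 100 (2019).
events_2015 = ["home_run"] * 5 + ["field_out"] * 195
events_2019 = ["home_run"] * 4 + ["single"] * 96
hr_rate_change = in_play_hr_rate(events_2019) - in_play_hr_rate(events_2015)
```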
There is a clear positive relationship among these three variables: the increase in average launch angle, the increase in average launch speed, and the increase in home run rate between 2015 and 2019. Here I show this by displaying a scatterplot matrix of these variables. The correlation between the increase in home run rate and the increase in either launch angle or launch velocity is about 0.40. There is a weaker relationship between the change in the mean launch angle and the change in the mean launch speed.
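The correlations behind a scatterplot matrix like this one are ordinary Pearson correlations between the per-player changes. A minimal Python sketch follows. The column names and the numbers are made up for illustration and are not the values from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): correlation} for every pair of columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b]) for a in names for b in names
    }


# Made-up per-player changes (2019 minus 2015).
changes = {
    "launch_angle": [1.5, -0.8, 3.2, 0.4, 2.1],
    "launch_speed": [0.3, -0.5, 0.9, 1.2, -0.2],
    "hr_rate": [0.012, -0.004, 0.020, 0.010, 0.006],
}
corr = correlation_matrix(changes)
```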
- First, this exercise demonstrates the importance of looking at the data before summarizing it. Here is a table where I show the mean launch angle of Lindor using the complete Statcast data and the “cleaned” Statcast data with the imputed values removed. The differences between these means are sizable. The MVP Machine authors presented mean values of the complete dataset.
- In the MLB Home Run Report, the commission focused more on the carry of the baseball. They showed that characteristics of the baseball changed from 2016 to 2017 and this contributed to the great rise in home run rate. But this work shows that there is another factor that is contributing to this high home run rate — players are hitting at greater launch angles and launch speeds.
- It appears to be hard to get a handle on the baseball effect. The drag coefficients were down in 2017, up again in 2018, and down in the current 2019 season. But I believe that the current home run surge is primarily a batter effect and launch angles and launch velocities will continue to rise.
Someone writes: The problem with the analysis as presented here is that the imputation profile of the data hasn’t been constant over the last few years. The technology has improved, which results in lower missed data/imputation rates over time. Since the radar primarily misses batted balls at the very high & low range of launch angles and exit velocities, when the imputation rate changes the mean values change significantly as well. The proper approach is to impute the missing data and do so in a way better than what MLBAM attempts. Otherwise your conclusions are hopelessly distorted by the way the technology has changed.
Response: Thanks for your comments. They help me clarify what I actually did and how it affects the summary measures.
- Correct, I don’t know the imputation profile, but I am not assuming that it remains constant through the seasons. My method is simply to flag the (launch angle, launch velocity) pairs that appear unusually often compared to other pairs. There should be a smooth distribution of launch angles, and deviations from this smooth pattern likely correspond to imputations.
- One concern that you allude to is that the missing values are not missing at random and so I am introducing some bias by removing these imputed cases. I will rerun my analysis with the full dataset — it will be interesting to see how results change.
- Actually, it was pretty easy to rerun my analysis using the complete Statcast dataset. Below I show the sample size (N), median change in average launch angle (LA), proportion with an increase in avg launch angle (LA_p), the same stats for change in avg launch speed (LS and LS_p), and the change in HR stats (HR and HR_p). Clearly the “cleaning” has impacted the stats about the change in launch angle and launch speed, but hasn’t changed the HR stats.
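One row of a comparison table like the one described (N, LA, LA_p, LS, LS_p, HR, HR_p) could be built as in the Python sketch below. It assumes one record of 2019 minus 2015 changes per player. The record keys and the numbers are hypothetical. Running it once on the complete data and once on the cleaned data would give the two rows to compare.

```python
from statistics import median


def change_table_row(player_changes):
    """Build one summary row from per-player changes.

    player_changes: list of dicts with keys 'la', 'ls', 'hr' holding each
    player's change in mean launch angle, mean launch speed, and HR rate.
    """
    row = {"N": len(player_changes)}
    for key, label in (("la", "LA"), ("ls", "LS"), ("hr", "HR")):
        values = [p[key] for p in player_changes]
        row[label] = median(values)  # median change across players
        row[label + "_p"] = sum(v > 0 for v in values) / len(values)  # share increasing
    return row


# Made-up changes for three players.
example_changes = [
    {"la": 1.0, "ls": -0.5, "hr": 0.01},
    {"la": 3.0, "ls": 0.5, "hr": 0.00},
    {"la": -2.0, "ls": 1.5, "hr": 0.02},
]
row = change_table_row(example_changes)
```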