Sale on 2nd Edition of ABWR
Today is Cyber Monday, and there is a sale (30% discount) on the 2nd edition of Analyzing Baseball with R, which will be available soon. If you are interested, check out the Chapman and Hall book site.
Most baseball fans are aware of the recent surge in home run hitting. In the 2016 season, there were 5610 home runs, followed by a big increase to 6105 home runs in the 2017 season. Well, things quickly cooled off: only 5585 home runs were hit in the following 2018 season. That raises some questions about what is going on. Was the baseball juiced in 2017? What impact does the current emphasis on increased launch angles have on home runs? Did the weather play a role in the decrease in home runs in 2018?
With the availability of Statcast data, there are opportunities to explore the pitch-by-pitch data to help answer questions about home run hitting. Here I am going to focus on a related issue. We know that the distance of a ball put in play depends on the launch velocity and the launch angle. This raises several questions:
- What was the relationship between distance and (launch angle, exit velocity) in the 2017 season?
- Has there been a change in this relationship that might account for a drop in home run hitting in the 2018 season?
An Initial Exploration
To get started, I constructed a scatterplot of the launch angle (degrees) and launch speed (mph) for a random sample of 1000 balls in play for the 2017 season. I have colored the points by the distance traveled (feet). Several things are clear from this graph. First, it is important to get the ball in the air (positive launch angle) if a batter wishes to get any distance. Also, there is a “sweet spot” of angles between 20 and 40 degrees and launch speeds above 90 mph that seems to result in the longest distances.
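The sampling step and the “sweet spot” reading above can be sketched in Python. This is a minimal illustration, not the code behind the post: it assumes each ball in play is a hypothetical `(launch_angle_deg, launch_speed_mph, distance_ft)` tuple, and it hard-codes the sweet spot as the 20 to 40 degree, above-90 mph box described in the text.

```python
import random
import statistics

# Hypothetical layout for one ball in play: (launch_angle_deg, launch_speed_mph, distance_ft)
Ball = tuple[float, float, float]


def sample_balls(balls: list[Ball], n: int = 1000, seed: int = 2017) -> list[Ball]:
    """Simple random sample of n balls in play, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(balls, min(n, len(balls)))


def in_sweet_spot(ball: Ball) -> bool:
    """Sweet spot as read off the scatterplot: 20-40 degrees and above 90 mph."""
    angle, speed, _ = ball
    return 20 <= angle <= 40 and speed > 90


def _median_or_nan(values: list[float]) -> float:
    """Median of a list, or nan when the list is empty."""
    return statistics.median(values) if values else float("nan")


def median_distance_split(balls: list[Ball]) -> tuple[float, float]:
    """Median distance inside vs. outside the sweet spot."""
    inside = [b[2] for b in balls if in_sweet_spot(b)]
    outside = [b[2] for b in balls if not in_sweet_spot(b)]
    return _median_or_nan(inside), _median_or_nan(outside)
```

Comparing the two medians is a numeric stand-in for what the colored scatterplot shows visually; it does not reproduce the plot itself.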
To get a better understanding of the relationship of distance with launch angle and launch velocity, I fit a generalized additive model (gam). Basically, the model says that distance is a smooth function of the two variables. I fit this model to all balls put in play for the 2017 season. To describe the predictions, I use a contour graph. Below I display contours where the predicted distance is equal to 200, 250, 300, 350, and 400 feet. This graph demonstrates the importance of launch angle: note how the predicted distance increases rapidly as the launch angle changes from 10 to 20 degrees. Also, the yellow line brackets the sweet spot where the predicted distance exceeds 400 feet.
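To make the idea of “distance as a smooth function of launch angle and launch speed” concrete, here is a sketch that swaps in a simpler smoother than a gam: a Nadaraya-Watson kernel regression with Gaussian weights. It is not the model used in the post, and the bandwidths `h_angle` and `h_speed` are hypothetical choices. The helper `grid_above` lists the grid points whose predicted distance exceeds a threshold, which is the region enclosed by a contour line such as the 400-foot one.

```python
import math
from typing import Callable, Sequence


def make_kernel_smoother(
    angles: Sequence[float],
    speeds: Sequence[float],
    distances: Sequence[float],
    h_angle: float = 5.0,  # hypothetical bandwidth in degrees
    h_speed: float = 3.0,  # hypothetical bandwidth in mph
) -> Callable[[float, float], float]:
    """Return predict(angle, speed): a Gaussian-kernel weighted average of observed distances."""
    data = list(zip(angles, speeds, distances))

    def predict(angle: float, speed: float) -> float:
        num = den = 0.0
        for a, s, d in data:
            # Nearby (angle, speed) pairs get larger weights; bandwidths control smoothness.
            z = ((angle - a) / h_angle) ** 2 + ((speed - s) / h_speed) ** 2
            w = math.exp(-0.5 * z)
            num += w * d
            den += w
        # If every weight underflows to zero (query far from all data), return nan.
        return num / den if den > 0 else float("nan")

    return predict


def grid_above(
    predict: Callable[[float, float], float],
    threshold: float,
    angle_grid: Sequence[float],
    speed_grid: Sequence[float],
) -> list[tuple[float, float]]:
    """Grid points whose predicted distance exceeds threshold (the region inside a contour)."""
    return [(a, s) for a in angle_grid for s in speed_grid if predict(a, s) > threshold]
```

The design choice here is simplicity: unlike a gam, this smoother has no fitted basis or penalty, so it rescans all the data for every prediction, and its contours will differ in detail from those in the post.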
Comparing 2017 and 2018 seasons
We now wish to compare the distances of balls put into play in the 2017 and 2018 seasons. We know that substantially fewer home runs were hit in 2018, which suggests that balls didn’t travel as far this season. Since launch angle and launch velocity play an important role in distance traveled, we’d like to control for these variables in this exploration.
Here’s what I did to compare the 2017 and 2018 distances.
- Let’s focus on a particular batter, say Mike Trout. For each of Trout’s batted balls in the 2018 season, I can use the launch angle and exit velocity values to predict the distance traveled using the gam model fit to the 2017 data. I only considered batted balls with a positive launch angle, since those are the balls that can be hit a substantial distance.
- For the 2018 season, Trout had 272 balls hit in the air. For each of those balls in play, I predict the distance traveled using my 2017 gam model. The median of these predicted distances was 278.95 feet, while the actual median distance of these balls in play was 283 feet. So Trout actually hit these balls about 4 feet farther (comparing medians) in the 2018 season than one would predict from the 2017 gam model.
- Okay, this is interesting, because I initially thought the median distance would be smaller in 2018 than in 2017 (since fewer home runs were hit in 2018). But of course Trout is only one batter, so I repeated this procedure for all batters in the 2018 season. For each batter’s 2018 launch angles and exit velocities (balls hit in the air only), I predict the distances traveled from the 2017 model and compute the median actual 2018 distance minus the median predicted distance. Below I graph these differences against the number of batted balls, showing only players who had at least 100 batted balls in 2018. A loess curve shows the pattern, and the red horizontal line is at zero. Now this is weird. There is a lot of scatter (some players hit for longer distances in 2018 than predicted and others for shorter distances), but players generally tend to hit the ball farther, on average, in 2018, although the difference is only about 3 feet.
- Of course, substantially fewer home runs were hit in 2018. In another study, I explored the characteristics of batted balls in the so-called red region (the region of launch angle and launch velocity values where a home run is likely). What I found is that more balls were hit in this red region in 2018; that is, there were more opportunities to hit home runs in 2018. But the proportion of balls in this red region that became home runs dropped substantially. That home run study suggests that balls in play did not carry as well in the 2018 season. This brief analysis of distances traveled says something different: it suggests that, after adjusting for launch angle and launch velocity, balls actually traveled farther, on average, in 2018.
- Well, at the very least, this work suggests that there is more to be done to understand the reasons for the drop in home run hitting in 2018. I will certainly do more exploration, but I encourage the interested reader to try to explain the possible inconsistency between increased distance and decreased home run production.
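The per-batter comparison described in the list above can be sketched as follows. This is a hedged illustration under assumptions: 2018 batted balls are hypothetical `(batter_id, launch_angle_deg, launch_speed_mph, distance_ft)` records, and `predict_2017` stands in for any function that returns the 2017 model’s predicted distance (for example, the fitted gam).

```python
import statistics
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical record layout: (batter_id, launch_angle_deg, launch_speed_mph, distance_ft)
Record = tuple[str, float, float, float]


def distance_differences(
    records_2018: Iterable[Record],
    predict_2017: Callable[[float, float], float],
    min_balls: int = 100,
) -> dict[str, tuple[int, float]]:
    """Map each 2018 batter to (balls in the air, median actual - median predicted distance).

    Only balls with a positive launch angle are used, and batters with fewer
    than min_balls such balls are dropped (the plot shows players with >= 100).
    """
    by_batter: dict[str, list[tuple[float, float, float]]] = defaultdict(list)
    for batter, angle, speed, distance in records_2018:
        if angle > 0:  # keep only balls hit in the air
            by_batter[batter].append((angle, speed, distance))

    result: dict[str, tuple[int, float]] = {}
    for batter, balls in by_batter.items():
        if len(balls) < min_balls:
            continue
        actual = statistics.median([d for _, _, d in balls])
        # Predict each 2018 ball with the 2017 model, then take the median prediction.
        predicted = statistics.median([predict_2017(a, s) for a, s, _ in balls])
        result[batter] = (len(balls), actual - predicted)
    return result
```

For a single batter this reduces to the Trout calculation: a median actual distance of 283 feet minus a median predicted distance of 278.95 feet gives a difference of 4.05 feet. Setting `min_balls=1` keeps every batter rather than only those shown in the plot.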