In my last post, I demonstrated the use of several Shiny apps from the ShinyBaseball package to display patterns of batting and pitching measures over the zone. Since strikeouts are an important aspect of modern MLB pitching, it is interesting to explore the regions where hitters are likely to whiff, that is, to swing and miss. Pitchers are more likely to get whiffs on off-speed pitches, so we’ll focus this discussion on a popular off-speed pitch, the slider. Here are some questions to motivate this work:
- Where? What regions of the zone are hitters likely to whiff on a slider?
- In and Out of the Zone. Do pitchers vary in their whiff rates on a slider, and how do these whiff rates differ for pitches thrown in and out of the zone?
- How to Measure Effects? How can we measure the location effects of a whiff on a slider, and how can we measure deviations from these general location effects?
Max Scherzer and Kyle Gibson
Let’s focus on two right-handed pitchers, Max Scherzer and Kyle Gibson, both of whom threw a good number of sliders in the 2019 season. Using my PitchOutcomes() app from the ShinyBaseball package, I construct a graph that shows the locations of “swung” sliders, where the color of each point corresponds to the swing outcome (foul, in-play or miss). Looking carefully, we notice two interesting things:
- Batters tend to swing at a large number of Gibson sliders out of the zone, and most of these swings are whiffs.
- In contrast, many of the swings on Scherzer sliders tend to be in the zone, and batters appear to miss many of these in-zone sliders.
Slider Miss Rates In and Out of the Zone
This Scherzer/Gibson comparison indicates that there is a distinction between the chance of a miss on a slider thrown in the zone and the chance of a miss on a slider thrown out of the zone. So I collected the pitchers who had at least 300 swung sliders in the 2019 season and computed two whiff rates for each pitcher: one rate for sliders in the zone and one rate for sliders thrown out of the zone. A scatterplot of these whiff rates is displayed below. Several interesting observations from the graph:
- As one might expect, the chance of missing a slider out of the zone is much larger than the chance of a whiff inside the zone. The out-of-zone miss rates vary between 30 and 70 percent, and the miss rates in the zone vary between 0 and 35 percent. (Pitchers really vary a lot in their whiff rates.)
- There is a weak positive correlation between the in-zone and out-of-zone miss rates. (This is surprising — I would have thought there would be a stronger association between these two whiff rates.)
- To follow up on the last point, pitchers such as Gibson and Scherzer are very similar on one miss rate but very different on the other. Gibson has a high miss rate (close to 70%) for out-of-zone sliders, but only a 17% miss rate for in-zone sliders. In contrast, Scherzer has relatively high miss rates for both out-of-zone (65%) and in-zone sliders (34%).
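The two whiff rates per pitcher can be sketched in a few lines. This is a minimal Python illustration, not the R code used for the post. The zone boundaries, the tuple layout, and the function names are assumptions made for the example; the post does not say how "in the zone" was defined.

```python
from collections import defaultdict

# Hypothetical strike-zone rectangle in feet (an assumption for illustration;
# the post does not state the zone definition that was used).
ZONE_HALF_WIDTH = 0.83
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5


def in_zone(plate_x, plate_z):
    """Return True if the pitch location falls inside the assumed rectangle."""
    return abs(plate_x) <= ZONE_HALF_WIDTH and ZONE_BOTTOM <= plate_z <= ZONE_TOP


def whiff_rates(swings, min_swings=300):
    """Compute (in-zone, out-of-zone) miss rates for each pitcher.

    swings: iterable of (pitcher, plate_x, plate_z, missed) tuples,
            one per swung slider. Pitchers with fewer than min_swings
            swings, or with no swings in one of the regions, are dropped.
    """
    # For each pitcher and region, store [misses, swings].
    counts = defaultdict(lambda: {"in": [0, 0], "out": [0, 0]})
    for pitcher, x, z, missed in swings:
        region = "in" if in_zone(x, z) else "out"
        counts[pitcher][region][0] += int(missed)
        counts[pitcher][region][1] += 1

    rates = {}
    for pitcher, c in counts.items():
        n_in, n_out = c["in"][1], c["out"][1]
        if n_in + n_out < min_swings or n_in == 0 or n_out == 0:
            continue
        rates[pitcher] = (c["in"][0] / n_in, c["out"][0] / n_out)
    return rates


# Toy data: pitcher "A" has four swings, pitcher "B" only one.
toy = [
    ("A", 0.0, 2.5, False),
    ("A", 0.2, 2.0, True),
    ("A", 1.5, 1.0, True),
    ("A", 1.2, 0.8, True),
    ("B", 0.0, 2.5, False),
]
rates = whiff_rates(toy, min_swings=4)
print(rates)
```

With the toy data and a threshold of four swings, only pitcher "A" qualifies, with a miss rate of 0.5 in the zone and 1.0 out of the zone. Plotting one rate against the other for all qualifying pitchers would give a scatterplot of the kind described above.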
Use a GAM to Show How Miss Rate Depends on Location
It is clear that a slider’s location is relevant to the miss rate, so let’s fit a model to measure the impact of location. Let p denote the probability of missing a slider thrown by a right-handed pitcher. Our generalized additive model (GAM) says that the logit of p is equal to a smooth function of the pitch location (plate_x, plate_z). When we fit this model, we can estimate the chance that the batter will miss the slider at any pitch location. To show the fit, I have plotted the pitch locations for a sample of 500 swung sliders and colored the points by the fitted probability of a whiff. As one might expect, these whiff probabilities increase as one moves away from the middle of the zone to the lower-right region outside of the zone.
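To get a feel for what the smooth surface is estimating, one can tabulate raw miss rates over a grid of locations. The sketch below is not a GAM and not the method used in this post. It is a crude binned stand-in, written in Python, with a hypothetical cell size and function name. A GAM replaces these noisy cell-by-cell rates with a smooth function of (plate_x, plate_z).

```python
import math
from collections import defaultdict


def binned_whiff_rates(swings, cell=0.5):
    """Empirical miss rate in each square cell of the (plate_x, plate_z) plane.

    swings: iterable of (plate_x, plate_z, missed) tuples for swung sliders.
    cell:   side length of a grid cell in feet (hypothetical choice).
    Returns a dict mapping (ix, iz) cell indices to (miss_rate, n_swings).
    """
    tallies = defaultdict(lambda: [0, 0])  # cell -> [misses, swings]
    for x, z, missed in swings:
        key = (math.floor(x / cell), math.floor(z / cell))
        tallies[key][0] += int(missed)
        tallies[key][1] += 1
    return {key: (m / n, n) for key, (m, n) in tallies.items()}


# Toy data: two swings near the middle of the zone, two low and away.
toy_swings = [
    (0.1, 2.6, False),
    (0.3, 2.9, True),
    (1.2, 0.7, True),
    (1.4, 0.9, True),
]
grid = binned_whiff_rates(toy_swings)
for key in sorted(grid):
    print(key, grid[key])
```

Cells with few swings give unstable rates, which is one reason to prefer a smooth model fit to all of the pitches at once.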
More than Location – A GAM Random Effects Fit
Our earlier work suggested that there was more to throwing an effective slider than just its location. We saw that Max Scherzer was getting batters to whiff on sliders thrown in the zone. We can generalize our GAM model to include both a location effect and an extra term (a random effect) that measures a pitcher’s extra ability to get a whiff on the pitch. We write this model as
logit(p) = s(plate_x, plate_z) + gamma_i
where s() is the smooth function of location and gamma_i is the random effect for the ith pitcher. To complete this model, we assume that the random effects come from a normal distribution with mean 0 and standard deviation sigma. The estimate of sigma is informative about the variability in these pitcher extra whiff abilities.
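A small worked example may help in reading the model on the probability scale. The numbers below are hypothetical, not estimates from the fitted model: we pick a value for the location effect s(plate_x, plate_z) and see how different values of gamma_i change the whiff probability.

```python
import math


def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def whiff_probability(location_effect, gamma=0.0):
    """Whiff probability implied by logit(p) = s(plate_x, plate_z) + gamma_i."""
    return inv_logit(location_effect + gamma)


# Hypothetical location effect at one pitch location (not a fitted value).
location_effect = -1.0

# Hypothetical random effects for below-average, average and above-average pitchers.
for gamma in (-0.3, 0.0, 0.3):
    p = whiff_probability(location_effect, gamma)
    print(f"gamma = {gamma:+.1f} -> p = {p:.3f}")
```

At this location an average pitcher (gamma = 0) has a whiff probability of about 0.27. A pitcher with gamma = 0.3 moves up to about 0.33, and one with gamma = -0.3 drops to about 0.21, all at the same pitch location. A larger sigma means pitchers are more spread out in this way.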
We fit this model twice, once to right-handed pitchers and once to left-handed pitchers. (We do this since the effective pitch location of a slider depends on the pitching arm.) Below we have constructed scatterplots of the overall whiff rate and the random effects for each pitching arm and labeled the points for several pitchers with high whiff rates.
- We see that there is a positive relationship between miss rate and the random effects. So in general, high whiff rates correspond to positive values of the random effect, and likewise low whiff rates correspond to negative random effect values.
- Let’s compare Scherzer and Gibson. Although Gibson has a high whiff rate, note that his random effect estimate is close to 0. This indicates that his whiff rates are explained primarily by the location of his sliders. In contrast, Scherzer has one of the largest positive random effect estimates. This shows that his high whiff rate is not just a consequence of pitch location: there is some quality of his sliders besides location that is causing the hitters to whiff.
- Other Applications of Random Effects Modeling. This type of random effects model is useful for measuring other types of player performance such as catcher framing and ability to get a hit on a ball put into play. In the Catcher Framing chapter of Analyzing Baseball with R, we fit a random effects model to the probability that a called pitch is a strike. In this model, the chance of a called strike depends on the umpire, the batter and pitcher and a catcher random effect. The random effect estimate measures the catcher framing ability. In this post from a year ago, I used a random effects model to estimate a player’s extra ability to get a hit on a ball in play that is not explained by the launch angle and launch speed measurements off the bat. In this case, a positive random effect estimate corresponds to other quantities such as a player’s speed that impact his BABIP.
- Measurement of Extra Whiff Ability. A positive random effect estimate indicates that a pitcher (like Max Scherzer) has an ability to induce a whiff on a slider above what is explained by the pitch’s location. But we haven’t actually interpreted this “extra ability”. It is likely this random effect measures some characteristic of the slider such as the vertical movement, the pitch speed, or the spin. More work needs to be done to better understand what is causing this extra whiff ability.
- Apply to Other Pitch Types. We focused on sliders here, but certainly this modeling approach would be helpful in understanding a pitcher’s effectiveness for other types of pitches. For a four-seam fastball, I would anticipate that the whiff probability depends primarily on the pitch location and the values of the random effects would be small.
- Got R Code? All of the R work for this exercise can be found on my GitHub Gist site. There are two files: slider_whiff_work.R is the main R script, and re_offspeed.R is a general R function for implementing the GAM random effects model for any choice of pitch type and pitching arm. The 2019 Statcast data file is available online as part of my ShinyBaseball package.