In last week’s post, we began an exploration of whiff rates on sliders. We looked at pitcher whiff rates for pitches thrown in and out of the zone. In the 2019 season, Max Scherzer and Kyle Gibson had similar whiff rates on pitches out of the zone, but Scherzer had a much higher whiff rate than Gibson in the zone. That motivated fitting a model that represents the probability of a miss as a smooth function of the pitch location. By adding random pitcher effects to the model, we showed that some pitchers, such as Max Scherzer, have an ability to induce whiffs on sliders beyond what is predicted by the pitch locations.
That work motivated some new questions:
- Since random effects estimates are a bit abstract, is there a more basic way to demonstrate these “beyond location” abilities to induce whiffs on sliders?
- How does this work carry over to other pitch types such as four-seamers or change-ups?
- Can we identify remarkable pitchers who are unusually good (and unusually weak) at inducing whiffs on swung pitches?
In this post, I illustrate the use of standardized scores to measure a pitcher’s whiff rate performance beyond what one would predict on the basis of the pitch locations.
To begin, using 2019 Statcast data, I collected all of the pitches of a particular type (say, sliders) thrown from a particular pitching side (say, right-handed). I focused on the pitches where the batter swings and defined a variable Miss, which is equal to 1 if the batter whiffed and 0 otherwise. I fitted the generalized additive model
$$\log\left(\frac{p}{1-p}\right) = s(\texttt{plate\_x}, \texttt{plate\_z}),$$
where p is the probability of a Miss, (plate_x, plate_z) gives the coordinates of the pitch location, and s() represents a smooth function. Using this model, one can predict the probability of a whiff for any pitch location.
For each pitcher, I computed the observed number of swings and misses, M. In addition, I computed an expected number of misses, E, for that pitcher by adding up the predicted whiff probabilities at the locations of the pitches where the batter swung. To assess the difference between M and E, I computed a Z-score, which expresses M - E in standard deviation units.
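One way to write such a score, assuming that the outcomes of a pitcher's swings are independent given the pitch locations (this is an assumption, and the denominator behind the results in this post may differ), is the excess count divided by its model-based standard deviation:

$$
E = \sum_{i} \hat{p}_i, \qquad
Z = \frac{M - E}{\sqrt{\sum_{i} \hat{p}_i \,(1 - \hat{p}_i)}},
$$

where $\hat{p}_i$ is the fitted whiff probability for the pitcher's $i$-th swing and the sums run over all of his swings on that pitch type. The symbol $\hat{p}_i$ is introduced here for illustration only.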
Z-scores are attractive since they adjust the excess number of misses, M - E, for the sample size and have a clear interpretation. If the miss probabilities are primarily explained by the pitch locations, we anticipate that most of the Z-scores will fall between -2 and 2. If a pitcher has a “large-positive” Z-score (much larger than 2), this positive residual indicates that he has an “extra” ability to induce a whiff on this particular pitch. On the other hand, a “large-negative” Z-score means that the pitcher induces fewer whiffs than one would predict based on his pitch locations.
To implement this calculation, I have a special R function, pitch_z_score(), that has three inputs: the Statcast dataset, the pitch type, and the pitching arm. The output includes a data frame containing the number of pitches, the number of swings, the number of whiffs, and the Z-scores for all pitchers that season who threw that particular pitch. By graphing the Z-scores against the number of pitches, we can see general patterns and the remarkable values (high and low) that deviate from the general pattern.
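The per-pitcher bookkeeping can be sketched in a few lines. This is a Python illustration, not the R function pitch_z_score() itself: the function name `z_scores_by_pitcher`, the tuple layout, and the toy data are all hypothetical, and the standard deviation used in the denominator assumes that swings are independent given location.

```python
from collections import defaultdict
from math import sqrt


def z_scores_by_pitcher(swings):
    """Summarise swings into per-pitcher observed misses M, expected misses E and Z.

    `swings` is an iterable of (pitcher, miss, p_hat) tuples, one per swing:
    miss is 1 for a whiff and 0 otherwise, and p_hat is the whiff probability
    that the location model predicts for that pitch.
    """
    totals = defaultdict(lambda: {"swings": 0, "M": 0, "E": 0.0, "var": 0.0})
    for pitcher, miss, p_hat in swings:
        t = totals[pitcher]
        t["swings"] += 1
        t["M"] += miss      # observed misses
        t["E"] += p_hat     # expected misses: sum of predicted probabilities
        # ASSUMPTION: swings are independent, so Bernoulli variances add up.
        t["var"] += p_hat * (1.0 - p_hat)

    result = {}
    for pitcher, t in totals.items():
        z = (t["M"] - t["E"]) / sqrt(t["var"]) if t["var"] > 0 else float("nan")
        result[pitcher] = {"swings": t["swings"], "M": t["M"], "E": t["E"], "Z": z}
    return result


# Made-up swings for two hypothetical pitchers, for illustration only.
toy_swings = [
    ("pitcher_a", 1, 0.30), ("pitcher_a", 1, 0.40), ("pitcher_a", 0, 0.20),
    ("pitcher_b", 0, 0.30), ("pitcher_b", 0, 0.40), ("pitcher_b", 1, 0.50),
]
summary = z_scores_by_pitcher(toy_swings)
for name, row in sorted(summary.items()):
    print(name, row)
```

In this toy input, pitcher_a misses more bats than his locations predict and gets a positive Z, while pitcher_b gets a negative one. With real data, the p_hat values would come from the fitted location model.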
Z-Scores for Four Pitch Types
For each pitching arm, I decided to look at four pitch types: sliders (Statcast code SL), four-seamers (FF), two-seamers (FT), and change-ups (CH). I computed the Z-scores for all pitchers who threw that particular pitch in the 2019 season. The plots below show the Z-scores against the number of pitches for each pitch type. The red dotted lines correspond to Z-score values of -2.5 and 2.5, so points above the upper line or below the lower line correspond to pitchers with extreme whiff rates. The points on the right-hand side of each display correspond to starters who throw a large number of pitches of that type. What do we take away from this graph for right-handed pitchers?
- Most of the points fall within the (-2.5, 2.5) band, which indicates that pitch location explains the whiff rates for many of the pitchers. All of the Z-scores for sliders (SL) fall between -3 and 3.
- But there are a number of Z-scores above the “2.5 line” indicating that there are pitchers whose whiff rate is higher than one would predict based on the pitch locations. This is most obvious for change-ups (CH) and four-seam fastballs (FF).
Here is the graph for the 2019 southpaws. What really stands out are the large Z-scores for pitchers throwing the four-seamer.
Who are these extreme pitchers with large Z-scores? For the 8 pitchers with Z-scores exceeding 5, the following tables display, for each pitcher, the number of swings, the number of misses, and the Z-score, first for right-handers and then for left-handers. Note that six of these pitchers have outstanding four-seamers and two have remarkable change-ups. Josh Hader had the largest Z-score among pitchers across these four pitch types. Of the 566 Hader four-seamers that drew a swing, 214 were missed (38%). Based on the location model, we expect the miss count to be 129, so his 214 misses exceed the expected count by 85, and the corresponding Z-score is far above the cutoff of 5.
What is special is that Hader’s whiff rate of 38% remains remarkably high even after adjusting for the pitch locations. There was a 538.com article a few years back that referred to Josh Hader’s fastball as baseball’s most mysterious pitch.
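As a rough check on the size of Hader's Z-score, the three quoted numbers (566 swings, 214 misses, 129 expected misses) can be plugged into two common choices of standard deviation. Both choices are assumptions made for illustration, and neither is confirmed as the formula behind the tables. A Pearson-style residual uses the square root of E, and a binomial-style version treats every swing as having the same miss probability E / N.

```python
from math import sqrt

# Figures quoted in the text for Josh Hader's 2019 four-seamer.
swings, misses, expected = 566, 214, 129

excess = misses - expected  # misses beyond what location predicts

# ASSUMPTION 1: Pearson-style residual, with the variance taken to be E.
z_pearson = excess / sqrt(expected)

# ASSUMPTION 2: binomial-style variance with a common miss rate E / N.
z_binomial = excess / sqrt(expected * (1 - expected / swings))

print(f"whiff rate:         {misses / swings:.3f}")
print(f"excess misses:      {excess}")
print(f"Z (Pearson-style):  {z_pearson:.2f}")
print(f"Z (binomial-style): {z_binomial:.2f}")
```

Under either assumption the score lands well above 5, which is consistent with Hader appearing among the most extreme pitchers.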
Z-Scores are Measurements of Pitcher Ability
To demonstrate that these Z-scores are meaningful measurements of pitcher ability, we run this exercise for two consecutive seasons (here 2018 and 2019) and see if the Z-scores from the first season are associated with the Z-scores from the second season. I’ve done this by focusing on right-handed pitchers who throw four-seamers. The scatterplot below demonstrates that pitchers who are above average in one season are very likely to be above average in the next.
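The season-to-season comparison can be sketched as follows: keep the pitchers who appear in both seasons, pair up their Z-scores, and summarise the association. The post relies on a scatterplot. The Pearson correlation used here is a stand-in numerical summary, and the function names, pitcher ids, and Z-scores are all made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def season_to_season(z_first, z_second):
    """Pair up Z-scores for pitchers present in both seasons and correlate them."""
    common = sorted(set(z_first) & set(z_second))
    xs = [z_first[p] for p in common]
    ys = [z_second[p] for p in common]
    return common, pearson_r(xs, ys)


# Made-up Z-scores keyed by hypothetical pitcher ids, for illustration only.
z_2018 = {"p1": 3.1, "p2": -1.2, "p3": 0.4, "p4": 1.8}
z_2019 = {"p1": 2.6, "p2": -0.7, "p3": 0.9, "p5": -2.0}

common, r = season_to_season(z_2018, z_2019)
print("pitchers in both seasons:", common)
print("correlation of Z-scores:", round(r, 3))
```

Pitchers who threw the pitch in only one of the two seasons (p4 and p5 here) drop out of the comparison, so the summary describes only those with data in both years.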
- Similar purpose to random effects. In last week’s post, I used a random effects model to measure a pitcher’s whiff performance that adjusted for pitch location. This Z-score method has a similar purpose, but it is perhaps simpler to explain. For example, a value of Z = 2 means that a pitcher’s count of whiffs is two standard deviation units higher than one would predict on the basis of the pitch locations.
- Pitch location isn’t the only explanation for best pitches. Recently, Zack Wheeler was quoted as saying that he has to improve his slider by improving its location. Okay, but this exercise demonstrates that there is more to understanding whiff rates than just considering pitch location.
- What’s behind these Z-scores? As I mentioned in last week’s post, I haven’t explained why pitchers’ adjusted whiff rates can be so low or high. Batters whiff at pitches for many possible reasons: the pitcher’s delivery might be hard to see, the batter might be expecting a fastball instead of a changeup, and so on. I think future studies have an opportunity to learn why pitchers are effective or not effective. Simple explanations like pitch velocity don’t appear to answer this question.
- General method for assessing extreme counts. If you have been reading my blog, you’ll know that these (Pearson) Z-scores are among my favorite tools in my statistician’s toolkit. They are generally helpful in assessing how observed counts deviate from the counts expected under a fitted model. For example, they can be used to assess a catcher’s framing ability to “steal additional strikes”, or to measure a hitter’s ability to get additional hits beyond what is predicted by launch variables.