Introduction and Data Source
Baseball Savant provides a “Statcast Search” page where one can query the MLB Statcast database and get interesting datasets, each of which can be downloaded as a CSV file and easily imported into R. Using this page, I downloaded four files:
- By using Player Type = Batter, and Pitch Type = Fastballs, I downloaded summary data on fastballs for all hitters up to the last game in the 2017 season
- By using Player Type = Pitcher and Pitch Type = Fastballs, I downloaded summary data for fastballs for all 2017 pitchers
- I repeated the above two queries using Pitch Type = Offspeed, for Batters, and then for Pitchers
I also downloaded Standard Measures for batters and pitchers from Fangraphs.
By merging these datasets, I get two data frames, one for batters and one for pitchers, that I’ll use to explore whiff rates, where whiff rate = (number of whiffs) / (number of swings). (The relevant variables in the Statcast CSV file are
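The merge and the whiff rate calculation described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code behind the analysis. The column names (player_id, player_name, whiffs, swings) and the two tiny inline tables are assumptions made up for the example; the real downloads would be read from the CSV files instead.

```python
import csv
import io

# Made-up stand-ins for two of the downloaded files.
# Column names are assumptions, not confirmed Statcast headers.
FASTBALL_CSV = """player_id,player_name,whiffs,swings
1,Hitter A,30,200
2,Hitter B,50,250
"""
OFFSPEED_CSV = """player_id,player_name,whiffs,swings
1,Hitter A,45,150
2,Hitter B,60,200
"""


def read_rows(text):
    """Parse CSV text into a dict of rows keyed by player_id."""
    return {row["player_id"]: row for row in csv.DictReader(io.StringIO(text))}


def whiff_rate(whiffs, swings):
    """Whiff rate = whiffs / swings; None when there were no swings."""
    return whiffs / swings if swings else None


def merge_pitch_types(fastball_text, offspeed_text):
    """Join the two files on player_id and attach whiff rates."""
    fast = read_rows(fastball_text)
    off = read_rows(offspeed_text)
    merged = []
    # Keep only players who appear in both files (an inner join).
    for pid in fast.keys() & off.keys():
        f, o = fast[pid], off[pid]
        fw, fs = int(f["whiffs"]), int(f["swings"])
        ow, os_ = int(o["whiffs"]), int(o["swings"])
        merged.append({
            "player_id": pid,
            "player_name": f["player_name"],
            "whiff_fastball": whiff_rate(fw, fs),
            "whiff_offspeed": whiff_rate(ow, os_),
            # Combined rate over the two pitch groups only.
            "whiff_combined": whiff_rate(fw + ow, fs + os_),
        })
    return sorted(merged, key=lambda r: r["player_id"])


batters = merge_pitch_types(FASTBALL_CSV, OFFSPEED_CSV)
for b in batters:
    print(b["player_name"], round(b["whiff_combined"], 3))
```

The same function would serve for the pitcher files, since only the meaning of player_id changes.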
Batter Whiff and K Rates
The first observation is that a batter’s whiff rate is strongly associated with his strikeout rate. Here I construct a plot of whiff rates and K rates. Four hitters, Chris Davis, Keon Broxton, Miguel Sano, and Joey Gallo, have unusually high K rates, all over 35%. On the other end, Dustin Pedroia has the smallest whiff rate and K rate among hitters who have faced at least 500 pitches.
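One way to put a number on "strongly associated" is a correlation coefficient. The sketch below filters to hitters with at least 500 pitches, as in the text, and computes the Pearson correlation between whiff rate and K rate. The four hitters and all of their counts are invented for illustration, and the field names (pitches, whiffs, swings, so, pa) are assumptions rather than the actual column names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented rows; field names are hypothetical.
hitters = [
    {"name": "Hitter A", "pitches": 2100, "whiffs": 150, "swings": 1000, "so": 60, "pa": 600},
    {"name": "Hitter B", "pitches": 1900, "whiffs": 250, "swings": 1000, "so": 100, "pa": 500},
    {"name": "Hitter C", "pitches": 2000, "whiffs": 350, "swings": 1000, "so": 180, "pa": 500},
    {"name": "Hitter D", "pitches": 300, "whiffs": 40, "swings": 100, "so": 10, "pa": 80},
]

MIN_PITCHES = 500  # same cutoff mentioned in the text

# Drop hitters with too few pitches, then compute the two rates.
qualified = [h for h in hitters if h["pitches"] >= MIN_PITCHES]
whiff = [h["whiffs"] / h["swings"] for h in qualified]
k_rate = [h["so"] / h["pa"] for h in qualified]

r = pearson(whiff, k_rate)
print(f"correlation between whiff rate and K rate: {r:.3f}")
```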
Pitcher Whiff and K Rates
A similar pattern of association between whiff rate and K rate holds for pitchers. In the scatterplot, five pitchers stand out on the high end and four pitchers are unusually low in their ability to get whiffs and strike out batters.
How Does Whiff Rate Depend on the Pitch (Batter View)?
Okay, what variables are associated with whiff rates? From watching a lot of baseball, it seems that off-speed pitches are often used to strike out hitters. So I would expect batters to have higher whiff rates on off-speed pitches than on fastballs. This motivates the definition of the difference
Difference = Whiff Rate on Offspeed Pitches – Whiff Rate on Fastballs
In the scatterplot below, I plot the batter’s whiff rate against this difference. What do we see?
- As expected, practically all hitters fall above the line Difference = 0, indicating that they do have higher whiff rates on off-speed pitches.
- But there are two interesting outliers — Scott Schebler and Corey Dickerson — who actually miss a greater fraction of fastballs than off-speed pitches.
- I’ve labeled two high K guys (Gallo and Broxton) and three high Difference guys: Cesar Hernandez, Alex Gordon, and Aaron Hicks. Hernandez, Gordon, and Hicks appear to perform poorly on off-speed pitches (relative to fastballs), although I am just looking at the whiff rates.
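The Difference defined above is simple to compute once the off-speed and fastball counts sit side by side. Here is a minimal sketch with invented hitters and hypothetical field names. It flags anyone below the Difference = 0 line and picks out the largest Difference.

```python
def whiff_difference(row):
    """Whiff rate on off-speed pitches minus whiff rate on fastballs."""
    off_rate = row["off_whiffs"] / row["off_swings"]
    fb_rate = row["fb_whiffs"] / row["fb_swings"]
    return off_rate - fb_rate


# Invented counts; field names are hypothetical.
hitters = [
    {"name": "Hitter A", "off_whiffs": 45, "off_swings": 150, "fb_whiffs": 30, "fb_swings": 200},
    {"name": "Hitter B", "off_whiffs": 20, "off_swings": 100, "fb_whiffs": 50, "fb_swings": 200},
    {"name": "Hitter C", "off_whiffs": 90, "off_swings": 200, "fb_whiffs": 20, "fb_swings": 200},
]

differences = {h["name"]: whiff_difference(h) for h in hitters}

# Hitters below the Difference = 0 line miss fastballs more often.
misses_fastballs_more = [name for name, d in differences.items() if d < 0]

# The hitter with the largest gap between off-speed and fastball whiff rates.
largest = max(differences, key=differences.get)

for name, d in differences.items():
    print(f"{name}: Difference = {d:+.3f}")
```

With real data, a minimum number of swings on each pitch group would be a sensible filter, since a rate built on a handful of swings is very noisy.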
How Does Whiff Rate Depend on the Pitch (Pitcher View)?
A similar type of graph can be produced for pitchers. Practically all pitchers (Jake Odorizzi is a notable exception) get higher whiff rates on off-speed pitches than on fastballs. Luis Perdomo is notable in that his whiff rate on off-speed pitches is over 0.35 higher than his whiff rate on fastballs. Clayton Kershaw and Corey Kluber have similar Difference values, but Kluber has the higher whiff rate. Jacob deGrom has a high whiff rate, but he does about the same on off-speed pitches and fastballs.
Why Does This Matter?
I imagine that MLB teams have this information about batters, and it could affect strategy, that is, how pitchers throw to particular batters. To see if this is generally true, I have graphed the Difference (whiff rate on off-speed pitches minus whiff rate on fastballs) against the proportion of off-speed pitches thrown to the hitter. There is a slight positive association, which means that players with a high Difference do tend to see a higher fraction of off-speed pitches. But since the association is not that strong, other factors might also determine the fraction of off-speed pitches thrown. For example, look at Aaron Hicks, Alex Gordon, and Cesar Hernandez. All three hitters have high Difference values, indicating that they are much less successful at making contact with off-speed pitches than with fastballs. But Hicks tends to get a “high” fraction of off-speed pitches and Hernandez a “low” fraction. I wonder why.
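A least-squares line gives a rough summary of how the fraction of off-speed pitches changes with the Difference. The sketch below is one way to fit such a line and is not the method used for the graph. The four hitters, their Difference values, and their pitch counts are invented, so the fitted slope says nothing about the real 2017 data.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: (Difference, off-speed pitches seen, total pitches seen).
hitters = [
    (0.05, 280, 1000),
    (0.10, 330, 1000),
    (0.15, 300, 1000),
    (0.20, 350, 1000),
]

difference = [d for d, _, _ in hitters]
# Proportion of all pitches seen that were off-speed.
prop_offspeed = [off / total for _, off, total in hitters]

slope, intercept = least_squares(difference, prop_offspeed)
print(f"off-speed proportion = {intercept:.3f} + {slope:.3f} * Difference")
```

A positive slope matches the direction of the association described above. How much the points scatter around the line is what leaves room for the other factors.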
One purpose of this study is to illustrate the relative ease of accessing the Baseball Savant database, which is a window into the new Statcast data. Given the current increase in strikeouts (currently 21.5% of PAs result in strikeouts), it would be interesting to better understand what is driving this high percentage. A high whiff rate may simply mean that a hitter cannot make contact with a fastball, or it might mean that the hitter is swinging at unhittable pitches outside of the strike zone. As usual, more can be learned from a careful exploration of these data.