As I’m a Phillies fan, I was pretty excited about Cole Hamels’ no-hitter against the Cubs on Saturday. This provides a good excuse to demonstrate the ease of using Carson’s pitchRx package to explore the 129 pitches that Cole threw during this game.
Given that Cole had several rough outings in his recent starts, it is somewhat remarkable that he pitched so well on Saturday. What happened in this particular game? It seems that there are two important factors in a pitching performance — the choice of pitches and the locations where these pitches are thrown. So we’ll focus our exploration on the types of pitches and the locations.
We first scrape the data using the pitchRx package. The list dat contains all of the pitch data for the games played that day. I combine pitch data (including type and location) from the component pitch and at-bat data from the component atbat. The data frame data contains information for all 129 pitches Hamels threw in this game.
library(pitchRx)
library(dplyr)
dat <- scrape("2015-07-25", "2015-07-25")
locations <- select(dat$pitch, pitch_type, px, pz, des, num, gameday_link)
names <- select(dat$atbat, pitcher_name, batter_name, num, gameday_link, event, stand)
data <- inner_join(locations,
                   filter(names, pitcher_name == "Cole Hamels"),
                   by = c("num", "gameday_link"))
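To see why the join uses both num and gameday_link, here is a minimal sketch on made-up data. The toy data frames, game ids, and pitcher names are hypothetical and only mimic the shape of dat$pitch and dat$atbat. The assumption is that at-bat numbers restart in every game, so num alone would not identify an at-bat when several games are scraped at once.

```r
library(dplyr)

# Hypothetical pitch-level rows: at-bat number 1 appears in both games
toy_pitch <- data.frame(
  pitch_type   = c("FF", "CH", "CU", "FF"),
  num          = c(1, 1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_A", "gid_B"),
  stringsAsFactors = FALSE
)

# Hypothetical at-bat-level rows: one row per (game, at-bat) pair
toy_atbat <- data.frame(
  pitcher_name = c("Pitcher X", "Pitcher X", "Pitcher Y"),
  num          = c(1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_B"),
  stringsAsFactors = FALSE
)

# Keep only Pitcher X's at-bats, then attach every pitch from those at-bats
toy_joined <- inner_join(
  toy_pitch,
  filter(toy_atbat, pitcher_name == "Pitcher X"),
  by = c("num", "gameday_link")
)
toy_joined
```

The pitch from gid_B is dropped even though its at-bat number is also 1, because its game id does not match any of Pitcher X's at-bats.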
What types of pitches did Cole throw in this game?
### pitch_type
### CH CU FC FF FT
### 29 26  9 39 26
We see that Cole threw 39 four-seam fastballs (FF), but he also threw 29 changeups (CH), 26 curveballs (CU), 26 two-seam fastballs (FT), and a few cutters (FC). It seems that Cole may have thrown a greater variety of pitch types than usual.
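The counts are easier to compare as shares of the 129 pitches. The sketch below retypes the counts from the output above so that it runs without scraping. On the real data frame, the same table would presumably come from something like table(data$pitch_type), which is an assumption since the original call is not shown.

```r
# Counts copied by hand from the pitch_type table above
pitch_counts <- c(CH = 29, CU = 26, FC = 9, FF = 39, FT = 26)

# Total should match the 129 pitches mentioned in the text
total_pitches <- sum(pitch_counts)

# Percentage of all pitches by type, rounded to one decimal place
pitch_share <- round(100 * prop.table(pitch_counts), 1)
pitch_share
```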
What were the outcomes of these pitches?
with(data, table(des, pitch_type))
###                             pitch_type
### des                         CH CU FC FF FT
###   Ball                       8  8  1 17 11
###   Ball In Dirt               1  0  0  0  0
###   Called Strike              5  7  1  5  6
###   Foul                       2  1  2  8  5
###   Foul Tip                   0  1  0  0  0
###   In play, out(s)            1  5  2  3  3
###   Swinging Strike           10  4  3  6  1
###   Swinging Strike (Blocked)  2  0  0  0  0
We see some interesting things from this table:
- A good proportion of the swinging strikes were from changeups (12 of 26, counting the two blocked ones).
- Half of the called strikes (12 of 24) were from changeups and curveballs.
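Both observations can be checked directly from the table. This sketch retypes the cross-tabulation as a matrix (values copied by hand from the output above) and computes the two shares. The object names are made up for illustration.

```r
# Outcome-by-pitch-type counts, copied from the table above (one row per outcome)
outcomes <- matrix(
  c( 8, 8, 1, 17, 11,
     1, 0, 0,  0,  0,
     5, 7, 1,  5,  6,
     2, 1, 2,  8,  5,
     0, 1, 0,  0,  0,
     1, 5, 2,  3,  3,
    10, 4, 3,  6,  1,
     2, 0, 0,  0,  0),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    des = c("Ball", "Ball In Dirt", "Called Strike", "Foul", "Foul Tip",
            "In play, out(s)", "Swinging Strike", "Swinging Strike (Blocked)"),
    pitch_type = c("CH", "CU", "FC", "FF", "FT")
  )
)

# Swinging strikes per pitch type, counting the blocked ones as well
swing_rows <- grepl("^Swinging Strike", rownames(outcomes))
swings <- colSums(outcomes[swing_rows, , drop = FALSE])
swing_share_CH <- swings[["CH"]] / sum(swings)

# Share of called strikes that came on changeups or curveballs
called <- outcomes["Called Strike", ]
called_share_CH_CU <- sum(called[c("CH", "CU")]) / sum(called)

swing_share_CH
called_share_CH_CU
```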
What were the locations of these pitches?
Here it is easy to use the strikeFX function in the pitchRx package. We add the facet_wrap option so we get a separate view of the pitch locations for each pitch type.
strikeFX(data, point.alpha=1, layer=facet_wrap(~pitch_type, ncol=3)) + ggtitle("Locations of All Pitches")
We see that most of Cole’s changeups were low and out of the zone and many of his four-seamers were high. Note that his curveballs were all around the strike zone; I’m surprised that Cole threw a no-hitter with so many high breaking pitches.
What were the locations of the pitches that produced a swinging strike?
Using the filter function (from the dplyr package), we limit our exploration to pitches where the outcome (variable des) begins with the text “Swing”.
strikeFX(filter(data, substr(des, 1, 5)=="Swing"), point.alpha=1, layer=facet_wrap(~pitch_type, ncol=3))+ ggtitle("Locations of Swinging Strikes")
It seems that most of these swinging strikes came on low changeups or curveballs, or on high fastballs.
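The prefix test used in the filter for the swinging-strike plot can be tried on its own. The sketch below applies it to the eight outcome labels that appear in the table earlier and compares it with base R's startsWith, which is just an alternative spelling of the same test, not what the original code uses.

```r
# Outcome labels as they appear in the des-by-pitch_type table
des_values <- c("Ball", "Ball In Dirt", "Called Strike", "Foul", "Foul Tip",
                "In play, out(s)", "Swinging Strike", "Swinging Strike (Blocked)")

# The test from the post: the first five characters equal "Swing"
keep_substr <- substr(des_values, 1, 5) == "Swing"

# Equivalent prefix test with startsWith (base R 3.3.0 or later)
keep_starts <- startsWith(des_values, "Swing")

des_values[keep_substr]
identical(keep_substr, keep_starts)
```

Both versions keep the plain and the blocked swinging strikes and drop everything else, including "Called Strike" and the foul outcomes.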
Cole’s strike zone?
Did Cole benefit from favorable strike calls by the umpire? We focus on the pitches that were either a ball or a called strike. One attractive way of summarizing the locations of these pitches is to fit a generalized additive model to the data, where the response is binary (the pitch is either a called strike or a ball) and the explanatory variables are the horizontal and vertical locations. Using the strikeFX function again, we display the fit from this model: the smoothed values (that is, the predicted probabilities of a strike) are displayed using a heat map where a lighter color corresponds to a higher probability of a strike.
library(mgcv)
noswing <- subset(data, des %in% c("Ball", "Called Strike"))
noswing$strike <- as.numeric(noswing$des %in% "Called Strike")
m2 <- bam(strike ~ s(px, pz), data=noswing, family = binomial(link='logit'))
strikeFX(noswing, model=m2) + ggtitle("Cole's Strike Zone")
The light blue region, where the probability of a strike is high, closely matches the strike zone, indicating that the umpires are making reasonable calls on Cole’s pitches. The only exception is that Cole is getting some benefit on pitches located low in the strike zone.
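To read a number off a fit like this instead of judging it from the heat map, predict with type = "response" returns the estimated called-strike probability at any location. The sketch below uses simulated pitches, not Hamels' data. The zone edges, the steepness of the fall-off, and the variable names are all assumptions chosen only so the example runs on its own.

```r
library(mgcv)

set.seed(1)
n <- 2000

# Simulated pitch locations in feet (px: horizontal, pz: height above the ground)
sim <- data.frame(px = runif(n, -2, 2), pz = runif(n, 0.5, 4.5))

# Assumed "true" zone: |px| < 0.83 and 1.5 < pz < 3.5, with soft edges
p_true <- plogis(8 * (0.83 - abs(sim$px))) *
          plogis(8 * (sim$pz - 1.5)) *
          plogis(8 * (3.5 - sim$pz))
sim$strike <- rbinom(n, 1, p_true)

# Same model form as in the post: a bivariate smooth of location, logit link
fit <- bam(strike ~ s(px, pz), data = sim, family = binomial(link = "logit"))

# Estimated strike probability at three spots: middle of the zone,
# just under the assumed bottom edge, and far up and away
spots <- data.frame(px = c(0, 0, 1.8), pz = c(2.5, 1.4, 4.3))
p_hat <- as.numeric(predict(fit, newdata = spots, type = "response"))
round(p_hat, 2)
```

With the real data, the same predict call on m2 would give the fitted probability for a pitch at the bottom edge of the zone, which is one way to quantify how much benefit Cole was getting on low pitches.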
It would be interesting to use PITCHf/x data to explore how Hamels has changed as a pitcher over his MLB career. My understanding is that Cole relied primarily on a fastball and changeup in his early years, and that he now understands the benefits of using a wider variety of pitches.