Monthly Archives: July, 2015

Exploring Pitches in Cole Hamels’ No-Hitter

As I’m a Phillies fan, I was pretty excited about Cole Hamels’ no-hitter against the Cubs on Saturday. This provides a good excuse to demonstrate the ease of using Carson’s pitchRx package to explore the 129 pitches that Cole threw during this game.

Given that Cole had several rough outings in his recent starts, it is somewhat remarkable that he pitched so well on Saturday. What happened in this particular game? It seems that there are two important factors in a pitching performance — the choice of pitches and the locations where these pitches are thrown. So we’ll focus our exploration on the types of pitches and the locations.

We first scrape the data using the pitchRx package. The list dat contains all of the pitch data for the games played that day. I combine the pitch data (including type and location) from the component pitch with the at-bat data from the component atbat, matching rows on the at-bat number and the game identifier. The data frame data contains information for all 129 pitches Hamels threw in this game.

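library(pitchRx)
library(dplyr)

# Scrape all games played on July 25, 2015
dat <- scrape("2015-07-25", "2015-07-25")
# Pitch-level variables: type, location, and outcome
locations <- select(dat$pitch,
                    pitch_type, px, pz, des, num, gameday_link)
# At-bat-level variables: who pitched, who batted, and the result
names <- select(dat$atbat, pitcher_name, batter_name,
                num, gameday_link, event, stand)
# Keep only the pitches from Hamels' at-bats
data <- inner_join(locations, filter(names,
                    pitcher_name == "Cole Hamels"),
                   by = c("num", "gameday_link"))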
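
To see why the join uses both num and gameday_link, here is a minimal sketch on made-up data. It uses base R's merge in place of inner_join so that it runs without extra packages; the game labels, pitcher name, and pitch types are hypothetical and are not taken from the scraped data.

```r
# Toy pitch-level rows: the at-bat number `num` restarts in every game
pitches <- data.frame(
  gameday_link = c("game_A", "game_A", "game_B", "game_B"),
  num          = c(1, 2, 1, 2),
  pitch_type   = c("FF", "CH", "CU", "FT"),
  stringsAsFactors = FALSE
)

# Toy at-bat rows for one (hypothetical) pitcher, who appears only in game_A
atbats <- data.frame(
  gameday_link = c("game_A", "game_A"),
  num          = c(1, 2),
  pitcher_name = "Pitcher X",
  stringsAsFactors = FALSE
)

# Matching on both keys keeps only this pitcher's pitches
both_keys <- merge(pitches, atbats, by = c("num", "gameday_link"))

# Matching on num alone also pulls in pitches from the other game
num_only <- merge(pitches, atbats, by = "num")

nrow(both_keys)  # 2
nrow(num_only)   # 4
```

Because at-bat numbers repeat across games, joining on num alone would attach another game's pitches to the pitcher.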

What types of pitches did Cole throw in this game?

with(data, table(pitch_type))
### pitch_type
### CH CU FC FF FT 
### 29 26  9 39 26 

We see that Cole threw 39 four-seam fastballs (FF), but he also threw 29 changeups (CH), 26 curveballs (CU), 26 two-seam fastballs (FT), and 9 cutters (FC). It seems that Cole may have thrown a greater variety of pitch types than usual.
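
The counts are easier to compare as percentages. Here is a small sketch that re-enters the five counts by hand (rather than using the scraped data frame) and converts them with prop.table:

```r
# Pitch counts by type, copied from the frequency table
counts <- c(CH = 29, CU = 26, FC = 9, FF = 39, FT = 26)

# Share of each pitch type, as a percentage of all 129 pitches
pct <- round(100 * prop.table(counts), 1)
pct
```

With the scraped data in hand, the same idea would be prop.table(table(data$pitch_type)). Four-seamers make up about 30 percent of the pitches, with no other type above about 22 percent.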

What were the outcomes of these pitches?

with(data, table(des, pitch_type))
###                            pitch_type
### des                         CH CU FC FF FT
###   Ball                       8  8  1 17 11
###   Ball In Dirt               1  0  0  0  0
###   Called Strike              5  7  1  5  6
###   Foul                       2  1  2  8  5
###   Foul Tip                   0  1  0  0  0
###   In play, out(s)            1  5  2  3  3
###   Swinging Strike           10  4  3  6  1
###   Swinging Strike (Blocked)  2  0  0  0  0

We see some interesting things from this table:

  • Of the 26 swinging strikes (including the two blocked ones), 12 were on changeups.
  • Half of the called strikes (12 of 24) were on changeups and curveballs.
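
These shares can be checked from the outcome table. The sketch below retypes the table as a matrix (so it does not depend on the scraped data) and sums the relevant rows:

```r
# Outcome-by-pitch-type counts, copied from the table above
outcomes <- matrix(
  c( 8, 8, 1, 17, 11,
     1, 0, 0,  0,  0,
     5, 7, 1,  5,  6,
     2, 1, 2,  8,  5,
     0, 1, 0,  0,  0,
     1, 5, 2,  3,  3,
    10, 4, 3,  6,  1,
     2, 0, 0,  0,  0),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    des = c("Ball", "Ball In Dirt", "Called Strike", "Foul", "Foul Tip",
            "In play, out(s)", "Swinging Strike",
            "Swinging Strike (Blocked)"),
    pitch_type = c("CH", "CU", "FC", "FF", "FT")
  )
)

# Swinging strikes by pitch type, counting the blocked ones as well
swing_rows <- grepl("^Swinging Strike", rownames(outcomes))
swings <- colSums(outcomes[swing_rows, , drop = FALSE])

# Called strikes by pitch type
called <- outcomes["Called Strike", ]

swings[["CH"]] / sum(swings)               # 12 of 26 swinging strikes
sum(called[c("CH", "CU")]) / sum(called)   # 12 of 24 called strikes
```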

What were the locations of these pitches?

Here it is easy to use the strikeFX function in the pitchRx package. We add the facet_wrap option so we get a separate view of the pitch locations for each pitch type.

strikeFX(data, point.alpha=1, layer=facet_wrap(~pitch_type, ncol=3)) +
  ggtitle("Locations of All Pitches")

We see that most of Cole’s changeups were low and out of the zone and many of his four-seamers were high. Note that his curveballs were all around the strike zone. I’m surprised that Cole threw a no-hitter with so many high breaking pitches.

What were the locations of these pitches where there was a swinging strike?

Using the filter function (from the dplyr package), we limit our exploration to pitches where the outcome (variable des) begins with the text “Swing”.

strikeFX(filter(data, substr(des, 1, 5)=="Swing"), point.alpha=1,
         layer=facet_wrap(~pitch_type, ncol=3))+
         ggtitle("Locations of Swinging Strikes")


It seemed that most of these swinging strikes were either low changeups or curveballs or high fastballs.
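
As a quick check on the filter used for this plot, here is a small sketch showing which outcome labels pass the prefix test. The labels are copied from the outcome table; startsWith is a base R alternative, shown only for comparison.

```r
# Outcome labels as they appear in the des variable
des_values <- c("Ball", "Ball In Dirt", "Called Strike", "Foul", "Foul Tip",
                "In play, out(s)", "Swinging Strike",
                "Swinging Strike (Blocked)")

# The test used in the filter: do the first five characters spell "Swing"?
keep <- substr(des_values, 1, 5) == "Swing"

# Base R alternative that expresses the same prefix test
keep_alt <- startsWith(des_values, "Swing")

des_values[keep]
```

Both versions keep "Swinging Strike" and "Swinging Strike (Blocked)" and drop everything else, including "Called Strike".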

Cole’s strike zone?

Did Cole benefit from good umpire calls for strikes? We focus on the pitches that were either a ball or a called strike. One attractive way of summarizing the locations of these pitches is to fit a generalized additive model to these data, where the response is binary (the pitch is either a called strike or a ball) and the explanatory variables are the horizontal and vertical locations. Using the strikeFX function again, we display the fit from this model: the smoothed values (that is, the predicted probabilities of a strike) are displayed using a heat map where a lighter color corresponds to a higher probability of a strike.

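library(mgcv)  # provides bam()

# Keep only balls and called strikes, where the umpire made the call
noswing <- subset(data, des %in% c("Ball", "Called Strike"))
# Binary response: 1 = called strike, 0 = ball
noswing$strike <- as.numeric(noswing$des %in% "Called Strike")
m2 <- bam(strike ~ s(px, pz),
        data=noswing, family = binomial(link='logit'))
strikeFX(noswing, model=m2) +
  ggtitle("Cole's Strike Zone")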


The light blue region, where the probability of a strike is high, closely matches the strike zone, indicating that the umpires are making reasonable calls on Cole’s pitches. The only exception is that Cole is getting some benefit on low pitches near the bottom edge of the strike zone.
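
To make the heat map values concrete, here is a sketch of how a fitted model of this form turns a location into a strike probability. It uses simulated calls, not Hamels' pitches: the zone boundaries, the 0.9 and 0.1 call probabilities, and the two test locations are all assumptions chosen only for illustration.

```r
library(mgcv)

set.seed(1)
n <- 500

# Simulated pitch locations (feet), spread around a hypothetical strike zone
sim <- data.frame(px = runif(n, -2, 2), pz = runif(n, 0.5, 4.5))

# Assumed zone used only to generate toy calls
in_zone <- abs(sim$px) < 0.83 & sim$pz > 1.5 & sim$pz < 3.5
sim$strike <- rbinom(n, 1, ifelse(in_zone, 0.9, 0.1))

# Same model form as in the post: a smooth surface over (px, pz)
fit <- bam(strike ~ s(px, pz),
           data = sim, family = binomial(link = "logit"))

# Predicted strike probability at the middle of the zone and far outside it
new_locs <- data.frame(px = c(0, 1.8), pz = c(2.5, 4.2))
p <- predict(fit, newdata = new_locs, type = "response")
p
```

The type = "response" argument returns probabilities rather than values on the logit scale.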

It would be interesting to use PITCHf/x data to explore how Hamels has changed as a pitcher over his MLB career. My understanding is that Cole relied primarily on a fastball and changeup in his early years, and that he now understands the benefits of using a wider variety of pitches.