As I’m a Phillies fan, I was pretty excited about Cole Hamels’ no-hitter against the Cubs on Saturday. This provides a good excuse to demonstrate the ease of using Carson’s `pitchRx` package to explore the 129 pitches that Cole threw during this game.

Given that Cole had several rough outings in his recent starts, it is somewhat remarkable that he pitched so well on Saturday. What happened in this particular game? It seems that there are two important factors in a pitching performance — the choice of pitches and the locations where these pitches are thrown. So we’ll focus our exploration on the types of pitches and the locations.

We first scrape the data using the `pitchRx` package. The list `dat` contains all of the pitch data for the games played that day. I combine pitch data (including type and location) from the component `pitch` and at-bat data from the component `atbat`. The data frame `data` contains information for all 129 pitches Hamels threw in this game.

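    library(pitchRx)
    library(dplyr)
    library(ggplot2)  # for ggtitle() and facet_wrap() used below

    # Scrape all games played on July 25, 2015
    dat <- scrape("2015-07-25", "2015-07-25")

    # Pitch-level data: type, location, and outcome
    locations <- select(dat$pitch, pitch_type, px, pz, des, num, gameday_link)

    # At-bat-level data: pitcher, batter, and result
    names <- select(dat$atbat, pitcher_name, batter_name, num, gameday_link,
                    event, stand)

    # Keep Hamels' at-bats and match each pitch to its at-bat
    data <- inner_join(locations,
                       filter(names, pitcher_name == "Cole Hamels"),
                       by = c("num", "gameday_link"))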
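To see what the join step is doing, here is a minimal sketch with made-up rows. The game ids `gid_A` and `gid_B`, the second pitcher, and all values are invented for illustration and are not taken from the scraped data. Pitches are matched to at-bats on both `num` and `gameday_link`, since at-bat numbers restart in every game.

```r
library(dplyr)

# Toy pitch-level table: two games, at-bat numbers restart in each game
pitch_toy <- data.frame(
  num = c(1, 1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_A", "gid_B"),
  pitch_type = c("FF", "CH", "CU", "FT"),
  stringsAsFactors = FALSE
)

# Toy at-bat-level table: one row per at-bat
atbat_toy <- data.frame(
  num = c(1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_B"),
  pitcher_name = c("Cole Hamels", "Cole Hamels", "Someone Else"),
  stringsAsFactors = FALSE
)

# Keep only Hamels' at-bats, then attach the at-bat info to each pitch
joined <- inner_join(
  pitch_toy,
  filter(atbat_toy, pitcher_name == "Cole Hamels"),
  by = c("num", "gameday_link")
)
joined
```

Joining on `num` alone would also match the first at-bat of `gid_B` to Hamels, which is why both keys are needed.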

**What types of pitches did Cole throw in this game?**

    with(data, table(pitch_type))

    ### pitch_type
    ### CH CU FC FF FT
    ### 29 26  9 39 26

We see that Cole threw 39 four-seam fastballs (FF), but he also threw 29 changeups (CH), 26 curveballs (CU), 26 two-seam fastballs (FT), and a few cutters (FC). It seems that Cole may have thrown a greater variety of pitch types than usual.
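As a quick check on the pitch mix, the counts can be turned into percentages. This is a small sketch that re-enters the counts from the table above by hand rather than using the scraped `data` object.

```r
# Pitch counts copied from the table above
counts <- c(CH = 29, CU = 26, FC = 9, FF = 39, FT = 26)

total <- sum(counts)                       # all pitches Hamels threw
pct <- round(100 * prop.table(counts), 1)  # percentage of all pitches
pct
```

With the real data frame, `round(100 * prop.table(table(data$pitch_type)), 1)` should give the same percentages.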

**What were the outcomes of these pitches?**

    with(data, table(des, pitch_type))

    ###                            pitch_type
    ### des                         CH CU FC FF FT
    ###   Ball                       8  8  1 17 11
    ###   Ball In Dirt               1  0  0  0  0
    ###   Called Strike              5  7  1  5  6
    ###   Foul                       2  1  2  8  5
    ###   Foul Tip                   0  1  0  0  0
    ###   In play, out(s)            1  5  2  3  3
    ###   Swinging Strike           10  4  3  6  1
    ###   Swinging Strike (Blocked)  2  0  0  0  0

We see some interesting things from this table:

- Changeups accounted for 12 of the 26 swinging strikes (counting the two that were blocked).
- Half of the called strikes (12 of 24) were from changeups and curveballs.
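These shares can be checked directly from the table. The sketch below re-enters the counts by hand as a matrix (the object names are my own) and computes the swinging-strike and called-strike shares by pitch type.

```r
pitch_types <- c("CH", "CU", "FC", "FF", "FT")

# Outcome-by-pitch-type counts copied from the table above
outcomes <- rbind(
  "Ball"                      = c(8, 8, 1, 17, 11),
  "Ball In Dirt"              = c(1, 0, 0, 0, 0),
  "Called Strike"             = c(5, 7, 1, 5, 6),
  "Foul"                      = c(2, 1, 2, 8, 5),
  "Foul Tip"                  = c(0, 1, 0, 0, 0),
  "In play, out(s)"           = c(1, 5, 2, 3, 3),
  "Swinging Strike"           = c(10, 4, 3, 6, 1),
  "Swinging Strike (Blocked)" = c(2, 0, 0, 0, 0)
)
colnames(outcomes) <- pitch_types

# Swinging strikes by pitch type, counting the blocked ones too
swinging <- colSums(outcomes[c("Swinging Strike", "Swinging Strike (Blocked)"), ])
called <- outcomes["Called Strike", ]

# Share of swinging strikes by pitch type
round(swinging / sum(swinging), 2)

# Share of called strikes that came on changeups and curveballs
sum(called[c("CH", "CU")]) / sum(called)
```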

**What were the locations of these pitches?**

Here it is easy to use the `strikeFX` function in the `pitchRx` package. We add a `facet_wrap` layer so we get a separate view of the pitch locations for each pitch type.

    strikeFX(data, point.alpha = 1,
             layer = facet_wrap(~pitch_type, ncol = 3)) +
      ggtitle("Locations of All Pitches")

We see that most of Cole’s changeups were low and out of the zone, and many of his four-seamers were high. Note that his curveballs were all around the strike zone. I’m surprised that Cole threw a no-hitter with so many high breaking pitches.

**What were the locations of these pitches where there was a swinging strike?**

Using the `filter` function (from the `dplyr` package), we limit our exploration to pitches where the outcome (variable `des`) begins with the text “Swing”.

    strikeFX(filter(data, substr(des, 1, 5) == "Swing"), point.alpha = 1,
             layer = facet_wrap(~pitch_type, ncol = 3)) +
      ggtitle("Locations of Swinging Strikes")

It seemed that most of these swinging strikes were either low changeups or curveballs or high fastballs.
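The `substr(des, 1, 5) == "Swing"` test used for this plot is a prefix match, so it picks up both “Swinging Strike” and “Swinging Strike (Blocked)”. A small sketch on the outcome labels from the table, with two base R alternatives that should behave the same way:

```r
# Outcome labels that appear in this game's pitch data
des <- c("Ball", "Ball In Dirt", "Called Strike", "Foul", "Foul Tip",
         "In play, out(s)", "Swinging Strike", "Swinging Strike (Blocked)")

# The test used for the plot: compare the first five characters to "Swing"
by_substr <- substr(des, 1, 5) == "Swing"

# Equivalent base R alternatives
by_prefix <- startsWith(des, "Swing")
by_regex <- grepl("^Swing", des)

# Labels that are kept by the filter
des[by_substr]
```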

**Cole’s strike zone?**

Did Cole benefit from favorable umpire calls on strikes? We focus on the pitches that were either a ball or a called strike. One attractive way of summarizing the locations of these pitches is to fit a generalized additive model to the data, where the response is binary (the pitch is either a called strike or a ball) and the explanatory variables are the horizontal and vertical locations. Using the `strikeFX` function again, we display the fit from this model: the smoothed values (that is, the predicted probabilities of a strike) are displayed using a heat map where a lighter color corresponds to a higher probability of a strike.

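    library(mgcv)

    # Keep only pitches the batter did not swing at
    noswing <- subset(data, des %in% c("Ball", "Called Strike"))
    # Binary response: 1 = called strike, 0 = ball
    noswing$strike <- as.numeric(noswing$des %in% "Called Strike")
    # Smooth surface for the probability of a called strike
    m2 <- bam(strike ~ s(px, pz), data = noswing,
              family = binomial(link = 'logit'))
    strikeFX(noswing, model = m2) + ggtitle("Cole's Strike Zone")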

The light blue region, where the probability of a strike is high, closely matches the strike zone, indicating that the umpires are making reasonable calls on Cole’s pitches. The only exception is that Cole is getting some benefit on balls located low in the strike zone.
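To make clear what the heat map is summarizing, here is a rough sketch on simulated pitches, not on Hamels’ data. The “true” zone used to generate the calls is an assumption made up for the simulation, and I use `gam` here instead of `bam` (`bam` is the large-data variant in `mgcv` with the same formula interface). The fitted surface is evaluated on a grid with `predict(..., type = "response")`, which gives the predicted strike probabilities that get plotted.

```r
library(mgcv)

set.seed(1)
n <- 2000

# Simulated pitch locations in feet: px is horizontal, pz is vertical
sim <- data.frame(px = runif(n, -2, 2), pz = runif(n, 0.5, 4.5))

# Assumed "true" zone, for the simulation only: strike probability is
# high near the centre (0, 2.5) and falls off with distance
dist2 <- (sim$px / 0.9)^2 + ((sim$pz - 2.5) / 1.1)^2
sim$strike <- rbinom(n, 1, plogis(4 * (1 - dist2)))

# Same model form as above: a smooth surface in (px, pz) on the logit scale
fit <- gam(strike ~ s(px, pz), data = sim, family = binomial(link = "logit"))

# Predicted strike probabilities on a grid (what the heat map displays)
grid <- expand.grid(px = seq(-2, 2, length.out = 41),
                    pz = seq(0.5, 4.5, length.out = 41))
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Compare the middle of the zone with a far corner
p_centre <- as.numeric(predict(fit, data.frame(px = 0, pz = 2.5), type = "response"))
p_corner <- as.numeric(predict(fit, data.frame(px = 2, pz = 4.5), type = "response"))
c(centre = p_centre, corner = p_corner)
```

The edge of the estimated zone is roughly where `grid$prob` crosses 0.5.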

It would be interesting to use PITCHf/x data to explore how Hamels has changed as a pitcher over his MLB career. My understanding is that Cole relied primarily on a fastball and changeup in his early years, and that he has since come to appreciate the benefits of using a wider variety of pitches.

Jim,

With regard to the last graph, titled “Cole’s Strike Zone,” is there a way to change the color scale to use multiple colors (say, red and green)?

I’ve done this for larger samples and it’s difficult to visualize where exactly that 50% theoretical threshold is in a single color scheme. I’ve never been able to figure out how to change colors to a non-single color scheme.

Kevin:

I asked Carson and he says that you can use any of the `scale_fill_gradient*()` functions to change the color scale.

Here’s an example:

    library(pitchRx)
    library(mgcv)

    noswing <- subset(pitches, des %in% c("Ball", "Called Strike"))
    noswing$strike <- as.numeric(noswing$des %in% "Called Strike")
    m1 <- bam(strike ~ s(px, pz, by = factor(stand)) + factor(stand),
              data = noswing, family = binomial(link = 'logit'))

    # geom will automatically be set to 'raster'
    strikeFX(noswing, model = m1, layer = facet_grid(. ~ stand)) +
      scale_fill_gradientn(colours = rainbow(6))
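For the 50% threshold in particular, a diverging scale centred at 0.5 may be easier to read than a rainbow. Here is a rough sketch using plain `ggplot2` on a made-up probability surface (the grid and the formula for `prob` are invented for illustration). I would expect the same `scale_fill_gradient2()` call to work when added to a `strikeFX()` plot, but I have not checked that.

```r
library(ggplot2)

# Made-up probability surface on a grid, for illustration only
grid <- expand.grid(px = seq(-2, 2, length.out = 60),
                    pz = seq(0.5, 4.5, length.out = 60))
grid$prob <- plogis(4 * (1 - (grid$px / 0.9)^2 - ((grid$pz - 2.5) / 1.1)^2))

# Diverging scale centred at 0.5: green below, white at 50%, red above
p <- ggplot(grid, aes(px, pz, fill = prob)) +
  geom_raster() +
  scale_fill_gradient2(low = "darkgreen", mid = "white", high = "red",
                       midpoint = 0.5, limits = c(0, 1)) +
  coord_equal()
p
```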

Jim

One alternative, if you want full control over the colors used: I go through how to build up these graphics on your own using `filled.contour` (though it won’t be through pitchRx or ggplot2). Hopefully it is also instructive about the models behind the figures:

https://baseballwithr.wordpress.com/2015/06/30/houston-astros-whiffs-and-exit-velocity/