Monthly Archives: March, 2014

Visualizing Cliff Lee’s Pitches in the 2013 Season

Following up on Carson’s post, it seems appropriate to give some simple illustrations of using the pitchRx package to visualize the locations of pitches. Since I’m a Phillies fan, I follow Cliff Lee, so I’ll focus on his pitches in the 2013 season.

As Carson explained, it saves a lot of time to first download a large chunk of Gameday data into a database and then pull portions of that data from the database into R. Using the src_sqlite function in the dplyr package and the scrape function in the pitchRx package, I set up a database and download Gameday data for all games played in 2013. (This process takes 1 to 2 hours, and the database takes up about 600 MB of storage.)

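library(dplyr)
library(pitchRx)
# Create an SQLite database file and fill it with the 2013 Gameday data
my_db <- src_sqlite("pitchRx.2013", create = TRUE)
scrape(start = "2013-01-01", end = "2013-12-01",
       connect = my_db$con)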

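Now that the database holds all of the 2013 Gameday data, I use functions in the dplyr package to extract variables from the pitch and atbat tables. The inner_join function merges the pitch and atbat data on the at-bat number (num) and the game identifier (gameday_link), and the collect function runs the query and brings the result into R; until collect is called, dplyr only builds the query and does not pull the full result into R. Note that the filter call applied to the atbat table inside inner_join limits the collection to Cliff Lee's pitches.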

my_db <- src_sqlite("pitchRx.2013")
locations <- select(tbl(my_db, "pitch"), 
                pitch_type, px, pz, des, num, gameday_link)
names <- select(tbl(my_db, "atbat"), pitcher_name, batter_name, 
                num, gameday_link, event, stand)
que <- inner_join(locations, filter(names, 
                  pitcher_name == "Cliff Lee"),
                  by = c("num", "gameday_link"))
pitchfx <- collect(que)  #submit query and bring data into R
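To see what the inner join does, here is a minimal base-R sketch on two tiny made-up tables (the rows and the gameday_link values "gid_A" and "gid_B" are hypothetical, not Gameday data). It uses merge() in place of dplyr's inner_join on the database, and it shows why the join needs both keys: num counts the at-bats within a game, so the same num value repeats from game to game.

```r
# Hypothetical toy tables mimicking the pitch and atbat columns used above
pitch <- data.frame(
  pitch_type   = c("FT", "CH", "FF", "CU"),
  num          = c(1, 1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_A", "gid_B"),
  stringsAsFactors = FALSE
)
atbat <- data.frame(
  pitcher_name = c("Cliff Lee", "Cliff Lee", "Other Pitcher"),
  num          = c(1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_B"),
  stringsAsFactors = FALSE
)

# Filter the at-bats first, as the post does, then join on both keys
# (merge() without all = TRUE keeps only matching rows, i.e. an inner join)
lee_atbats  <- subset(atbat, pitcher_name == "Cliff Lee")
lee_pitches <- merge(pitch, lee_atbats, by = c("num", "gameday_link"))
nrow(lee_pitches)   # 3: the gid_B curveball has no Lee at-bat to match

# Joining on num alone wrongly matches the gid_B pitch to a gid_A at-bat
wrong <- merge(pitch, lee_atbats, by = "num")
nrow(wrong)         # 4
```

The extra row in the second join is the kind of silent mismatch the two-column key in the dplyr query prevents.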

I only want pitch data for Lee in the 2013 regular season, so I create a gamedate variable and use the subset function to keep only games played in April or later, which drops the spring-training games from February and March.

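# gameday_link values look like "gid_2013_04_01_..."; characters 5-14 hold the date
pitchfx$gamedate <- substr(pitchfx$gameday_link, 5, 14)
# Characters 6-7 of gamedate are the month; keep April (4) onward
pitchfx <- subset(pitchfx,
                  as.numeric(substr(gamedate, 6, 7)) > 3)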
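The character positions in that code assume gameday_link values of the form gid_YYYY_MM_DD_..., with the date in characters 5 through 14. A minimal sketch of an alternative, using made-up links of that assumed form, parses the date with as.Date() so that the month (or an exact date cutoff) can be used directly:

```r
# Made-up links following the assumed gid_YYYY_MM_DD_... pattern
links <- c("gid_2013_03_28_xxxmlb_yyymlb_1",
           "gid_2013_04_01_xxxmlb_yyymlb_1",
           "gid_2013_09_29_xxxmlb_yyymlb_1")

gamedate <- substr(links, 5, 14)                   # e.g. "2013_04_01"
dates    <- as.Date(gamedate, format = "%Y_%m_%d") # parse into Date objects
month    <- as.integer(format(dates, "%m"))        # 3, 4, 9

# Same rule as in the post: keep games from April onward
regular <- links[month > 3]
```

With Date objects, the month rule can be swapped for an exact cutoff such as dates >= as.Date("2013-04-01"). Either rule also keeps October games, so for a pitcher whose team reached the playoffs it would pull in postseason pitches as well; the Phillies did not make the 2013 postseason, so that doesn't matter for Lee.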

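What types of pitches did Lee throw? A frequency table of the pitch_type variable (for example, table(pitchfx$pitch_type)) shows that Lee throws the two-seam fastball (FT) most often, about a third of the time, but he also throws plenty of four-seam fastballs (FF), cutters (FC), change-ups (CH), and curveballs (CU), plus a handful of sliders (SL).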


  CH   CU   FC   FF   FT   SL 
 523  256  666  671 1131   25 
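To put these counts on a common scale, a short sketch converts them to percentages with prop.table(). The counts are copied from the table above; with the full data frame you would start from table(pitchfx$pitch_type) instead.

```r
# Pitch counts copied from the table above
counts <- c(CH = 523, CU = 256, FC = 666, FF = 671, FT = 1131, SL = 25)

# Percentage of Lee's pitches of each type, largest first
shares <- round(100 * prop.table(counts), 1)
sort(shares, decreasing = TRUE)
```

Of the 3,272 pitches, roughly 35% are two-seamers, four-seamers and cutters are each near 20%, change-ups are near 16%, curveballs near 8%, and sliders under 1%.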

To show the location of the pitches, it helps to overlay the strike zone on our graphs. Below are the boundaries (in feet, with the horizontal limits measured from the center of home plate) of an average strike zone that Max used in our book; the corners of the zone are placed in a data frame called kZone.

topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.95
outKzone <- 0.95
# Corners of the zone, listed in order so that geom_path draws a closed rectangle
kZone <- data.frame(
  x=c(inKzone, inKzone, outKzone, outKzone, inKzone),
  y=c(botKzone, topKzone, topKzone, botKzone, botKzone)
)
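With the zone stored as numbers, the same boundaries can also classify pitches. Here is a minimal sketch; in_zone is a hypothetical helper, not part of pitchRx, and it treats the zone as the flat rectangle defined above, ignoring the size of the ball and differences in batter height.

```r
# Zone boundaries from the post, in feet
topKzone <- 3.5
botKzone <- 1.6
inKzone  <- -0.95
outKzone <- 0.95

# Hypothetical helper: TRUE when a location lies inside or on the rectangle
in_zone <- function(px, pz) {
  px >= inKzone & px <= outKzone & pz >= botKzone & pz <= topKzone
}

# Made-up locations: middle-middle, below the zone, off the plate
in_zone(px = c(0, 0.2, 1.3), pz = c(2.5, 1.2, 2.5))  # TRUE FALSE FALSE
```

Applied to the real data, mean(in_zone(pitchfx$px, pitchfx$pz), na.rm = TRUE) would estimate the share of Lee's pitches inside this average zone; if px and pz come back from the database as character strings, convert them with as.numeric() first.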

Here are two graphs made with ggplot2. The first shows the horizontal (variable px) and vertical (variable pz) pitch locations, in feet, with a separate panel for each batter side (variable stand, L or R).

library(ggplot2)
print(ggplot(pitchfx, aes(px, pz, color=stand)) + geom_point() +
  geom_path(aes(x, y), data=kZone, lwd=2, col="red") +
  ylim(0, 5) + facet_wrap(~ stand, ncol=1))
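The panels can also be summarized numerically. This sketch computes the median horizontal and vertical location for each batter side with aggregate(); the six rows of px, pz, and stand are made-up values standing in for pitchfx, so the resulting numbers say nothing about Lee.

```r
# Hypothetical stand-in for pitchfx (made-up locations, in feet)
toy <- data.frame(
  px    = c(-0.5, 0.1, -0.3, 0.6, 0.4, 0.9),
  pz    = c( 2.0, 2.8,  1.9, 1.8, 2.2, 2.6),
  stand = c("L", "L", "L", "R", "R", "R"),
  stringsAsFactors = FALSE
)

# Median px and pz for each batter side (one row per value of stand)
med <- aggregate(cbind(px, pz) ~ stand, data = toy, FUN = median)
med
```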


The second graph shows the locations of all pitches, with one panel per pitch type. We see that Lee tends to throw his off-speed pitches, such as his change-ups and curveballs, in the lower portion of the strike zone.

print(ggplot(pitchfx, aes(px, pz, color=pitch_type)) + geom_point() +
  geom_path(aes(x, y), data=kZone, lwd=2, col="red") +
  ylim(0, 5) + facet_wrap(~ pitch_type))
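One way to check the visual impression about off-speed pitches is to compare the median height pz of each pitch type. Here is a minimal sketch with tapply(); the toy data frame is hypothetical, and on the real data you would pass pitchfx$pz and pitchfx$pitch_type instead.

```r
# Hypothetical pitch heights (feet above the ground) for three pitch types
toy <- data.frame(
  pitch_type = c("FT", "FT", "CH", "CH", "CU", "CU"),
  pz         = c(2.6, 2.3, 1.5, 1.7, 1.2, 1.6),
  stringsAsFactors = FALSE
)

# Median height of each pitch type, sorted from lowest to highest
heights <- sort(tapply(toy$pz, toy$pitch_type, median))
heights
```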


This is only the beginning of our exploratory analysis of Lee’s pitches. With more than 3,000 pitches, the points overlap heavily, so scatterplots aren’t very informative. The strikeFX function in the pitchRx package offers a variety of alternative graphical tools for visualizing this pitch location data.
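The idea behind tile and density views can be sketched without any plotting: cut the plane into cells and count the pitches in each cell, which stays readable even when thousands of points overlap. This is a simplified base-R illustration of binning, not how strikeFX is implemented; the six locations and the 0.5-foot cell size are assumptions chosen for the example.

```r
# Hypothetical pitch locations in feet (stand-ins for pitchfx$px, pitchfx$pz)
px <- c(-0.8, -0.2, 0.1, 0.3, 0.7, 0.9)
pz <- c( 1.8,  2.1, 2.2, 1.7, 2.8, 2.9)

# Assign each coordinate to a 0.5-foot bin
x_bin <- cut(px, breaks = seq(-1.5, 1.5, by = 0.5))
z_bin <- cut(pz, breaks = seq(1, 4, by = 0.5))

# Pitch counts per cell; rows are height bins, columns are horizontal bins
grid_counts <- table(z_bin, x_bin)
grid_counts
```

The resulting counts could then be drawn as a heat map (for example, with ggplot2's geom_tile()) in place of the raw scatterplot.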

All of the R code for this post can be found at this gist site.