Following up on Carson’s post, it seems appropriate to give some simple illustrations of using the pitchRx package to visualize pitch locations. Since I’m a Phillies fan, I follow Cliff Lee, so I’ll focus on Cliff’s pitches in the 2013 season.
As Carson explained, it saves a lot of time to first download a large chunk of Gameday data into a database and then pull portions of it from the database into R. Using the src_sqlite function in the dplyr package and the scrape function in the pitchRx package, I set up a database and download Gameday data for all games played in 2013. (This takes 1-2 hours, and the resulting database occupies about 600 MB of disk space.)
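library(dplyr)
library(pitchRx)
# Create an SQLite database file named pitchRx.2013
my_db <- src_sqlite("pitchRx.2013", create = TRUE)
# Download all 2013 Gameday data into the database
scrape(start = "2013-01-01", end = "2013-12-01", connect = my_db$con)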
Now that the database has all of the 2013 Gameday data, I use functions in the dplyr package to extract variables from the pitch and atbat tables in the database. The inner_join function merges the pitch and atbat data by matching on the at-bat number (num) and the game identifier (gameday_link), and the collect function runs the query and brings the result into R. Note that applying filter to the atbat table inside the inner_join call limits the collection to Cliff Lee’s pitches.
library(dplyr)
my_db <- src_sqlite("pitchRx.2013")
# Pitch-level variables: type, location, description, and the join keys
locations <- select(tbl(my_db, "pitch"), pitch_type, px, pz, des, num, gameday_link)
# At-bat-level variables: players, outcome, batter side, and the join keys
names <- select(tbl(my_db, "atbat"), pitcher_name, batter_name, num, gameday_link, event, stand)
que <- inner_join(locations, filter(names, pitcher_name == "Cliff Lee"),
                  by = c("num", "gameday_link"))
pitchfx <- collect(que) #submit query and bring data into R
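To make the join concrete, here is a minimal base-R sketch on two tiny hypothetical data frames (all values are made up, not real Gameday records). Base R's merge() with its default all = FALSE keeps only rows whose keys match in both tables, so it behaves like the inner_join above, just in memory instead of in the database.

```r
# Hypothetical miniature versions of the pitch and atbat tables (made-up values).
pitch <- data.frame(
  pitch_type   = c("FT", "CH", "FF", "CU"),
  px           = c(-0.40, 0.85, 0.10, -0.20),
  pz           = c(2.10, 1.40, 3.00, 1.70),
  num          = c(1, 1, 2, 3),
  gameday_link = c("gid_A", "gid_A", "gid_A", "gid_B")
)
atbat <- data.frame(
  pitcher_name = c("Cliff Lee", "Cliff Lee", "Other Pitcher"),
  batter_name  = c("Batter 1", "Batter 2", "Batter 3"),
  num          = c(1, 2, 3),
  gameday_link = c("gid_A", "gid_A", "gid_B"),
  stand        = c("R", "L", "R")
)

# Keep only Lee's at-bats, then join on the composite key (num, gameday_link).
# num counts at-bats within a game, so the game link is needed as well.
lee_atbats  <- subset(atbat, pitcher_name == "Cliff Lee")
lee_pitches <- merge(pitch, lee_atbats, by = c("num", "gameday_link"))
print(lee_pitches)
```

The curve ball in gid_B belongs to another pitcher's at-bat, so it has no match and is dropped, just as the database join drops non-Lee pitches.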
I only want Lee’s pitches from the 2013 regular season, so I create a gamedate variable from the gameday_link and use the subset function to keep games played in April (month 4) or later, which removes the spring-training games played in February and March.
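# Characters 5-14 of a link like "gid_2013_04_01_..." hold the date "2013_04_01"
pitchfx$gamedate <- substr(pitchfx$gameday_link, 5, 14)
# Characters 6-7 of the date are the month; keep April onward
pitchfx <- subset(pitchfx, as.numeric(substr(gamedate, 6, 7)) > 3)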
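The two substr() calls rely on the fixed layout of a Gameday link, which starts with gid_ followed by the date as YYYY_MM_DD. The sketch below applies the same steps to two made-up links and also shows an alternative that parses the string into a Date with as.Date(), so the month comes from format() rather than from counting character positions.

```r
# Made-up Gameday links following the gid_YYYY_MM_DD_... pattern.
links <- c("gid_2013_03_25_phimlb_nyamlb_1", "gid_2013_04_01_phimlb_atlmlb_1")

# Same character positions as in the post: characters 5-14 hold the date.
gamedate <- substr(links, 5, 14)
month    <- as.numeric(substr(gamedate, 6, 7))

# Alternative: parse a real Date, then extract the month number.
dates     <- as.Date(gamedate, format = "%Y_%m_%d")
month_alt <- as.numeric(format(dates, "%m"))

# Keep April or later, as in the subset() call above.
keep <- month > 3
print(links[keep])
```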
What types of pitches did Lee throw? The table below shows that his most frequent pitch is the two-seam fastball (FT), but he also throws plenty of four-seam fastballs (FF), cutters (FC), change-ups (CH), and curve balls (CU), and only an occasional slider (SL).
table(pitchfx$pitch_type)

  CH   CU   FC   FF   FT   SL
 523  256  666  671 1131   25
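Raw counts are easier to compare as shares of the total. Using the counts from the table above, prop.table() converts them to proportions; by this measure two-seamers make up roughly a third of Lee's pitches.

```r
# Pitch-type counts copied from the table above.
counts <- c(CH = 523, CU = 256, FC = 666, FF = 671, FT = 1131, SL = 25)

# Convert to percentages of all pitches and sort from most to least frequent.
pct <- round(100 * prop.table(counts), 1)
print(sort(pct, decreasing = TRUE))
```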
To show the location of the pitches, it helps to overlay the strike zone on our graphs. Here are the boundaries of an average strike zone that Max used in our book; the corners of the zone are placed in a data frame called kZone.
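# Average strike zone boundaries in feet (px is horizontal, pz is height)
topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.95
outKzone <- 0.95
# Corners listed in order so that geom_path draws a closed rectangle
kZone <- data.frame(
  x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
  y = c(botKzone, topKzone, topKzone, botKzone, botKzone)
)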
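With the zone stored as numbers, checking whether a pitch crossed the plate inside this average zone is a simple range test on px and pz. The helper in_zone() below is hypothetical (it is not part of pitchRx) and tests only the center of the ball, so a pitch that merely clips the edge of the zone counts as outside.

```r
# Boundaries of the average strike zone (feet), as defined above.
topKzone <- 3.5
botKzone <- 1.6
inKzone  <- -0.95
outKzone <- 0.95

# Hypothetical helper: TRUE when the pitch's center lies inside the rectangle.
in_zone <- function(px, pz) {
  px >= inKzone & px <= outKzone & pz >= botKzone & pz <= topKzone
}

# Three made-up locations: middle of the zone, below it, and off the plate.
in_zone(px = c(0.0, 0.2, 1.3), pz = c(2.5, 1.0, 2.5))
```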
Here are several graphs made with ggplot2. The first graph shows the horizontal (variable px) and vertical (variable pz) locations of Lee’s pitches, with one panel for each batter side (variable stand).
library(ggplot2)
# Pitch locations by batter side, with the strike zone drawn in red
print(ggplot(pitchfx, aes(px, pz, color = stand)) +
        geom_point() +
        geom_path(aes(x, y), data = kZone, lwd = 2, col = "red") +
        ylim(0, 5) +
        facet_wrap(~ stand, ncol = 1))
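A numeric companion to this plot is the share of pitches inside the zone for each batter side. The sketch below uses a small made-up data frame with the same column names as pitchfx (px, pz, stand) and base R's tapply(); the zone limits match kZone, and in_zone() is a hypothetical helper, not a pitchRx function.

```r
# Hypothetical helper using the same limits as kZone (feet).
in_zone <- function(px, pz) px >= -0.95 & px <= 0.95 & pz >= 1.6 & pz <= 3.5

# Made-up pitches with the same columns as pitchfx.
toy <- data.frame(
  px    = c(0.1, -1.2, 0.5, 0.0, 1.1, -0.3),
  pz    = c(2.2,  2.0, 1.2, 3.0, 2.5,  2.6),
  stand = c("L", "L", "L", "R", "R", "R")
)

# Fraction of pitches in the zone, separately for left- and right-handed batters.
zone_rate <- tapply(in_zone(toy$px, toy$pz), toy$stand, mean)
print(zone_rate)
```

Passing pitchfx$px, pitchfx$pz, and pitchfx$stand instead of the toy columns would give the corresponding rates for Lee.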
Another graph shows the locations of all pitches, with each panel corresponding to a different pitch type. We see that Lee tends to throw his off-speed pitches (change-ups and curve balls) in the lower portion of the strike zone.
# Pitch locations with one panel per pitch type
print(ggplot(pitchfx, aes(px, pz, color = pitch_type)) +
        geom_point() +
        geom_path(aes(x, y), data = kZone, lwd = 2, col = "red") +
        ylim(0, 5) +
        facet_wrap(~ pitch_type))
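One way to check a visual impression such as "off-speed pitches sit low" is to summarize vertical location by pitch type. Here is a sketch on made-up values (not Lee's actual data) that computes the median pz for each pitch type with aggregate(); the same call on pitchfx would give the real medians.

```r
# Made-up pitch heights; pz is the height (feet) as the ball crosses the plate.
toy <- data.frame(
  pitch_type = c("FT", "FT", "FF", "FF", "CH", "CH", "CU", "CU"),
  pz         = c(2.4,  2.0,  3.0,  2.8,  1.5,  1.3,  1.1,  1.5)
)

# Median height for each pitch type, sorted from lowest to highest.
med_pz <- aggregate(pz ~ pitch_type, data = toy, FUN = median)
med_pz <- med_pz[order(med_pz$pz), ]
print(med_pz)
```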
This is only the beginning of our exploratory analysis of Lee’s pitches. With more than 3,000 pitches, the points overlap heavily, so scatterplots aren’t very informative. The strikeFX function in the pitchRx package offers a variety of alternative graphical tools for visualizing this pitch location data.
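The idea behind binned displays is to replace thousands of overlapping points with counts per region of the plate. As a rough base-R sketch (not how strikeFX is implemented), cut() and table() can count pitches in a grid of cells; the simulated locations and the half-foot cell size are arbitrary choices for illustration.

```r
set.seed(1)
# Simulated pitch locations in feet; these are not real PITCHf/x data.
px <- rnorm(1000, mean = 0, sd = 0.8)
pz <- rnorm(1000, mean = 2.3, sd = 0.8)

# Half-foot bins; the vertical range matches ylim(0, 5) in the plots above.
x_breaks <- seq(-2.5, 2.5, by = 0.5)
z_breaks <- seq(0, 5, by = 0.5)

# Count pitches per cell; locations outside the grid become NA and are dropped.
grid <- table(cut(px, x_breaks), cut(pz, z_breaks))
print(sum(grid))                                   # pitches inside the grid
print(which(grid == max(grid), arr.ind = TRUE))    # busiest cell(s)
```

A tile or heat-map display of grid conveys pitch density without the overplotting that hides structure in a scatterplot.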
All of the R code for this post can be found at this gist site.