Following up on Carson’s post, it seems appropriate to give some simple illustrations of using the pitchRx package to visualize the locations of pitches. Since I’m a Phillies fan, I follow Cliff Lee and I’ll focus on looking at Cliff’s pitches in the 2013 season.
As explained by Carson, it saves a lot of time to initially download a large chunk of Gameday data into a database, and then extract portions of the data from the database into R. Using the src_sqlite function in the dplyr package and the scrape function in the pitchRx package, I set up a database and download Gameday data for all games played in 2013. (This process takes 1-2 hours and the database takes about 600 MB in storage.)
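    library(dplyr)
    # Create a new SQLite database file to hold the Gameday tables
    my_db <- src_sqlite("pitchRx.2013", create = TRUE)
    library(pitchRx)
    # Download all 2013 games into the database (slow: 1-2 hours)
    scrape(start = "2013-01-01", end = "2013-12-01", connect = my_db$con)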
Now that the database has all of the 2013 Gameday data, I use functions in the dplyr package to extract variables from the pitch and atbat tables in the database. The inner_join function merges the pitch and atbat data, and the collect function brings the data into R. Note that the filter call on the atbat data, placed inside inner_join, limits the collection to Cliff Lee pitches.
    library(dplyr)
    # Connect to the existing database
    my_db <- src_sqlite("pitchRx.2013")
    # Pitch-level variables
    locations <- select(tbl(my_db, "pitch"),
                        pitch_type, px, pz, des, num, gameday_link)
    # At-bat-level variables
    names <- select(tbl(my_db, "atbat"),
                    pitcher_name, batter_name, num, gameday_link, event, stand)
    # Keep only Cliff Lee's at-bats, then match pitches to at-bats
    que <- inner_join(locations,
                      filter(names, pitcher_name == "Cliff Lee"),
                      by = c("num", "gameday_link"))
    pitchfx <- collect(que)  # submit query and bring data into R
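To see why the join uses both keys, here is a minimal base-R sketch on made-up rows (every value below is invented, not Gameday data). It swaps in merge() for dplyr's inner_join so that it runs without a database, but the idea is the same: num only numbers the at-bats within one game, so num and gameday_link are both needed to match a pitch to its at-bat.

```r
# Toy stand-ins for the pitch and atbat tables (all values are invented).
pitch <- data.frame(
  num = c(1, 1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_A", "gid_B"),
  pitch_type = c("FT", "FC", "CH", "FF"),
  stringsAsFactors = FALSE
)
atbat <- data.frame(
  num = c(1, 2, 1),
  gameday_link = c("gid_A", "gid_A", "gid_B"),
  pitcher_name = c("Cliff Lee", "Cliff Lee", "Someone Else"),
  stringsAsFactors = FALSE
)

# Keep one pitcher's at-bats, then match on both keys.
lee_atbats <- subset(atbat, pitcher_name == "Cliff Lee")
joined <- merge(pitch, lee_atbats, by = c("num", "gameday_link"))
print(joined)

# Matching on num alone also pulls in the pitch from game "gid_B",
# which was not thrown in one of the selected at-bats.
wrong <- merge(pitch, lee_atbats, by = "num")
print(wrong)
```

In this toy data the two-key join keeps only the three pitches from game "gid_A", while the one-key join returns an extra, mismatched row.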
I only want pitch data for Lee in the 2013 regular season, so I create a gamedate variable and use the subset function to keep only games played in April (month 4) or later.
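    # Characters 5-14 of gameday_link hold the date as YYYY_MM_DD
    pitchfx$gamedate <- substr(pitchfx$gameday_link, 5, 14)
    # Characters 6-7 of gamedate are the month; keep April onward
    pitchfx <- subset(pitchfx, as.numeric(substr(gamedate, 6, 7)) > 3)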
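The date extraction can be checked on a few strings without touching the database. The sketch below uses hypothetical gameday_link values written in the gid_YYYY_MM_DD_... layout, and, as a variation on the month test above, converts the substring to a Date before comparing. The April 1 cutoff is my assumption, chosen to match the "month greater than 3" rule.

```r
# Hypothetical gameday_link values in the "gid_YYYY_MM_DD_..." layout.
links <- c("gid_2013_03_15_phimlb_detmlb_1",
           "gid_2013_04_01_phimlb_atlmlb_1",
           "gid_2013_09_27_phimlb_atlmlb_1")

# Characters 5 to 14 hold the date, e.g. "2013_04_01".
gamedate <- substr(links, 5, 14)

# Convert to Date so games are compared as dates rather than substrings.
gamedate <- as.Date(gamedate, format = "%Y_%m_%d")

# Assumed cutoff: April 1, 2013 (equivalent to month > 3 within 2013).
regular <- gamedate >= as.Date("2013-04-01")
print(data.frame(links, gamedate, regular))
```

Working with Date values makes it easy to tighten the window later, for example by adding an upper bound, without more substring arithmetic.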
What types of pitches did Lee throw? We see below that Lee likes to throw a two-seam fastball (FT), but he also throws a good number of four-seam fastballs (FF), cutters (FC), change-ups (CH), and curve balls (CU), plus a handful of sliders (SL).
    table(pitchfx$pitch_type)

      CH   CU   FC   FF   FT   SL
     523  256  666  671 1131   25
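Raw counts are easier to compare as shares. As a small sketch, the block below retypes the counts from the table above into a named vector (so it runs without the database) and converts them to percentages with prop.table.

```r
# Pitch-type counts copied from the table above.
counts <- c(CH = 523, CU = 256, FC = 666, FF = 671, FT = 1131, SL = 25)

# Share of each pitch type, as a percentage rounded to one decimal place.
shares <- round(100 * prop.table(counts), 1)
print(sort(shares, decreasing = TRUE))
```

With the real data, the same thing can be written as prop.table(table(pitchfx$pitch_type)).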
To show the location of the pitches, it is helpful to overlay the strike zone in our graphs. Here are the boundaries of an average zone that Max used in our book; the corner coordinates are placed in a data frame called kZone.
    topKzone <- 3.5
    botKzone <- 1.6
    inKzone <- -0.95
    outKzone <- 0.95
    kZone <- data.frame(
      x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
      y = c(botKzone, topKzone, topKzone, botKzone, botKzone)
    )
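The same four boundaries can also be used to flag whether a pitch falls inside this average zone. The sketch below is only an illustration: in_zone is a hypothetical helper (it is not part of pitchRx), the pitch locations are made up, and the check treats each pitch as a point, ignoring the width of the ball.

```r
# Zone boundaries repeated from the post so the block runs on its own.
topKzone <- 3.5
botKzone <- 1.6
inKzone <- -0.95
outKzone <- 0.95

# in_zone is a hypothetical helper, not a pitchRx function.
# It returns TRUE when a location lies on or inside the rectangle.
in_zone <- function(px, pz) {
  px >= inKzone & px <= outKzone & pz >= botKzone & pz <= topKzone
}

# Made-up pitch locations (in feet), for illustration only.
px <- c(0.0, 1.2, -0.5, 0.3)
pz <- c(2.5, 2.5, 1.0, 3.4)
print(in_zone(px, pz))
```

Applied to the real data, mean(in_zone(pitchfx$px, pitchfx$pz), na.rm = TRUE) would give the fraction of pitches located in this zone.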
Here are several interesting graphs using ggplot2. The first graph shows the horizontal (variable px) and vertical (variable pz) locations by batter side.
    library(ggplot2)
    print(ggplot(pitchfx, aes(px, pz, color=stand)) +
            geom_point() +
            geom_path(aes(x, y), data=kZone, lwd=2, col="red") +
            ylim(0, 5) +
            facet_wrap(~ stand, ncol=1))
Another graph shows the locations of all pitches where each panel corresponds to a different pitch type. We see that Lee tends to throw his off-speed pitches in the lower portion of the strike zone.
    print(ggplot(pitchfx, aes(px, pz, color=pitch_type)) +
            geom_point() +
            geom_path(aes(x, y), data=kZone, lwd=2, col="red") +
            ylim(0, 5) +
            facet_wrap(~ pitch_type))
This is only the beginning of our exploratory analysis of Lee’s pitches. Given the large number of pitches, scatterplots aren’t that informative. The strikeFX function in the pitchRx package gives a variety of alternative graphical tools for visualizing this pitch location data.
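One simple way around overplotting, shown here as a rough base-R sketch rather than as what strikeFX does, is to bin the locations into a grid and count the pitches in each cell. The locations below are simulated stand-ins for pitchfx$px and pitchfx$pz, and the half-foot bin width is an arbitrary choice of mine.

```r
set.seed(1)

# Simulated locations standing in for pitchfx$px and pitchfx$pz.
px <- rnorm(500, mean = 0, sd = 0.8)
pz <- rnorm(500, mean = 2.5, sd = 0.8)

# Cut each axis into half-foot bins and count pitches per cell.
x_bins <- cut(px, breaks = seq(-4, 4, by = 0.5))
z_bins <- cut(pz, breaks = seq(-1, 6, by = 0.5))
counts <- table(z_bins, x_bins)

# Rows are vertical bins, columns are horizontal bins.
print(dim(counts))

# The fullest cell shows where the simulated pitches cluster.
print(which(counts == max(counts), arr.ind = TRUE))
```

A table like this can be passed to image() or converted to a data frame for a tiled ggplot2 display.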
All of the R code for this post can be found at this gist site.
Is this from the catcher’s point of view?
Dan, that is correct. These PitchFX views are from the catcher’s perspective.