As I mentioned in a previous post, I have been working on a new package, CalledStrike, that facilitates the construction of heat and contour maps for different baseball measures. I am currently working with a research group of ACTION students at BGSU, and they generally seem to be successful in using this package to produce some interesting plots. In this post, I’ll explain how to use this package and illustrate some of the new comparison graph functions.
To get started, you need to install the ggplot2, dplyr, baseballr, and mgcv packages, including their dependencies. The baseballr package is used to scrape the Statcast data from Baseball Savant, and the mgcv package is used to produce the smooth generalized additive model (gam) fits. Then you install the CalledStrike package from GitHub using the install_github() function from the devtools package.
Collect Some Data
First you need some data to explore. The get_id() function uses the baseballr package to find the MLBAM id for a player. The get_sc_data() function will scrape the Baseball Savant data for a particular hitter (or pitcher) for a particular season (2015 through 2018). For this example, I collect Mike Trout data for the 2018 season.
Called Strike Graphs
Once you have data, the called_strike_plot() function constructs a heat map of the probability of a called strike. The only arguments to this function are the data frame of interest and a title for the graph. Trout (a right-handed hitter) is located to the left of the zone. He tends to get favorable strike calls on the inside edge of the zone but unfavorable calls on the outside edge.
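To give a feel for what sits behind a plot like this, here is a minimal sketch on simulated data. It is not the package's own code: the data, the "true" strike zone, and the color palette are all made up for illustration, and only the general idea (a binomial gam fit from mgcv over pitch location, drawn as a ggplot2 heat map) comes from the description above.

```r
library(mgcv)
library(ggplot2)

set.seed(1)
n <- 2000
# Simulated called pitches; plate_x and plate_z follow Statcast naming (feet)
pitches <- data.frame(
  plate_x = runif(n, -2, 2),
  plate_z = runif(n, 0.5, 4.5)
)
# Hypothetical "true" strike probability: high inside a rough zone, low outside
in_x <- plogis(8 * (0.85 - abs(pitches$plate_x)))
in_z <- plogis(8 * (1.0 - abs(pitches$plate_z - 2.5)))
pitches$strike <- rbinom(n, 1, in_x * in_z)

# Binomial GAM with a two-dimensional smooth over pitch location
fit <- gam(strike ~ s(plate_x, plate_z), family = binomial, data = pitches)

# Predict the strike probability over a regular grid of locations
grid <- expand.grid(
  plate_x = seq(-1.5, 1.5, length.out = 50),
  plate_z = seq(1, 4, length.out = 50)
)
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Heat map of the fitted probabilities
p <- ggplot(grid, aes(plate_x, plate_z, fill = prob)) +
  geom_tile() +
  scale_fill_distiller(palette = "Spectral", limits = c(0, 1)) +
  coord_fixed() +
  ggtitle("Simulated called strike probability")
print(p)
```

Predicting on a regular grid rather than at the observed pitch locations is what makes the tiles line up into a clean map.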
Miss Swing Rate Graphs
Next the miss_swing_plot() function displays a heat map of the probability of missing a pitch on a swing. Trout tends to miss very low pitches and pitches high and away.
In-play Home Run and Hit Rates
For balls put in play, the home_run_plot() and hit_plot() show heat maps of the probability of a home run and hit, respectively. Below we see that the 2018 Trout has a clear sweet spot with respect to home run hitting.
For those that prefer contour graphs, there are several functions for constructing contour plots of a proportion fit or a mean fit. Below I show a contour plot of Trout’s in-play hit rates where contour lines are drawn at 0.300, 0.400, and 0.500. He is especially lethal on pitches in the middle and lower parts of the zone.
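Drawing contour lines at chosen levels is simple in ggplot2 once you have fitted values on a grid. The sketch below uses a made-up smooth surface in place of a real fit (the surface formula and the variable names are assumptions for illustration) and draws lines at the same three levels.

```r
library(ggplot2)

# Hypothetical smooth hit-probability surface on a grid of pitch locations;
# in practice these values would come from predict() on a fitted model
grid <- expand.grid(
  plate_x = seq(-1.5, 1.5, length.out = 60),
  plate_z = seq(1, 4, length.out = 60)
)
grid$prob <- 0.55 * exp(-(grid$plate_x^2 + (grid$plate_z - 2.3)^2) / 1.5)

# Contour lines only at the requested probability levels
p <- ggplot(grid, aes(plate_x, plate_z, z = prob)) +
  geom_contour(breaks = c(0.3, 0.4, 0.5), colour = "black") +
  coord_fixed() +
  ggtitle("Simulated in-play hit rate contours")
print(p)
```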
Statcast Variable Graphs
Also there are simple functions la_plot(), ls_plot(), and sa_plot() that construct heat maps for the Statcast variables launch angle, launch speed, and spray angle. Below I show the heat maps of Trout’s spray angles. As one might expect, Trout pulls pitches located low and inside and hits high and outside pitches to the opposite field.
Although these graphs are fine, one typically wants to make comparisons. Here are some examples:
- How do two different hitters compare with respect to some measure?
- How does the 2017 Mike Trout compare to the 2018 Mike Trout?
- How does Trout hit against left-handed pitchers compared to right-handed pitchers?
- How does Trout hit against off-speed pitches compared to fastballs?
- How does Trout hit when he is ahead in the count versus behind in the count?
- If we have a switch hitter like Frankie Lindor, how does his left-sided hitting compare to his right-sided hitting?
So I wrote several functions that facilitate comparisons via heat maps.
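Most of these comparisons start by splitting one Statcast data frame into two groups. Here is a small dplyr sketch of some of those splits on a tiny made-up sample. The column names p_throws, pitch_type, balls, and strikes are Statcast variables, but the rows are invented and the choice of which pitch codes count as fastballs is my assumption.

```r
library(dplyr)

# Tiny made-up sample using Statcast column names
sc <- tibble(
  p_throws   = c("L", "R", "R", "L", "R", "R"),
  pitch_type = c("FF", "CH", "FF", "CU", "SL", "FF"),
  balls      = c(0, 2, 3, 0, 1, 1),
  strikes    = c(2, 0, 1, 1, 2, 0)
)

# Left-handed versus right-handed pitchers
vs_left  <- filter(sc, p_throws == "L")
vs_right <- filter(sc, p_throws == "R")

# Ahead in the count (more balls than strikes) versus behind
ahead  <- filter(sc, balls > strikes)
behind <- filter(sc, balls < strikes)

# Fastballs versus everything else; the fastball codes here are an assumption
fb_codes  <- c("FF", "FT", "SI")
fastballs <- filter(sc, pitch_type %in% fb_codes)
offspeed  <- filter(sc, !pitch_type %in% fb_codes)
```

Each pair of data frames can then be fed to whatever plotting or fitting routine you use for the comparison.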
Here I use a function compare_ip() to compare Trout’s 2017 and 2018 home run rates. It seems that Trout had a larger home run sweet spot in 2017.
Sometimes, we want to compare four players or one player for four different seasons. The function compare_ip4() constructs heat maps for the four data frames on the same scale. For example, here are Trout’s in-play hit probabilities for four seasons. There is a lot of similarity in these graphs, but it seems that Trout prefers lower pitches in the last two seasons.
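The important design point in a comparison display is the common scale. As a rough sketch of that idea (not the package's implementation), the code below simulates two hypothetical seasons, fits a separate gam to each, and shows the fitted surfaces side by side with one shared fill scale. The data, the function sim_season(), and the season labels are all invented for illustration.

```r
library(mgcv)
library(ggplot2)

set.seed(2)
# Simulate balls in play for one hypothetical season; `shift` moves the hot spot
sim_season <- function(n, shift) {
  d <- data.frame(plate_x = runif(n, -1.5, 1.5), plate_z = runif(n, 1, 4))
  p_hit <- 0.2 + 0.3 * exp(-(d$plate_x^2 + (d$plate_z - 2.5 - shift)^2))
  d$hit <- rbinom(n, 1, p_hit)
  d
}
seasons <- list("2017" = sim_season(1500, 0.3), "2018" = sim_season(1500, -0.3))

grid <- expand.grid(
  plate_x = seq(-1.5, 1.5, length.out = 40),
  plate_z = seq(1, 4, length.out = 40)
)

# Fit one GAM per group and stack the gridded predictions with a group label
fits <- lapply(names(seasons), function(yr) {
  fit <- gam(hit ~ s(plate_x, plate_z), family = binomial, data = seasons[[yr]])
  out <- grid
  out$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))
  out$season <- yr
  out
})
both <- do.call(rbind, fits)

# A single fill scale shared by all facets keeps the panels comparable
p <- ggplot(both, aes(plate_x, plate_z, fill = prob)) +
  geom_tile() +
  facet_wrap(~ season) +
  scale_fill_distiller(palette = "Spectral", limits = range(both$prob)) +
  coord_fixed()
print(p)
```

The same pattern extends to four groups by adding entries to the list, since facet_wrap() lays out one panel per label.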
- To get started, you just need to install the CalledStrike package from my GitHub site. A Markdown file with the examples in this post can be found in the vignettes folder, and the knitted HTML file can be found here. I will continue to make improvements to this package. Actually one reason why I wrote this package was to organize my own R work. All R users should get some experience writing packages.
- If you are more interested in the details of model fitting and graphing, look at some of the fitting and graphing functions in the package. For example, the function ls_gam_fit() fits a generalized additive model for launch speed of the form launch_speed ~ s(plate_x, plate_z), where s() is a smooth function of the pitch location. The function tile_plot_m() constructs a heat map of a continuous response as a function of the plate_x and plate_z variables.
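If you want to experiment with this kind of mean fit directly, here is a minimal mgcv sketch on simulated data. It is not the body of ls_gam_fit(). The response name launch_speed follows the Statcast column, and the simulated pattern (hardest contact in the middle of the zone) is purely hypothetical.

```r
library(mgcv)

set.seed(3)
n <- 1500
bip <- data.frame(plate_x = runif(n, -1.5, 1.5), plate_z = runif(n, 1, 4))
# Hypothetical pattern: hardest contact near the middle of the zone
bip$launch_speed <- 95 - 6 * (bip$plate_x^2 + (bip$plate_z - 2.5)^2) +
  rnorm(n, sd = 8)

# Gaussian GAM: mean launch speed as a smooth function of pitch location
fit <- gam(launch_speed ~ s(plate_x, plate_z), data = bip)

# Estimated mean launch speed at a middle location and a high-outside location
new_loc <- data.frame(plate_x = c(0, 1.2), plate_z = c(2.5, 3.8))
pred <- as.numeric(predict(fit, newdata = new_loc))
```

With a continuous response the default Gaussian family is used, so the predictions are on the scale of the response (miles per hour here) and can be passed straight to a tile plot.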