In my last post, I began an exploration of Statcast data and examined the relationship between exit velocity, launch angle, and hit probability for specific players. After I wrote that post, I decided to collect more data to get a better understanding of the general pattern. It was relatively easy to construct a graph using the ggplot2 package, and I think this graph is helpful for understanding how exit velocity and launch angle affect batter success.
Here’s what I did:
- I collected Statcast data from Baseball Savant for 20 players, so I have data for 7296 balls put into play.
- As before, I used the gam function to fit a generalized additive model where hit/out is the binary response, with a logit link. One attractive aspect of these models is that they allow for general functions of the two explanatory variables, exit velocity and launch angle.
- For all points, I used the fitted model to compute the fitted probability of a hit. These values essentially smooth the data to give a clear picture of how the covariates affect the probability of a hit.
- I subdivided the fitted probabilities into a small number of bins and used a coloring scheme to color the different probability intervals.
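The fit-then-bin steps can be sketched roughly as follows. This is only an illustration on simulated data, not the Statcast data or the code inside the mystatcast package: the column names launch_speed, launch_angle, and hit, the made-up probability surface, and the choice of a joint smooth s(launch_speed, launch_angle) from the mgcv package are all assumptions.

```r
library(mgcv)  # provides gam() with smooth terms

# Simulated stand-in for the Statcast data (NOT the real balls in play);
# the column names launch_speed, launch_angle, and hit are assumptions.
set.seed(1)
n <- 2000
sim <- data.frame(
  launch_speed = runif(n, 50, 115),  # exit velocity in mph
  launch_angle = runif(n, -40, 60)   # launch angle in degrees
)

# Made-up hit probability: rises with speed, peaks near a 15-degree angle
true_p <- plogis(-1 + 0.04 * (sim$launch_speed - 90) -
                   ((sim$launch_angle - 15) / 20)^2)
sim$hit <- rbinom(n, 1, true_p)

# Logistic GAM with a joint smooth of the two covariates
fit <- gam(hit ~ s(launch_speed, launch_angle),
           family = binomial(link = "logit"), data = sim)

# Fitted hit probabilities on the response (0 to 1) scale
sim$prob <- predict(fit, type = "response")

# Six equal-width probability bins
sim$bin <- cut(sim$prob, breaks = seq(0, 1, length.out = 7),
               include.lowest = TRUE)
table(sim$bin)
```

With real data, the simulated data frame would be replaced by the balls in play, keeping one row per batted ball and a 0/1 hit indicator.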
Here is the graph where I divide the probability range (0, 1) into six intervals.
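For readers who want to build a similar display by hand, here is a minimal ggplot2 sketch. It uses a made-up probability surface rather than a fitted model, and the column names, the "YlOrBr" palette, and the axis orientation (launch angle on the horizontal axis) are assumptions chosen to resemble the description in this post, not the package's actual settings.

```r
library(ggplot2)

# Made-up data: a smooth hit probability over launch angle and exit velocity
set.seed(2)
n <- 1500
d <- data.frame(
  launch_angle = runif(n, -40, 60),  # degrees
  launch_speed = runif(n, 50, 115)   # mph
)
d$prob <- plogis(-1 + 0.05 * (d$launch_speed - 90) -
                   ((d$launch_angle - 15) / 20)^2)

# Cut the probability range into six equal-width intervals
d$bin <- cut(d$prob, breaks = seq(0, 1, length.out = 7),
             include.lowest = TRUE)

# One point per batted ball, colored by probability interval;
# drop = FALSE keeps empty intervals in the legend
p <- ggplot(d, aes(x = launch_angle, y = launch_speed, color = bin)) +
  geom_point(size = 1) +
  scale_color_brewer(palette = "YlOrBr", drop = FALSE) +
  labs(x = "Launch angle (degrees)", y = "Exit velocity (mph)",
       color = "P(hit)")
p
```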
Let me circle specific portions of the graph to understand what this graph is telling us.
Right away I notice the dark brown region where the probability of a hit is over 5/6. These points with high exit velocity and high launch angles likely correspond to home runs.
At the other extreme, the yellow points correspond to low hit probabilities. These correspond to the two general classes of outs: ground balls on the left and pop-ups and fly-balls on the right.
Other Regions of High Hit Probability
There are two interestingly shaped red regions where the estimated probability is between 2/3 and 5/6. The top region corresponds to hard-hit balls; note that the sweet spot (the range of successful launch angles) is wider for balls with an exit velocity of 105 mph than for balls with an exit velocity of 95 mph. The bottom red region corresponds to softly hit balls. Are these the lucky or bloop singles that tend to avoid the fielders?
I’ve circled the area of grounders where the launch angle is negative. Note that grounders with launch angles close to zero are more likely to be hits, and grounders hit very hard (say around 110 mph) are likely to be hits for a range of launch angles.
I have modified my mystatcast package to include the data for 20 players and a new function, discrete_va_plot, that constructs this graph.
library(devtools)
install_github("bayesball/mystatcast")
library(mystatcast)
discrete_va_plot(scdata, br=seq(0, 1, length.out=7))
I encourage the interested reader to modify this graph in several ways:
- Try using different breakpoints for the probabilities using the br argument.
- You are welcome to experiment with different color schemes using the pal argument. Choices for different palettes are described on this ggplot2 help page.
For example, I’m dividing the probability scale into 8 intervals and using the “Greens” palette below.
discrete_va_plot(scdata, br=seq(0, 1, length.out=9), pal="Greens")
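To see why length.out is one more than the number of intervals, here is a small base R check. It uses cut() as a stand-in for whatever binning the package does internally, which is an assumption; the three probabilities are made-up values.

```r
# k intervals need k + 1 breakpoints
k <- 8
br <- seq(0, 1, length.out = k + 1)
br
# 0.000 0.125 0.250 0.375 0.500 0.625 0.750 0.875 1.000

# Assign three made-up probabilities to their intervals
bins <- cut(c(0.05, 0.40, 0.93), breaks = br, include.lowest = TRUE)
as.integer(bins)
# 1 4 8
```

The same cut() call accepts unequal breakpoints, for example c(0, 0.1, 0.25, 0.5, 0.75, 1), if finer resolution is wanted at the low end of the probability scale.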