If you have been following my blog over the years, you know that I like to talk about count effects in baseball. For example, Chapter 6 of the 2nd edition of Analyzing Baseball Data with R is devoted to ball and strike effects, and this post presents a graph that shows the value of plate appearances passing through different counts. Since I am working on my CalledStrike package, I thought it would be interesting to use functions from this package to see how various Statcast measures depend on the count. All of these graphs are based on 2018 Statcast data. Also, since these pictures depend on the batter side, I will focus only on right-handed batters. (The pictures for left-handed batters are generally mirror images of the graphs presented below.)
Before we look at the plots, what is our intuition about count effects?
- Batters like to swing at pitches located towards the middle of the zone for any count.
- Batters tend to be disciplined and want to work the count. That is, they are reluctant to swing at the beginning of the pitch sequence. But they have to swing when there is a two-strike count.
- Batters like to be ahead in the count since they think they are more likely to see a good pitch to hit. The quality of any ball put into play is going to be better on batters’ counts.
Hopefully these graphs will reinforce and perhaps illuminate our beliefs about count effects.
A batter has to decide whether to swing at a pitch, and this decision depends dramatically on two variables: the pitch location and the count. Below I use filled contour graphs to show how the probability of a swing depends on the count. I think it is helpful to use a 4 x 3 grid to show these graphs, where each row shows the ball count (0, 1, 2, 3) and each column shows the strike count (0, 1, 2). A few observations: (1) batters are generally reluctant to swing at early counts, (2) they are very likely to swing at 2-strike counts, and (3) they will swing on batters’ counts (like 2-0 or 3-1) for pitches in the middle of the zone.
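To make the 4 x 3 layout concrete, here is a minimal Python sketch (not the R code behind the graphs) that tabulates an overall swing rate for each count, ignoring pitch location. The field names `balls`, `strikes`, and `description` follow the usual Statcast columns, but the set of `description` values treated as swings is my assumption, and the sample pitches are made up purely to show the shape of the output.

```python
from collections import defaultdict

# Assumption: these Statcast `description` values are treated as swings.
SWING_DESCRIPTIONS = {
    "swinging_strike", "swinging_strike_blocked", "foul", "foul_tip",
    "hit_into_play", "hit_into_play_no_out", "hit_into_play_score",
}

def swing_rate_grid(pitches):
    """Return a 4 x 3 grid of swing rates.

    Rows are ball counts 0-3 and columns are strike counts 0-2.
    A cell is None when no pitches were seen at that count.
    """
    swings = defaultdict(int)
    totals = defaultdict(int)
    for p in pitches:
        count = (p["balls"], p["strikes"])
        totals[count] += 1
        if p["description"] in SWING_DESCRIPTIONS:
            swings[count] += 1
    return [
        [swings[(b, s)] / totals[(b, s)] if totals[(b, s)] else None
         for s in range(3)]
        for b in range(4)
    ]

# Made-up pitches, for illustration only (not real Statcast data).
sample = [
    {"balls": 0, "strikes": 0, "description": "called_strike"},
    {"balls": 0, "strikes": 0, "description": "ball"},
    {"balls": 0, "strikes": 0, "description": "foul"},
    {"balls": 0, "strikes": 0, "description": "ball"},
    {"balls": 0, "strikes": 2, "description": "swinging_strike"},
    {"balls": 0, "strikes": 2, "description": "foul"},
    {"balls": 3, "strikes": 0, "description": "ball"},
]
grid = swing_rate_grid(sample)
for row in grid:
    print(row)
```

Each cell here is a single number per count; the contour graphs go further by showing how the swing probability varies with location within each cell of the grid.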
Now that the batter has decided to swing, how does the probability of missing the pitch depend on the count and location? Here I present contours of the smoothed probabilities of missing the pitch. Remember that we are looking at right-handed hitters. The darker blue region corresponds to the area where the probability of a miss is smaller than 20%, and this region tends to be on inside-middle or lower-zone pitches. There are subtle count effects here. One takeaway is that the “smaller than 20%” region is smallest on a 3-0 count. The batter sees the 3-0 count as a good opportunity to swing, there is little to lose by swinging and missing, and so there is a smaller sweet spot in this case.
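As a rough stand-in for the smoothed contours, one can compute a raw miss rate among swings within square bins of the plate location. The Python sketch below does exactly that; it is a simple binning, not the smoothing used for the graphs. The columns `plate_x` and `plate_z` (in feet) are the usual Statcast location fields, while the choice of which `description` values count as a swing or a miss, the half-foot bin width, and the sample pitches are all my assumptions.

```python
import math
from collections import defaultdict

# Assumption: which Statcast `description` values count as a miss or a swing.
MISS_DESCRIPTIONS = {"swinging_strike", "swinging_strike_blocked"}
SWING_DESCRIPTIONS = MISS_DESCRIPTIONS | {
    "foul", "foul_tip",
    "hit_into_play", "hit_into_play_no_out", "hit_into_play_score",
}

def miss_rate_by_bin(pitches, bin_width=0.5):
    """Miss rate among swings, keyed by (x_bin, z_bin) of the plate location."""
    tallies = defaultdict(lambda: [0, 0])  # bin -> [misses, swings]
    for p in pitches:
        if p["description"] not in SWING_DESCRIPTIONS:
            continue  # taken pitches do not enter the miss rate
        key = (math.floor(p["plate_x"] / bin_width),
               math.floor(p["plate_z"] / bin_width))
        tallies[key][1] += 1
        if p["description"] in MISS_DESCRIPTIONS:
            tallies[key][0] += 1
    return {key: misses / swings for key, (misses, swings) in tallies.items()}

# Made-up pitches, for illustration only (not real Statcast data).
sample = [
    {"plate_x": 0.1, "plate_z": 2.6, "description": "foul"},
    {"plate_x": 0.2, "plate_z": 2.7, "description": "swinging_strike"},
    {"plate_x": 0.3, "plate_z": 2.9, "description": "hit_into_play"},
    {"plate_x": 0.4, "plate_z": 2.8, "description": "foul"},
    {"plate_x": -0.3, "plate_z": 1.2, "description": "swinging_strike"},
    {"plate_x": -0.4, "plate_z": 1.4, "description": "called_strike"},
    {"plate_x": -0.2, "plate_z": 1.3, "description": "foul"},
]
rates = miss_rate_by_bin(sample)
```

Running this separately on the pitches from each count would give a crude, blocky version of the comparison above. Raw bins get noisy quickly when a count has few swings, which is one reason to prefer smoothed probabilities.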
Now the batter has put the ball in play. How does the launch speed off the bat depend on the count? The location of the “hot” launch speed region seems pretty consistent across counts. One takeaway is that the launch speed is greatest on the batters’ counts 2-0, 3-0 and 3-1. Focusing on the 3-0 count, batters hit low-inside and high-outside pitches the hardest. (By the way, it is interesting that the “low miss” regions for right-handed batters in the previous graphs have a negative orientation, while the “high launch speed” regions here have a positive orientation.)
In-Play Hit and Home Run Rates
Usually one associates high launch speeds with desirable outcomes like hits and home runs. Below I graph the probability of a hit — again remember that we are focusing on right-handed batters. The hot regions here mimic the hot regions in the launch speeds. This particular graph is a little hard to read due to the especially hot regions in the 3-0 count. Also since there isn’t much data for in-play events at 3-0 counts, it is a little hard to make sense of the hot regions. Batters seem to do well with low pitches on 3-0 counts.
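To see why sparse data makes the 3-0 panel hard to interpret, it helps to look at the standard error of an estimated rate, sqrt(p(1 - p) / n), which shrinks only with the square root of the sample size. The short Python sketch below compares two hypothetical samples with the same hit rate. The counts are invented for illustration and are not taken from the 2018 data.

```python
import math

def proportion_standard_error(successes, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

# Hypothetical counts (not from the Statcast data): the same 40% hit rate
# observed on a small sample and on a sample 100 times larger.
se_small = proportion_standard_error(40, 100)
se_large = proportion_standard_error(4000, 10000)

print(round(se_small, 4), round(se_large, 4))
```

With 100 times less data, the standard error is 10 times larger, so isolated hot spots in a sparsely populated count can easily be noise.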
To make it easier to compare some of the counts, I focus on the 0, 1, and 2 ball counts below. I don’t see much of a difference between these 9 counts: right-handed hitters tend to hit for average in a band running from the lower-inside to the upper-outside part of the zone. The hot zone is largest for the batter’s count of 2-0 in this group.
Here is a similar plot for the in-play home run rates, focusing on the 0, 1, and 2 ball counts. The location of the region again seems consistent across counts. The yellow region corresponds to higher home run rates — note that I see larger yellow regions in the 0-0, 1-0, and 2-0 counts — again we see the advantage of batter counts. The contour graph for the 2-1 count looks a bit bizarre.
I’ve attached a snapshot of the R code for these graphs below. I have a csv file on my computer containing all of the 2018 Statcast data (scraped week by week using the baseballr package) and I read that file into the R workspace with the getdata() function. Basically, I create a list of data frames corresponding to the 12 count values and then call the compare_contour() function with different variables to plot. My CalledStrike package is not yet ready to be submitted to CRAN (maybe it never will be), but I think it is useful for seeing how many of these rates and measures depend on the pitch location.
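For readers who want the general shape of that workflow without the package, here is a minimal Python sketch of the "one group per count" step. It is not the R code from the snapshot and does not reproduce getdata() or compare_contour(). It assumes the usual Statcast columns `stand`, `balls`, and `strikes`, and the helper name `split_by_count` and the tiny inline CSV are hypothetical stand-ins for the real file.

```python
import csv
import io

# The 12 possible counts: balls 0-3 crossed with strikes 0-2.
COUNTS = [(b, s) for b in range(4) for s in range(3)]

def split_by_count(rows, stand="R"):
    """Group rows for one batter side into a dict keyed by count, e.g. '2-1'."""
    groups = {f"{b}-{s}": [] for b, s in COUNTS}
    for row in rows:
        if row["stand"] != stand:
            continue  # keep only the requested batter side
        key = f'{int(row["balls"])}-{int(row["strikes"])}'
        if key in groups:  # skip anything that is not one of the 12 counts
            groups[key].append(row)
    return groups

# Made-up rows standing in for the real csv file (illustration only).
csv_text = """stand,balls,strikes,launch_speed
R,0,0,95.1
L,0,0,88.0
R,3,1,101.3
R,0,0,
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
by_count = split_by_count(rows)

for count, group in by_count.items():
    print(count, len(group))
```

Once the data is split this way, the same summary or plotting function can be applied to each of the 12 groups in turn, which is the pattern behind every grid of graphs in this post.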