Monthly Archives: October, 2022

Plotting Balls & Strikes Effects

Introduction

Anyone who watches a baseball game knows the importance of the balls and strikes count in the duel between the pitcher and the hitter. The plate appearance begins with a 0-0 count. On the next pitch, an added strike gives an advantage to the pitcher; likewise, an added ball gives an advantage to the hitter. A hitter has a big advantage on a 3-0 count. As an example, Rhys Hoskins hit a key home run off Yu Darvish on a 3-0 count in the final game of the recent NL Championship Series. One can quantify the advantage of a particular balls-strikes count by the use of various measures. Recently Tom Tango displayed a graph on Twitter that shows the advantage of particular balls-strikes counts for different hitting measures. Here is the portion of the display for the wOBA measure.

Tom indicated that this shows, given the count (columns indicate the ball count and rows the strike count), how the plate appearance ends. Each added ball increases the batter’s wOBA and each strike decreases it.

This figure is a common way to display balls-strikes effects. In fact, we illustrated the construction of this type of table-figure in Chapter 6 of Analyzing Baseball Data with R. But I think it is hard for a reader to quickly decipher the balls-strikes patterns from this tabular display. In this post, I’ll present an alternative graph of these data which perhaps better (or more quickly) communicates what is gained or lost in particular balls-strikes counts. People familiar with this blog have seen similar flavors of these graphs used to display different balls-strikes effects.

The Data

I start with the Retrosheet play-by-play data for the 2021 season. The Retrosheet file contains the variable PITCH_SEQ_TX which gives the sequence of balls and strikes (and other outcomes) during a plate appearance. I am interested in two types of count variables:

  • I define indicator (TRUE/FALSE) variables c01, c10, c11, …, c32 that indicate whether particular counts, like 0-1, 1-0, 1-1, …, occur during the plate appearance. There are 12 different indicator variables including the variable c00 at the start of the PA. I call these “passing through” counts. For example, passing through count 1-2 means that the count was 1-2 at some point during the specific PA.
  • I record the final count at the end of the plate appearance. For example, for a strikeout, the final count might be 0-2, 1-2, 2-2, or 3-2. There are 12 possible final counts including 0-0.
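As a rough sketch of the two count variables above (written in Python rather than the R used for this post), one can walk through a pitch sequence string and track the count. The function name counts_in_pa() is hypothetical, and the mapping of pitch codes to balls and strikes is a simplified assumption that should be checked against the Retrosheet documentation for PITCH_SEQ_TX.

```python
# Assumed, simplified pitch-code groups (verify against the Retrosheet docs).
BALL_CODES = set("BIPV")        # codes treated as a ball
STRIKE_CODES = set("CSKLMOQT")  # codes treated as a strike at any count
FOUL_CODES = set("FR")          # fouls: add a strike only below two strikes


def counts_in_pa(pitch_seq):
    """Return (passing-through counts, final count) for one plate appearance."""
    balls, strikes = 0, 0
    passed = ["0-0"]  # every PA starts at 0-0
    for code in pitch_seq:
        if code in BALL_CODES:
            balls += 1
        elif code in STRIKE_CODES:
            strikes += 1
        elif code in FOUL_CODES and strikes < 2:
            strikes += 1
        else:
            continue  # count unchanged: two-strike foul, ball in play, non-pitch code
        if balls > 3 or strikes > 2:
            break  # ball four or strike three ends the PA
        passed.append(f"{balls}-{strikes}")
    # The count never moves backwards, so the last count reached is the final count.
    return passed, passed[-1]


passed, final = counts_in_pa("CBFFX")
print(passed, final)
```

Here a called strike, a ball, two fouls, and a ball in play pass through 0-0, 0-1, 1-1, and 1-2, and the final count is 1-2. Each entry of the returned list plays the role of one TRUE indicator (c00, c01, c11, c12 in this case).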

Next, I compute different measures for each of the possible passing through counts and for each of the final counts. Each of the rates is expressed as a percentage. Specifically, I compute

  • (HR) the home run rate 100 x HR / PA
  • (SO) the strikeout rate 100 x SO / PA
  • (BB_HBP) the walk or hit by pitch rate 100 x (BB + HBP) / PA
  • (IP_H) the hit rate on balls in play 100 x H / IP
  • (IP_O) the out rate on balls in play 100 x Outs / IP
  • (wOBA) the wOBA measure using the Fangraphs weights for the 2021 season
  • (1B) the singles rate 100 x 1B / PA
  • (XB) the extra-base rate 100 x (2B + 3B + HR) / PA
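To make the rate definitions concrete, here is a minimal Python sketch of a per-PA rate such as SO or HR computed for each passing through count. The function rates_by_count(), the record layout, and the four toy plate appearances are all made up for illustration; they are not the Retrosheet data or the code behind the app.

```python
from collections import defaultdict


def rates_by_count(pas, outcome):
    """Percentage of PAs ending in `outcome`, among PAs passing through each count."""
    n_pa = defaultdict(int)       # PAs that pass through the count
    n_outcome = defaultdict(int)  # of those, PAs ending in the outcome
    for pa in pas:
        for count in pa["passed"]:
            n_pa[count] += 1
            if pa["outcome"] == outcome:
                n_outcome[count] += 1
    return {count: 100 * n_outcome[count] / n_pa[count] for count in n_pa}


# Toy plate appearances (invented for illustration only).
toy_pas = [
    {"passed": ["0-0", "0-1", "0-2"], "outcome": "SO"},
    {"passed": ["0-0", "1-0"], "outcome": "HR"},
    {"passed": ["0-0", "0-1", "1-1"], "outcome": "OUT"},
    {"passed": ["0-0", "0-1", "0-2", "1-2"], "outcome": "SO"},
]

so_rates = rates_by_count(toy_pas, "SO")
print(so_rates["0-0"], so_rates["0-2"])
```

A PA contributes to every count it passes through, so the denominators for different counts overlap. Replacing the "passed" list with a one-element list holding the final count would give the corresponding final-count rates.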

Two Displays

Using the ggplot2 package, here is a tabular display of the SO rates for the different passing through counts similar to Tom’s display. I color the point regions by the rate value — a white color corresponds to an average SO rate value.

As an alternative, here is my “points display” of the same data. I am plotting the strikeout rate as a function of the pitch number where the label of the point is the count value. This graph dramatically shows the increase in rate for each added strike. Also the rate increase is larger for a change from 1 to 2 strikes than it is for a change from 0 to 1 strikes. I connect the points with lines since many of these counts occur during the same plate appearance. I think this is a better display than the tabular one since one does not need to read the actual digits of the rates in the points display.
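The layout of the points display can be sketched in a few lines of Python. This is only a guess at the geometry, under two assumptions: the pitch number of a count is taken as balls + strikes + 1 (the earliest pitch on which that count can occur, ignoring two-strike fouls), and a line joins each count to the counts reachable by one added ball or one added strike. The function names are hypothetical.

```python
def count_points():
    """Map each of the 12 counts to its assumed pitch number (x-coordinate)."""
    points = {}
    for balls in range(4):
        for strikes in range(3):
            points[f"{balls}-{strikes}"] = balls + strikes + 1
    return points


def count_segments():
    """Pairs of counts joined by a line: one added ball or one added strike."""
    segments = []
    for balls in range(4):
        for strikes in range(3):
            here = f"{balls}-{strikes}"
            if balls < 3:
                segments.append((here, f"{balls + 1}-{strikes}"))
            if strikes < 2:
                segments.append((here, f"{balls}-{strikes + 1}"))
    return segments


points = count_points()
segments = count_segments()
print(points["0-0"], points["3-2"], len(segments))
```

Plotting each count at (pitch number, rate), labeling the point with the count, and drawing the segments would give a display of this general shape.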

Final Count Displays

One also might be interested in the final count for particular batting outcomes. Here is a points display of the SO rates for final counts. Obviously the final count for a strikeout must have two strikes, so the strikeout rates for final counts with fewer than two strikes are zero. It is interesting that the strikeout rate drops substantially when the final count moves from 2-2 to 3-2. Here I don’t connect points since different final counts correspond to different plate appearances.

A Shiny App

One can easily explore these different displays with a Shiny app. Currently, this app is live at

https://bayesball.shinyapps.io/MLB_Count_Effects/

One selects the type of count (either passing through or final) and the measure of interest among the eight listed. There are two tabs — the Tabular Display tab shows the tabular graph and the Points Display tab displays the rates plotted against the pitch number. The Download Data button will download a data frame of counts and rates for both count types and all of the measures.

Got Code?

This Shiny app is available as the function BallsStrikesEffects() in my ShinyBaseball package. You don’t need to install the package — the single file app.R found here contains the complete code for the app. There are two plotting functions — construct_plot() and construct_plot2() contain the code for constructing the tabular and points displays described in this post. The user interface component of this app is relatively short — one can use this as a template in the development of another Shiny app.

Try the App Out

Actually, I think both the tabular and points displays are useful. The points display is better for quickly seeing the balls-strikes pattern, and the tabular display is better for picking up the actual rate values. By playing with the live Shiny app at https://bayesball.shinyapps.io/MLB_Count_Effects/, I think one can learn a lot about the importance of the balls-strikes count in Major League Baseball.