In recent posts, we considered the value of plate appearances that pass through specific counts. For example, PAs that pass through a 0-2 count are clearly worse (from an average runs perspective) than PAs that pass through a 2-0 count. Here we look at the transitions between pitch counts and illustrate the use of an R package to graph the probabilities of the different transitions.

A useful way of modeling these transitions is with a Markov Chain. A plate appearance starts with a 0-0 count, there are 11 other possible pitch counts (0-1, 1-0, 0-2, 1-1, 2-0, 1-2, 2-1, 3-0, 2-2, 3-1, 3-2), and we’ll call the end of the plate appearance (a walk, strikeout, or a ball put in play) the “final state”. The 12 pitch counts (including the 0-0 count) together with the end of the PA are the 13 “states”, and the Markov Chain describes the probabilities of moving among these states on each pitch. (One basic assumption of a Markov Chain is that the probability of moving to a new state depends only on the current state; the history of how the PA reached the current state is not relevant.) Using the 2015 Retrosheet data, we can estimate these “transition probabilities” pretty accurately.

Here is the 13 x 13 matrix that gives these transition probabilities. Each row is the current state and each column is the state after one more pitch. Starting at the 0-0 count, the first row says that after one pitch, the new state will be 0-1, 1-0, or X (in play) with respective probabilities .50, .39, and .11. In contrast, if the current state is 3-2, then the new state (after one pitch) will be either 3-2 or X (in play, SO, or BB) with probabilities .29 and .71. Since the entries are rounded to two decimals, a few rows sum to .99 or 1.01 rather than exactly 1.

        0-0 0-1  1-0  0-2  1-1  2-0  1-2  2-1  3-0  2-2  3-1  3-2    X
    0-0   0 0.5 0.39 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.11
    0-1   0 0.0 0.00 0.41 0.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.18
    1-0   0 0.0 0.00 0.00 0.48 0.34 0.00 0.00 0.00 0.00 0.00 0.00 0.17
    0-2   0 0.0 0.00 0.19 0.00 0.00 0.45 0.00 0.00 0.00 0.00 0.00 0.37
    1-1   0 0.0 0.00 0.00 0.00 0.00 0.44 0.34 0.00 0.00 0.00 0.00 0.22
    2-0   0 0.0 0.00 0.00 0.00 0.00 0.00 0.49 0.33 0.00 0.00 0.00 0.18
    1-2   0 0.0 0.00 0.00 0.00 0.00 0.21 0.00 0.00 0.37 0.00 0.00 0.42
    2-1   0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.46 0.28 0.00 0.25
    3-0   0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.56 0.00 0.44
    2-2   0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.29 0.46
    3-1   0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.47 0.53
    3-2   0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.29 0.71
    X     0 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
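For readers who want to work with these numbers, here is a minimal base R sketch that enters the rounded table as a matrix and reads off a row. The names `states`, `rows`, and `P` are chosen here for illustration and are not from the original analysis, and the values are the rounded ones shown above rather than the unrounded Retrosheet estimates.

```r
# States in the same order as the table's rows and columns.
states <- c("0-0", "0-1", "1-0", "0-2", "1-1", "2-0", "1-2",
            "2-1", "3-0", "2-2", "3-1", "3-2", "X")

# Non-zero entries of each row, copied from the rounded table.
rows <- list(
  "0-0" = c("0-1" = 0.50, "1-0" = 0.39, "X" = 0.11),
  "0-1" = c("0-2" = 0.41, "1-1" = 0.41, "X" = 0.18),
  "1-0" = c("1-1" = 0.48, "2-0" = 0.34, "X" = 0.17),
  "0-2" = c("0-2" = 0.19, "1-2" = 0.45, "X" = 0.37),
  "1-1" = c("1-2" = 0.44, "2-1" = 0.34, "X" = 0.22),
  "2-0" = c("2-1" = 0.49, "3-0" = 0.33, "X" = 0.18),
  "1-2" = c("1-2" = 0.21, "2-2" = 0.37, "X" = 0.42),
  "2-1" = c("2-2" = 0.46, "3-1" = 0.28, "X" = 0.25),
  "3-0" = c("3-1" = 0.56, "X" = 0.44),
  "2-2" = c("2-2" = 0.25, "3-2" = 0.29, "X" = 0.46),
  "3-1" = c("3-2" = 0.47, "X" = 0.53),
  "3-2" = c("3-2" = 0.29, "X" = 0.71),
  "X"   = c("X" = 1.00)
)

# Fill a 13 x 13 matrix of zeros with the non-zero entries.
P <- matrix(0, nrow = 13, ncol = 13, dimnames = list(states, states))
for (from in names(rows)) {
  P[from, names(rows[[from]])] <- rows[[from]]
}

# Reading a row: where can a 0-0 count go after one pitch?
P["0-0", P["0-0", ] > 0]

# Row sums are close to, but not always exactly, 1 because of rounding.
round(rowSums(P), 2)
```

Storing only the non-zero entries keeps the input short and makes it easy to compare each row against the table by eye.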

I recently discovered the `markovchain` package, which is dedicated to manipulations and plots of discrete Markov chains. I have not looked at this package carefully, but given this transition probability matrix, one can create a Markov Chain object, and the `plot` method constructs an attractive graphical display of these probabilities.
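A minimal sketch of that step, assuming the `markovchain` package is installed. The object name `dmc` matches the one used later in this post, but the input here is the rounded table, so each row is rescaled to sum to exactly 1 before the object is created. That rescaling is an assumption added for this sketch; the original object was presumably built from the unrounded estimates.

```r
states <- c("0-0", "0-1", "1-0", "0-2", "1-1", "2-0", "1-2",
            "2-1", "3-0", "2-2", "3-1", "3-2", "X")

# Non-zero entries of each row, copied from the rounded table.
rows <- list(
  "0-0" = c("0-1" = 0.50, "1-0" = 0.39, "X" = 0.11),
  "0-1" = c("0-2" = 0.41, "1-1" = 0.41, "X" = 0.18),
  "1-0" = c("1-1" = 0.48, "2-0" = 0.34, "X" = 0.17),
  "0-2" = c("0-2" = 0.19, "1-2" = 0.45, "X" = 0.37),
  "1-1" = c("1-2" = 0.44, "2-1" = 0.34, "X" = 0.22),
  "2-0" = c("2-1" = 0.49, "3-0" = 0.33, "X" = 0.18),
  "1-2" = c("1-2" = 0.21, "2-2" = 0.37, "X" = 0.42),
  "2-1" = c("2-2" = 0.46, "3-1" = 0.28, "X" = 0.25),
  "3-0" = c("3-1" = 0.56, "X" = 0.44),
  "2-2" = c("2-2" = 0.25, "3-2" = 0.29, "X" = 0.46),
  "3-1" = c("3-2" = 0.47, "X" = 0.53),
  "3-2" = c("3-2" = 0.29, "X" = 0.71),
  "X"   = c("X" = 1.00)
)

P <- matrix(0, nrow = 13, ncol = 13, dimnames = list(states, states))
for (from in names(rows)) {
  P[from, names(rows[[from]])] <- rows[[from]]
}

# Assumption: rescale each row so it sums to exactly 1, since the
# rounded rows (e.g. 1-0 and 0-2) are slightly off.
P <- P / rowSums(P)

# Only build and plot the chain if the package is available.
if (requireNamespace("markovchain", quietly = TRUE)) {
  library(markovchain)
  dmc <- new("markovchain", states = states, byrow = TRUE,
             transitionMatrix = P, name = "Pitch counts")
  plot(dmc)  # draws the chain as a graph of states and transitions
}
```

Because of the rescaling, a few probabilities in this version differ slightly in the third decimal place from the ones printed in the table.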

Here is the graph for our pitch count Markov Chain.

To make sense of this plot …

- A batter starts in the lower left part of the graph (at 0-0) and moves through pitch counts with more balls and strikes.
- At any count, it is possible to move to the X state (end of the PA) which is represented by the circle in the middle of the display.
- For early pitch counts, there are three movements — add another strike, add another ball or in-play — and they have respective probabilities given by the numbers along the paths.
- For counts with two strikes, note that there are positive probabilities of remaining at the same count (a foul ball with two strikes leaves the count unchanged).
- It is interesting to see how the probability of adding a strike to the count depends on the current count. Likewise, the probability of adding a ball to the count changes depending on the current count.

There are likely improvements to this display, but it seems more effective than the table in displaying the transition probabilities.

The `markovchain` package simplifies some Markov Chain calculations. For example, to find the probabilities of being in different states after two pitches, one constructs an initial probability vector that says that one begins in a 0-0 count, and then multiplies this initial probability vector (saved in `initialState`) by the transition probability matrix (saved in `dmc`) twice.

    initialState <- c(1, rep(0, 12))
    round(initialState * dmc ^ 2, 2)
         0-0 0-1 1-0 0-2  1-1  2-0 1-2 2-1 3-0 2-2 3-1 3-2    X
    [1,]   0   0   0 0.2 0.39 0.13   0   0   0   0   0   0 0.27

We see that the most likely state after two pitches is a 1-1 count, and there is a 27% chance that the PA is already over after two pitches.
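These two-pitch numbers can be checked by hand by adding up the two-step paths out of 0-0. The sketch below uses only the rounded entries of the transition table; the variable names are made up for illustration.

```r
# 0-2 after two pitches: strike, then strike.
p_02 <- 0.50 * 0.41

# 1-1 after two pitches: strike then ball, or ball then strike.
p_11 <- 0.50 * 0.41 + 0.39 * 0.48

# 2-0 after two pitches: ball, then ball.
p_20 <- 0.39 * 0.34

# PA over within two pitches: it ends on the first pitch,
# or it ends on the second pitch from 0-1 or from 1-0.
p_X <- 0.11 + 0.50 * 0.18 + 0.39 * 0.17

c("0-2" = p_02, "1-1" = p_11, "2-0" = p_20, "X" = p_X)
```

The unrounded values are 0.205, 0.3922, 0.1326, and 0.2663, which agree with the package output to two decimals.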

After five pitches, we see below that there is an 81% chance that the PA is over, and a 9% chance that we are in a 3-2 count.

    round(initialState * dmc ^ 5, 2)
         0-0 0-1 1-0 0-2 1-1 2-0  1-2 2-1 3-0  2-2 3-1  3-2    X
    [1,]   0   0   0   0   0   0 0.02   0   0 0.07   0 0.09 0.81

I’ll look at this package more and may post more interesting illustrations in future posts.

How did you find the transition probabilities for the various counts in the large matrix?

Mychelle: Using the Retrosheet play-by-play data, I first created a string which contains the ball and strike sequence. Then I wrote a function which took a particular string like “bbssb” and output vectors of beginning and ending counts. The function is a bit clunky — it may be easier to create these vectors using pitchFX data.
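A minimal sketch of what such a function could look like; this is not the code used for the post. The function name `count_transitions` and the pitch coding are assumptions made for this example: "b" is a ball, "s" a called or swinging strike, "f" a foul, and "x" a ball put in play. Retrosheet's actual pitch codes are richer and would need to be mapped to these first.

```r
# Hypothetical helper: turn a simplified pitch string into the
# beginning and ending count of every pitch in the sequence.
# Assumed coding: "b" ball, "s" strike, "f" foul, "x" ball in play.
count_transitions <- function(pitches) {
  balls <- 0
  strikes <- 0
  from <- character(0)
  to <- character(0)
  for (p in strsplit(pitches, "")[[1]]) {
    start <- paste0(balls, "-", strikes)
    if (p == "b") {
      balls <- balls + 1
    } else if (p == "s") {
      strikes <- strikes + 1
    } else if (p == "f" && strikes < 2) {
      # A foul only adds a strike when there are fewer than two.
      strikes <- strikes + 1
    }
    # The PA ends on a ball in play, a walk, or a strikeout.
    over <- p == "x" || balls == 4 || strikes == 3
    from <- c(from, start)
    to <- c(to, if (over) "X" else paste0(balls, "-", strikes))
    if (over) break
  }
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

# The example string from the comment: 0-0, 1-0, 2-0, 2-1, 2-2, then 3-2.
count_transitions("bbssb")
```

Stacking these data frames over all plate appearances and applying `prop.table(table(from, to), 1)` would give row-wise proportions that estimate the transition probabilities.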

Thanks! That was extremely helpful.

I’m working on doing this for a school project, but I have very little coding experience. Would it be possible for me to test your code to compute the data for this project? I could tell you more about the project.

-Mychelle