There has been a lot of talk (at least on last night’s ESPN broadcast of the Mets/Dodgers game) about Clayton Kershaw’s remarkable ratio of strikeouts to walks. Following up my recent post on modeling pitch-count transitions by a Markov Chain, I thought I’d look more closely at Kershaw’s pitch-count transitions, and specifically the frequency of different pitch counts. This will explain, in part, why he has so many strikeouts and so few walks.

A useful representation of the movement between pitch counts is a Markov Chain. There are 13 possible states (the 12 possible pitch counts and the “end of PA” state), and one moves between the different states according to specified probabilities. We represent the probabilities by a transition matrix, where each row is the current state and each column is the next state. Here is Kershaw’s transition matrix using 2015 season data.

0-0 0-1 1-0 0-2 1-1 2-0 1-2 2-1 3-0 2-2 3-1 3-2 X 0-0 0 0.54 0.32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 0-1 0 0.00 0.00 0.48 0.39 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 1-0 0 0.00 0.00 0.00 0.57 0.28 0.00 0.00 0.00 0.00 0.00 0.00 0.15 0-2 0 0.00 0.00 0.17 0.00 0.00 0.45 0.00 0.00 0.00 0.00 0.00 0.38 1-1 0 0.00 0.00 0.00 0.00 0.00 0.50 0.31 0.00 0.00 0.00 0.00 0.19 2-0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.50 0.28 0.00 0.00 0.00 0.22 1-2 0 0.00 0.00 0.00 0.00 0.00 0.20 0.00 0.00 0.34 0.00 0.00 0.46 2-1 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.55 0.26 0.00 0.19 3-0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.68 0.00 0.32 2-2 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.21 0.55 3-1 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.42 0.58 3-2 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.38 0.62 X 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00

This tells us, for example, that Kershaw moves from a 0-0 count to a 0-1 count with probability 0.54, he moves from a 0-1 count to a 0-2 count with probability 0.48, and so on.
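To make the chain concrete, here is a minimal sketch in Python (not the R script mentioned at the end of this post) that stores the nonzero entries of the matrix and walks through one simulated plate appearance. The dictionary layout, the function name `simulate_pa`, and the random seed are illustrative choices, not part of the original analysis.

```python
import random

# Nonzero transition probabilities copied from the matrix above
# (rounded to two decimals, so a row may sum to 0.99 instead of 1).
P = {
    "0-0": {"0-1": 0.54, "1-0": 0.32, "X": 0.13},
    "0-1": {"0-2": 0.48, "1-1": 0.39, "X": 0.13},
    "1-0": {"1-1": 0.57, "2-0": 0.28, "X": 0.15},
    "0-2": {"0-2": 0.17, "1-2": 0.45, "X": 0.38},
    "1-1": {"1-2": 0.50, "2-1": 0.31, "X": 0.19},
    "2-0": {"2-1": 0.50, "3-0": 0.28, "X": 0.22},
    "1-2": {"1-2": 0.20, "2-2": 0.34, "X": 0.46},
    "2-1": {"2-2": 0.55, "3-1": 0.26, "X": 0.19},
    "3-0": {"3-1": 0.68, "X": 0.32},
    "2-2": {"2-2": 0.24, "3-2": 0.21, "X": 0.55},
    "3-1": {"3-2": 0.42, "X": 0.58},
    "3-2": {"3-2": 0.38, "X": 0.62},
    "X": {"X": 1.00},
}


def simulate_pa(P, rng):
    """Walk from 0-0 until the absorbing state X; return the visited states."""
    state, path = "0-0", ["0-0"]
    while state != "X":
        options = list(P[state])
        # random.choices rescales the weights, so rounding error in a row is harmless.
        state = rng.choices(options, weights=[P[state][s] for s in options])[0]
        path.append(state)
    return path


rng = random.Random(2015)  # arbitrary seed, only for reproducibility
row_sums = {s: round(sum(row.values()), 2) for s, row in P.items()}
path = simulate_pa(P, rng)
print(row_sums)
print(" -> ".join(path))
```

Every row should sum to roughly 1, and every simulated path starts at 0-0 and ends at X, which is what makes X an absorbing state.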

Once we have represented the pitch count movement this way, there are a number of convenient calculations that one can make. For example, suppose one is interested in the average length of a Kershaw plate appearance and, in particular, how long (on average) a batter stays in each specific count.

Let $Q$ denote the matrix of transition probabilities among the non-absorbing states (that is, all pitch counts excluding the end of PA state). Then one can compute the average lengths of stay by the calculation

$$N = (I - Q)^{-1}$$

where $I$ is the identity matrix and the exponent $-1$ denotes the matrix inverse. If we look at the first row of the matrix $N$, we’ll see the average number of times that a Kershaw batter is in each pitch count. (By the way, I’ve confirmed that the estimate of the average length of a PA using this Markov Chain model is close to the actual length of PA.)
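The average-stay calculation just described can be sketched in Python as follows, writing Q for the count-to-count block of the transition matrix and N for the inverse of I - Q. This is not the R script used for the post: the helper name `fundamental_matrix` and the Gauss-Jordan inversion are my own choices, and the inputs are the two-decimal values from the matrix above, so the results will differ slightly from figures computed on unrounded data.

```python
counts = ["0-0", "0-1", "1-0", "0-2", "1-1", "2-0",
          "1-2", "2-1", "3-0", "2-2", "3-1", "3-2"]

# Nonzero count-to-count probabilities from the matrix (the X column is dropped).
Q_sparse = {
    "0-0": {"0-1": 0.54, "1-0": 0.32},
    "0-1": {"0-2": 0.48, "1-1": 0.39},
    "1-0": {"1-1": 0.57, "2-0": 0.28},
    "0-2": {"0-2": 0.17, "1-2": 0.45},
    "1-1": {"1-2": 0.50, "2-1": 0.31},
    "2-0": {"2-1": 0.50, "3-0": 0.28},
    "1-2": {"1-2": 0.20, "2-2": 0.34},
    "2-1": {"2-2": 0.55, "3-1": 0.26},
    "3-0": {"3-1": 0.68},
    "2-2": {"2-2": 0.24, "3-2": 0.21},
    "3-1": {"3-2": 0.42},
    "3-2": {"3-2": 0.38},
}


def fundamental_matrix(Q):
    """Return N = (I - Q)^{-1} by Gauss-Jordan elimination with partial pivoting."""
    n = len(Q)
    # Build the augmented matrix [I - Q | I].
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
         + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in A]


Q = [[Q_sparse[a].get(b, 0.0) for b in counts] for a in counts]
N = fundamental_matrix(Q)
visits = dict(zip(counts, N[0]))  # first row: every PA starts at 0-0
pitches_per_pa = sum(N[0])        # total expected time spent in all counts

for c in counts:
    print(f"{c}  {visits[c]:.3f}")
print(f"expected pitches per PA: {pitches_per_pa:.2f}")
```

Summing the first row of N gives the expected number of pitches in a plate appearance, since each pitch is thrown in exactly one count.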

| Count | Expected |
|---|---|
| 0-1 | 0.54382022 |
| 1-0 | 0.32134831 |
| 0-2 | 0.31348315 |
| 1-1 | 0.39550562 |
| 2-0 | 0.08988764 |
| 1-2 | 0.42359551 |
| 2-1 | 0.16629213 |
| 3-0 | 0.02471910 |
| 2-2 | 0.31011236 |
| 3-1 | 0.05955056 |
| 3-2 | 0.14494382 |

On average, a Kershaw batter will see a “0-1” count .54 times, which is simply the probability of moving from a 0-0 count to a 0-1 count. It is interesting that the average number of 0-2 visits is .31 and the average number of visits to 3-0 is only .025.
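As a rough check on the 0-2 figure, it can be worked out by hand from the rounded transition matrix. The only route into 0-2 is 0-0 to 0-1 to 0-2, and once there the count stays at 0-2 with probability 0.17 (presumably two-strike foul balls), so the repeat visits form a geometric series:

$$E[\text{visits to 0-2}] = \frac{P(\text{0-0} \to \text{0-1})\, P(\text{0-1} \to \text{0-2})}{1 - P(\text{0-2} \to \text{0-2})} = \frac{0.54 \times 0.48}{1 - 0.17} \approx 0.312$$

This agrees with the tabulated 0.313 up to the rounding of the matrix entries.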

To understand whether these length-of-stay numbers are distinctive, I repeated the above calculations for all 2015 starters with at least 30 starts. The graph below plots the average number of stays in each possible count and shows Kershaw’s values with red dots.

Looking at this graph, it is pretty clear that Kershaw tends to stay in pitcher counts. Among these 2015 starters,

- He’s above-average in getting a strike on the first pitch.
- He’s remarkably good in achieving 0-2 and 1-2 counts.
- He’s below-average in visiting the so-called batter counts of 2-0, 3-0, 2-1, etc.

A pitcher with these pitch count tendencies will be very successful, which is reflected in his current 105 strikeouts and 5 walks.

**Late addition:** Daniel was interested in seeing the R code to perform these calculations. You first need to obtain the Retrosheet play-by-play data for the 2015 season. Then you can use the R script on my gist site.

Any chance you could share the code and dataset you used to get these numbers? I’d like to see them for all pitchers.