Introduction
Last year, I wrote a post describing an interesting presentation by Adie Wyner that explored the well-known Times Through the Order (TTTO) belief in baseball: pitchers tend to perform worse when facing batters for the second or third time during a game. In that post, I used Retrosheet play-by-play files to explore TTTO effects for individual pitchers. I found some interesting TTTO effects, but it was unclear whether they reflected underlying pitcher talents. My takeaway from Wyner’s presentation is that pitchers tend to perform worse as they face more batters, but there is no abrupt change in performance at the second or third time through the lineup.
The associated research paper “A Bayesian analysis of the time through the order penalty in baseball” by Brill, Deshpande and Wyner was recently posted on arXiv. Although this is an interesting paper, it seems written mainly for a statistical audience. So I thought it would be helpful to describe the multinomial regression model these authors use and apply it to explore pitcher fatigue for some of the great pitchers in history. Maybe my explanation will help readers approach this paper and appreciate its results.
The Data
Brill, Deshpande and Wyner provide the dataset for their work on their GitHub site. It is a collection of Retrosheet play-by-play files for the 1990 through 2020 seasons with some additional variables. For our work, one key variable is BATTER_SEQ_NUM, which records the sequence number of the batter the pitcher is facing in the game (1 for the first batter, 2 for the second, and so on). I also need a measure of hitter quality. Because of small sample sizes, the observed wOBA value for a given hitter may be unstable, so I use a multilevel model to adjust the raw wOBA measures and obtain better estimates of hitter quality.
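The post does not spell out the multilevel model used for hitter quality. To show the basic idea of partial pooling, here is a minimal Python sketch that uses a simpler stand-in: each hitter's raw wOBA is shrunk toward a league mean, and hitters with few plate appearances are shrunk more. The pseudo-count `k`, the league mean, and the hitter numbers are hypothetical, not values from the analysis.

```python
import numpy as np

def shrink_woba(raw_woba, pa, league_mean, k):
    """Partially pool raw wOBA values toward a league mean.

    Illustrative assumption (not the post's exact model): the adjusted value
    is a weighted average of the hitter's raw wOBA and the league mean, where
    the hitter's own data gets weight pa / (pa + k). A larger k (a
    hypothetical pseudo-count of plate appearances) means stronger shrinkage.
    """
    raw_woba = np.asarray(raw_woba, dtype=float)
    pa = np.asarray(pa, dtype=float)
    weight = pa / (pa + k)
    return weight * raw_woba + (1 - weight) * league_mean

# Hypothetical hitters: a part-timer with a hot start and an everyday regular.
adjusted = shrink_woba([0.450, 0.340], [50, 600], league_mean=0.320, k=200)
print(adjusted)
```

With these made-up numbers, the part-timer's .450 is pulled most of the way back toward .320, while the regular's .340 barely moves.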
The Model
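Let $Y$ denote one of the 7 possible outcomes of a plate appearance (out, single, double, triple, home run, unintentional walk, hbp), labeled $j = 1, \ldots, 7$ with $j = 1$ being "out". The variable $Y$ is a multinomial outcome described by the probabilities of these seven outcomes. The multinomial regression model says that the logarithm of the ratio of the probability of outcome $j$ to the probability of the first outcome "out" is a linear function of several covariates on the right-hand side of the equation:

$$\log\left(\frac{P(Y = j)}{P(Y = 1)}\right) = \beta_{0j} + \beta_{1j}\, t + \beta_{2j}\, q + \cdots, \qquad j = 2, \ldots, 7,$$

where $t$ is the batter sequence number, $q$ is the quality of the hitter, and the dots stand for any other covariates included in the model.
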
What are possible covariates?
- the quality of the pitcher
- the quality of the hitter
- the handedness of the pitcher and hitter (same side or different side)
- is the game in the pitcher’s home park?
- the number of the batter faced, which is $t$ in the equation
- indicators that the pitcher has faced the hitter twice through the lineup (2TTO) or three times through the lineup (3TTO)
In this exercise, we will fit this model for individual pitchers and focus on two important covariates: the number of the batter faced and the quality of the hitter. For each of the multinomial outcomes $j = 2, \ldots, 7$, we will be estimating the intercept $\beta_{0j}$, the slope $\beta_{1j}$ on the batter number and the slope $\beta_{2j}$ on hitter quality. Since there are six outcomes besides "out", we are estimating 6 × 3 = 18 model parameters.
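To make the 18 parameters concrete, here is a small Python sketch. The original analysis was done in R. The sketch stores the coefficients as a 6 × 3 matrix, with one row per non-out outcome and columns for the intercept, the batter-number slope and the hitter-quality slope. It then converts them into the seven outcome probabilities, treating "out" as the reference category. The coefficient values are made up purely for illustration and are not fitted estimates.

```python
import numpy as np

# Outcome order follows the post; "out" is the reference (baseline) category.
OUTCOMES = ["out", "1B", "2B", "3B", "HR", "BB", "HBP"]

def pa_probabilities(beta, batter_num, hitter_quality):
    """Seven plate-appearance probabilities from a baseline-category logit model.

    beta: 6 x 3 array, one row per non-out outcome (same order as OUTCOMES[1:]),
    columns = (intercept, slope on batter number, slope on hitter quality).
    """
    beta = np.asarray(beta, dtype=float)
    x = np.array([1.0, batter_num, hitter_quality])
    eta = beta @ x                          # six log-odds relative to "out"
    logits = np.concatenate(([0.0], eta))   # "out" vs itself has log-odds 0
    w = np.exp(logits - logits.max())       # subtract max for numerical stability
    return w / w.sum()

# Made-up coefficients, for illustration only (not fitted values).
beta = np.array([
    [-2.0, 0.004, 1.5],   # 1B
    [-3.4, 0.006, 2.0],   # 2B
    [-5.5, 0.000, 1.0],   # 3B
    [-4.0, 0.010, 3.0],   # HR
    [-2.6, 0.002, 1.0],   # BB (unintentional walk)
    [-4.6, 0.000, 0.5],   # HBP
])
p = pa_probabilities(beta, batter_num=10, hitter_quality=0.300)
for name, prob in zip(OUTCOMES, p):
    print(f"{name}: {prob:.3f}")
```

Because every row of `beta` is tied to the "out" baseline, the log of the ratio `p[j] / p[0]` reproduces that row's linear predictor exactly. This is the property the model equation states.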
Measure of Performance
When we fit this model for a specific pitcher, the parameter estimates can be re-expressed as the multinomial probabilities of the seven plate-appearance outcomes. But we are interested in the overall performance of the pitcher. A convenient summary is the expected wOBA, a linear combination of the plate-appearance probabilities. Using the weights from the FanGraphs description of wOBA, we have
Expected wOBA = 0.69 P(W) + 0.72 P(HBP) + 0.89 P(1B) + 1.27 P(2B) + 1.62 P(3B) + 2.10 P(HR)
and so we can use the model fit to estimate a pitcher’s expected wOBA for specific values of the batter quality and batter number.
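The expected wOBA formula is just a weighted sum, so it can be sketched in a few lines of Python. The weights are the ones quoted above. The outcome probabilities are a hypothetical example that sums to 1, not output from a fitted model.

```python
# wOBA weights as quoted in the post; an out contributes 0.
WOBA_WEIGHTS = {
    "out": 0.00,
    "BB": 0.69,    # unintentional walk, P(W) in the formula
    "HBP": 0.72,
    "1B": 0.89,
    "2B": 1.27,
    "3B": 1.62,
    "HR": 2.10,
}

def expected_woba(probs):
    """Expected wOBA: each outcome probability times its weight, summed."""
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(WOBA_WEIGHTS[outcome] * p for outcome, p in probs.items())

# Hypothetical outcome probabilities for one plate appearance.
probs = {"out": 0.68, "1B": 0.15, "2B": 0.045, "3B": 0.005,
         "HR": 0.03, "BB": 0.08, "HBP": 0.01}
print(f"{expected_woba(probs):.4f}")
```

Feeding in the probabilities from the model at a chosen batter number and hitter quality gives the pitcher's expected wOBA for that situation.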
Fitting the Model to Individual Pitchers
For the period 1990 through 2020, I identified 126 pitchers who had faced at least 8000 batters. For each pitcher, I used the function multinom() in the nnet package to fit the multinomial regression model with two covariates: the batter number and the quality of the hitter. I obtain parameter estimates and the associated variance-covariance matrix of the estimates, from which I can obtain a simulated sample of draws from the posterior distribution of the parameters.
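The post does not show how the posterior draws are produced. One common approach, assumed here, is a normal approximation: the parameter vector is treated as multivariate normal, with mean equal to the multinom() estimates and covariance equal to their estimated variance-covariance matrix. In R these can be extracted with `coef()` and `vcov()`. The Python sketch below uses a tiny made-up two-parameter example in place of the real 18-parameter fit.

```python
import numpy as np

def simulate_posterior(estimates, cov, n_draws, seed=None):
    """Draw parameter vectors from a normal approximation to the posterior.

    Assumption: posterior ~ MVN(estimates, cov), i.e. centred at the
    maximum-likelihood estimates with their estimated covariance matrix.
    Returns an array of shape (n_draws, number of parameters).
    """
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.asarray(estimates, dtype=float),
                                   np.asarray(cov, dtype=float),
                                   size=n_draws)

# Hypothetical estimates and covariance matrix for two parameters
# (an intercept and a batter-number slope for one outcome).
est = [-1.8, 0.005]
cov = [[0.0025, -0.00005],
       [-0.00005, 0.000004]]
draws = simulate_posterior(est, cov, n_draws=5000, seed=1)
print(draws.shape)          # (5000, 2)
print(draws.mean(axis=0))   # should be close to the estimates
```

With the real fit, each draw would be an 18-element vector. Each vector can be pushed through the probability and expected-wOBA calculations to get one simulated expected wOBA.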
To show the fatigue effect for a specific pitcher, I fix a value of the batter quality (I use wOBA = 0.3), and obtain simulated draws of the expected wOBA for each of the batter numbers from 1 to 30. One can summarize the fatigue effect by fitting a line to the scatterplot of the simulated draws of expected wOBA and the batter number — if the slope is positive, there is a fatigue effect. If the slope is negative, then the pitcher is actually getting better as the batter number increases.
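The slope summary described above can be sketched in Python as follows. Each simulated expected-wOBA draw is paired with its batter number, and a least-squares line is fit with `numpy.polyfit`. The draws here are synthetic (a made-up upward trend plus noise), standing in for the model-based draws, so the function name and data are illustrative only.

```python
import numpy as np

def fatigue_slope(batter_nums, woba_draws):
    """Slope of a least-squares line through (batter number, expected wOBA) points.

    woba_draws has shape (n_draws, len(batter_nums)); every value in column i
    is paired with batter_nums[i]. A positive slope suggests fatigue.
    """
    woba_draws = np.asarray(woba_draws, dtype=float)
    x = np.tile(np.asarray(batter_nums, dtype=float), woba_draws.shape[0])
    y = woba_draws.ravel()          # row-major, so it lines up with np.tile
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

rng = np.random.default_rng(0)
batters = np.arange(1, 31)
# Synthetic draws: a made-up trend of +0.001 wOBA per batter, plus noise.
draws = 0.280 + 0.001 * batters + rng.normal(0, 0.01, size=(500, 30))
slope = fatigue_slope(batters, draws)
print(slope)
```

Suppose one slope per pitcher is collected in an array `slopes`. Then `np.mean(slopes > 0)` gives the share of pitchers whose fitted line rises.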
Here is a histogram of the slopes (the summary fatigue effects) for the 126 pitchers. Seventy-four percent of the slope estimates are positive, indicating that pitchers generally fatigue as they face more hitters during a game.

Expected wOBA Graphs for Individual Pitchers
Among the 126 pitchers, I chose six that I’m interested in: Curt Schilling, Mike Mussina, Pedro Martinez, Randy Johnson, Roger Clemens and Roy Halladay. For each pitcher, the scatterplot below displays simulated draws of the expected wOBA for each batter sequence number from 1 to 30, with a linear fit overlaid to show the basic pattern. Five of the pitchers (Schilling, Mussina, Johnson, Clemens and Halladay) exhibit fatigue, since their expected wOBA tends to increase for larger batter sequence values. There is an interesting outlier: Pedro Martinez appears to get better for larger batter sequence values. It would be worth exploring Martinez’s pitching performance over innings in more detail.

What’s In the Paper?
Here I am illustrating the basic multinomial regression model to give the reader a general sense of how it can be used to study pitcher fatigue. In their paper, Brill, Deshpande and Wyner actually fit a multilevel version of the model that includes pitcher and batter effects as well as terms allowing for 2TTO and 3TTO effects. Here are some conclusions from their work.
- As we saw in our examples, Brill, Deshpande and Wyner show that expected wOBA increases steadily over a game and that there are no special discontinuities at the 2nd and 3rd times through the order.
- The authors find little evidence of a strong batter learning effect.
- They recommend that managers base the decision to remove a pitcher on the pitcher’s quality and on the continuous decline in performance throughout the game.
Here’s an illustration of this decision-making process in the current 2022 MLB playoffs. Phillies pitcher Zack Wheeler pitched brilliantly for 7 innings in the first game of the Phillies-Padres series, allowing only two baserunners. Although his performance was stellar, the team noticed a drop in the speed of his four-seam fastball in the later innings. As a result, the manager replaced Wheeler with a reliever at the start of the 8th inning. In this situation, the Phillies appeared to base their decision on their general knowledge of Wheeler’s pitching and on his performance during this particular game.
R Code?
Since one of you asked, here is the set of functions I wrote for this task on my GitHub Gist site: https://gist.github.com/bayesball/de552bc32575bfbf50362081ae4fda99
Typo in def of woba
John, thanks for catching that — I think I corrected that typo.
YW, great post!
Interesting read. Curious why you chose batter_seq_num rather than something like pitch count to better capture fatigue? Your model already includes indicator variables for number of times the pitcher has seen the same batter.
Good point. I wrote the post to provide an introduction to that particular paper and they were using batter sequence number as a covariate. Perhaps cumulative pitch count would be a good alternative covariate.
Do you have a gist on Github for this exercise?
James, since you asked, here are the functions I wrote: https://gist.github.com/bayesball/de552bc32575bfbf50362081ae4fda99
Since there is a lot here, I decided not to mention the code on the post.
Jim