Hi everyone, my name is Morris Greenberg, and this is my first post on this blog. I am currently a master’s student in statistical science at Duke. I have also previously interned and consulted for the Washington Nationals, and have co-taught the baseball analytics class at Tufts twice with Andy Andres.
Today, I want to explore what an optimal proposal for minimizing the number of extra innings would look like, first by examining extra innings in different run-scoring environments when the two teams are equally talented, and then by incorporating different talent levels as well.
Last week, Jim nicely showed with previous empirical data that we should expect the new extra innings rule to speed up games. However, there is a potential sampling issue with using the distributions from previous data: pitchers who allow a man on second at the start of the inning are likelier to be worse than a random pitcher. Using 2019 Retrosheet data, we can see that this is the case (I only consider pitchers who faced at least 50 batters here):
| | Mean Seasonal OBP | Mean Seasonal SLG | Mean Seasonal OPS |
|---|---|---|---|
| Pitchers in man-on-second, 0-out situations | .326 | .443 | .769 |
| Pitchers in no-men-on, 0-out situations | .320 | .433 | .753 |
We may also expect non-random sampling from the batters' side if most teams put their best hitters at the top of the batting order. Since analyzing the new rule with previous data has these issues, we may instead be interested in theoretically analyzing strategies that maximize how often we would observe a difference in the number of runs the two teams score.
To simplify things, let’s first consider extra innings between two equally talented teams. As a reminder, here is a run expectancy matrix for 2019 (my numbers are similar to, though slightly different from, Jim’s in his last article, likely because I drop home-team extra-inning rows, since those are situations where the game can end before the inning finishes):
By using a Poisson distribution, we can analyze how often the two teams will score different amounts in different run-scoring environments. I wrote two functions that allow us to approximate P(Runs_team1 ≠ Runs_team2) for any inning. The first takes advantage of the fact that a Poisson distribution is discrete, so we can directly estimate the complement P(Runs_team1 = Runs_team2) by summing the probability that two independent Poisson variables are both 0, both 1, and so on up to a very improbable count (in this case 1000), and then subtract this estimate from 1 to estimate P(Runs_team1 ≠ Runs_team2). The second is a Monte Carlo simulation using two independent Poisson variables, where we keep track of how often the draws differ. I show the functions and their results for 0.54 (the starting run environment under the old extra-inning rule) and 1.18 (the starting run environment under the new rule):
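The original code isn't shown here, but the two approaches might look something like the following sketch (function names and defaults are my own, not necessarily the author's):

```python
import math
import numpy as np

def p_differ_pmf(lam, max_runs=1000):
    """1 - P(equal): sum the squared Poisson pmf over counts 0..max_runs."""
    pmf = math.exp(-lam)          # P(X = 0)
    p_equal = pmf ** 2
    for k in range(1, max_runs + 1):
        pmf *= lam / k            # P(X = k) from P(X = k-1), avoids factorials
        p_equal += pmf ** 2
    return 1 - p_equal

def p_differ_sim(lam, n_sims=500_000, seed=42):
    """Monte Carlo: draw two independent Poisson samples and compare."""
    rng = np.random.default_rng(seed)
    return float(np.mean(rng.poisson(lam, n_sims) != rng.poisson(lam, n_sims)))

print(round(p_differ_pmf(0.54), 3), round(p_differ_pmf(1.18), 3))  # → 0.554 0.721
```

The pmf version is built iteratively (each term is the previous one times λ/k) so the sum stays numerically stable even for large counts.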
Thankfully, both methods produce similar results. We see that under the new extra inning rule, we would expect the likelihood of an extra inning to end the game to increase from 55% to 72%.
Since it is faster and more accurate, I use the function that uses the probability mass function from here on out. Below I plot the expected number of runs scored by each team (which is the same as λ, the mean and variance of a Poisson distribution) against the likelihood the game would end in that inning:
We can see that the more run-friendly the environment, the likelier the game is to end in a given inning if the two teams are equally talented. While this is encouraging, since the new extra innings rule creates a much run-friendlier environment, in practice we know that two teams are rarely equally matched, so we should explore further.
Different Talent Level Model
We now turn to different talent levels. Last year, teams ranged from averaging 0.4 runs per inning (Tigers) to 0.66 runs per inning (Yankees), which means they are, respectively, about 26.5% and 22.5% different from the mean of 0.54 runs per inning, and about 40% different from each other.
I created a new function that is similar to pct_difference(), but takes in 2 talent levels instead of 1. Below, I plot a similar graph to before, but have multiple lines indicating various percent differences in talent between the two teams. The “-40%” talent difference is our model where one opponent is 40% worse than the run environment (while the other team is equal to the run environment), and the “40%” would be the equivalent for a team that is 40% better than the run environment:
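Again, the actual code isn't shown, but a two-talent-level function along these lines might look like this (the name and parameterization are my guesses, modeled on the `pct_difference()` mentioned above):

```python
import math

def pct_difference_two(lam, pct_diff, max_runs=100):
    """P(scores differ) when one team scores at the run environment lam and
    the other is pct_diff better (or worse, if negative) than that environment."""
    lam_other = lam * (1 + pct_diff)
    # Build both pmfs iteratively to avoid computing factorials.
    p1, p2 = math.exp(-lam), math.exp(-lam_other)
    p_equal = p1 * p2
    for k in range(1, max_runs + 1):
        p1 *= lam / k
        p2 *= lam_other / k
        p_equal += p1 * p2
    return 1 - p_equal

# One team 40% worse than the 1.18 new-rule environment:
print(round(pct_difference_two(1.18, -0.40), 3))  # → 0.693
```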
We can see that the shape remains similar across all of these scenarios (the higher the run environment, the likelier the game ends), so even in this extended model, the new extra inning proposal should help speed up games.
However, it is interesting that some of the lines intersect: in some cases the game becomes likelier to end when the run environment gets worse, because the separation in talent becomes more pronounced.
We wanted to analyze what the optimal proposal would be for allowing the fewest number of extra innings, and found that, for the most part, the higher the run environment the better (so we expect bases loaded with no outs to end the game faster than no men on with two outs). This may lead a clever reader to reason that an even more effective proposal could be to increase the run environment by giving each team four or more outs in extra innings! An obvious flaw with this approach is that each half-inning would take longer to complete. There is a subtler flaw, though, that we briefly saw in the talent level plot.
The probability of differing outcomes is positively correlated with the overall variance of a half-inning. Additionally, the overall variance is influenced by both sampling variability and talent variability.
In our simplified model, we allowed no talent variability. Since a Poisson distribution's variance equals its mean, a higher run environment corresponds to higher sampling variance and, in turn, a greater probability of differing outcomes. (Note that this is under the assumption that the Poisson distribution is appropriate across all run environments. If there is overdispersion at specific run environment levels, then this analysis would be flawed.)
In our second model, we added team talent variability. Now, the overall variance is a sum of the sampling and talent variances. Consequently, when the percentage difference in team talent level gets large enough in high run-scoring environments, the talent variability will start to outshine the sampling variability. Luckily, from the perspective of finding a good extra inning protocol, even the highest run-scoring environments in baseball still do not let talent variability outshine sampling variability. To show this, I provide a final plot below for up to 30 runs per half-inning. When the run environment is large enough, the more lopsided the matchup, the higher the probability that the outcomes differ:
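As a quick numerical check of this crossover, the same pmf approach can be applied at a low and an extreme run environment (a sketch with my own names, not the original code):

```python
import math

def p_differ(lam1, lam2, max_runs=200):
    """P(two independent Poisson-distributed half-inning scores differ)."""
    p1, p2 = math.exp(-lam1), math.exp(-lam2)
    p_equal = p1 * p2
    for k in range(1, max_runs + 1):
        p1 *= lam1 / k
        p2 *= lam2 / k
        p_equal += p1 * p2
    return 1 - p_equal

# At a low run environment (0.54), a team 40% worse than its opponent makes
# differing scores slightly *less* likely than an even matchup ...
print(p_differ(0.54, 0.54) > p_differ(0.54, 0.54 * 0.6))   # → True
# ... but at an extreme run environment (30), the lopsided matchup makes
# differing scores *more* likely: talent variability dominates.
print(p_differ(30, 30) < p_differ(30, 30 * 0.6))           # → True
```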
An implication of this is that the new extra inning rule may slightly help the worse team relative to the old rule, since sampling variability is increasing more than talent variability at 1.18 in the graph. Pairing this with a shortened 60-game season, we may see a lot more randomness in the standings this year!