Yesterday Ben Lindbergh asked me an interesting question about baseball playoffs, specifically the best-of-five playoff series. Since the best baseball teams make the playoffs, one would think that most of these series would last 4 or 5 games. But Ben shared the following data about the lengths of best-of-five series in baseball history:
Number of Games: Frequency
3 games: 48
4 games: 42
5 games: 42
We see from this historical data that the most common outcome is 3 games. Maybe these best-of-five playoffs aren’t as competitive as we think.
Let p denote the probability that the better team wins a single game and let’s assume that the outcomes of individual games are independent. What can we learn about the probability p on the basis of this data?
A Bayesian Analysis
This is a good example for illustrating Bayesian thinking. We start with a prior that reflects my initial beliefs about baseball playoff competition. Next we compute the likelihood, which is the chance of observing the historical lengths of best-of-five series as a function of the unknown probability p. Then we compute the posterior, which combines the information in the prior with the information in the data.
Personally, I believe that best-of-five playoff series in Major League Baseball are generally between teams of similar ability. So I think that p, the probability the stronger team wins a single game, is close to 0.5, and that p is unlikely to be larger than 0.6. Based on these beliefs, I assume that p has a half-normal density, where the underlying normal has mean 0.5 and standard deviation 0.05. (By the way, p can’t be smaller than 0.5 since it represents the win probability of the better team.)
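As a quick check on this prior, here is a small Python sketch (not the R code from my gist) that computes how much probability the half-normal places above 0.6. It assumes the two parameters are a mean of 0.5 and a standard deviation of 0.05; the name prior_tail is made up for this illustration.

```python
from statistics import NormalDist

# Assumption: the half-normal prior is a Normal(0.5, 0.05) restricted to p >= 0.5.
BASE = NormalDist(mu=0.5, sigma=0.05)


def prior_tail(cutoff):
    """Prior probability that p exceeds `cutoff` (valid for cutoff >= 0.5).

    Restricting the normal to p >= 0.5 keeps exactly half of its mass,
    so every upper-tail probability is doubled.
    """
    return 2.0 * (1.0 - BASE.cdf(cutoff))


print(round(prior_tail(0.6), 3))
```

Under that reading, 0.6 sits two standard deviations above 0.5, so the prior gives only about a 5% chance to p being larger than 0.6, which matches "unlikely".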
Next, I observe some data. Ben told me that the numbers of best-of-five series lasting 3, 4, and 5 games in MLB history were 48, 42, and 42, respectively. The likelihood is the chance of observing these results if the probability the stronger team wins a single game is p. This calculation is a little tedious (I’ll spare you the details), but the likelihood is a fancy function of the single-game win probability p. Here is a graph of the likelihood with this data.
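For readers who do want the details, the likelihood can be sketched in a few lines of Python (again, this is not the R code from my gist). With independent games and q = 1 - p, a series ends in 3 games with probability p^3 + q^3, in 4 games with probability 3(p^3 q + q^3 p), and in 5 games with probability 6 p^2 q^2. The function names and the grid of p values are choices made for this sketch.

```python
import math

# Best-of-five series lasting 3, 4, and 5 games (Ben's data).
COUNTS = {3: 48, 4: 42, 5: 42}


def length_probs(p):
    """Chance a best-of-five series lasts 3, 4, or 5 games when the stronger
    team wins each game independently with probability p."""
    q = 1.0 - p
    return {
        3: p**3 + q**3,                # a sweep by either team
        4: 3 * (p**3 * q + q**3 * p),  # the winner loses one of the first three
        5: 6 * p**2 * q**2,            # tied 2-2 after four games
    }


def log_likelihood(p, counts=COUNTS):
    """Multinomial log-likelihood of the observed series lengths (up to a constant)."""
    probs = length_probs(p)
    return sum(n * math.log(probs[k]) for k, n in counts.items())


# Evaluate on a grid of p values from 0.500 to 0.999 and find the peak.
grid = [0.5 + 0.001 * i for i in range(500)]
p_hat = max(grid, key=log_likelihood)
print(round(p_hat, 3))
```

At p = 0.5 the three probabilities are 0.25, 0.375, and 0.375, so evenly matched teams would sweep only a quarter of the time. The observed sweep rate is 48/132, or about 36%, which is why the likelihood peaks well above 0.5.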
Interestingly, the data tells me that p is pretty high, in the 0.65 – 0.70 range. Since p is the probability the stronger team wins a single game, this seems to say that most best-of-five series aren’t as competitive as we would like to think.
The prior reflects my beliefs about baseball competition before I observed any data. The posterior, my beliefs after observing data, combines the information in the prior and the likelihood. It is easy to compute the posterior: one just multiplies the prior and likelihood curves (and rescales so that the total probability is 1) to get the following curve.
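The multiplication can be sketched on a grid of p values in Python (not my original R code). The sketch assumes the half-normal prior has mean 0.5 and standard deviation 0.05, and it works on the log scale to avoid very small numbers; the grid spacing and variable names are choices made for this illustration.

```python
import math

COUNTS = {3: 48, 4: 42, 5: 42}
PRIOR_MEAN, PRIOR_SD = 0.5, 0.05  # assumed parameters of the half-normal prior


def log_likelihood(p):
    """Log-likelihood of the series lengths, assuming independent games."""
    q = 1.0 - p
    probs = {3: p**3 + q**3, 4: 3 * (p**3 * q + q**3 * p), 5: 6 * p**2 * q**2}
    return sum(n * math.log(probs[k]) for k, n in COUNTS.items())


def log_prior(p):
    """Log of the half-normal density on p >= 0.5, up to an additive constant."""
    z = (p - PRIOR_MEAN) / PRIOR_SD
    return -0.5 * z * z


# Posterior is proportional to prior times likelihood: add the logs,
# exponentiate, and rescale so the grid weights sum to 1.
grid = [0.5 + 0.001 * i for i in range(500)]
log_post = [log_prior(p) + log_likelihood(p) for p in grid]
peak = max(log_post)
weights = [math.exp(v - peak) for v in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

post_mode = grid[posterior.index(max(posterior))]
post_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(post_mode, 3), round(post_mean, 3))
```

Because the prior pulls toward 0.5 and the likelihood pulls toward the high 0.60s, the resulting curve sits between the two.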
What Have We Learned?
Looking at the posterior curve, I see that p is most likely in the interval (0.5, 0.6). A calculation shows that the posterior probability that p lies in the interval (0.5, 0.628) is 90%. Note that the posterior is a compromise between my initial beliefs as reflected in the prior and the information contained in the data.
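An interval probability like this can be approximated by summing the posterior over a fine grid. The Python sketch below (not my original R code) uses a midpoint rule and assumes a half-normal prior with mean 0.5 and standard deviation 0.05; prob_below is a name introduced only for this example.

```python
import math

COUNTS = {3: 48, 4: 42, 5: 42}


def log_post(p, sd=0.05):
    """Log of prior times likelihood, up to a constant (assumed prior sd of 0.05)."""
    q = 1.0 - p
    probs = {3: p**3 + q**3, 4: 3 * (p**3 * q + q**3 * p), 5: 6 * p**2 * q**2}
    log_lik = sum(n * math.log(probs[k]) for k, n in COUNTS.items())
    return log_lik - 0.5 * ((p - 0.5) / sd) ** 2


# Midpoints of N equal-width cells covering the interval (0.5, 1).
N = 5000
WIDTH = 0.5 / N
mids = [0.5 + (i + 0.5) * WIDTH for i in range(N)]
logs = [log_post(p) for p in mids]
peak = max(logs)
dens = [math.exp(v - peak) for v in logs]  # unnormalized posterior heights
total = sum(dens)


def prob_below(cutoff):
    """Approximate posterior probability that p is less than `cutoff`."""
    return sum(d for p, d in zip(mids, dens) if p < cutoff) / total


print(round(prob_below(0.6), 2), round(prob_below(0.628), 2))
```

The second number printed is the posterior probability of the interval (0.5, 0.628) under these assumptions.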
What About Momentum?
We love to talk about momentum in sports. Teams get hot and cold, and if a team is playing well and wins a game, many fans think that the team is likely to win the next game too. My model assumed (for convenience) that the outcomes of the games were independent: the outcome of a game does not depend on the result of the previous game, so there is no true momentum for a team. But perhaps there is momentum, and the high number of best-of-five series that last three games is a result of it. It would be an interesting exercise to compare my model with an alternative model in which there is real momentum and the game outcomes are not independent.
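To give a feel for what such an alternative might look like, here is one purely hypothetical momentum model sketched in Python; it is not something I fit to the data. It assumes two evenly matched teams, a coin-flip first game, and that the winner of the previous game wins the next one with probability m. The function name and the parameter m are inventions of this sketch.

```python
def momentum_length_dist(m, first=0.5):
    """Series-length distribution for a best-of-five under a hypothetical
    momentum model: team A wins game 1 with probability `first`, and after
    that the previous game's winner wins the next game with probability m."""
    dist = {3: 0.0, 4: 0.0, 5: 0.0}

    def play(wins_a, wins_b, last, prob):
        # The series ends as soon as either team has three wins.
        if wins_a == 3 or wins_b == 3:
            dist[wins_a + wins_b] += prob
            return
        if last is None:
            p_a = first       # no previous game yet
        elif last == "A":
            p_a = m           # A carries the momentum
        else:
            p_a = 1.0 - m     # B carries the momentum
        play(wins_a + 1, wins_b, "A", prob * p_a)
        play(wins_a, wins_b + 1, "B", prob * (1.0 - p_a))

    play(0, 0, None, 1.0)
    return dist


for m in (0.5, 0.55, 0.6):
    print(m, momentum_length_dist(m))
```

With m = 0.5 this reduces to independent games between equal teams, where a quarter of the series are sweeps. In this toy model the chance of a sweep is m squared, so a fairly modest amount of momentum would also produce many three-game series. A real comparison would need to fit both models to the data.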
All of the R code for this work can be found on my GitHub Gist site. My colleague Maria will be happy that I didn’t use any tidyverse code in this exercise.