# How Competitive are the 5-Game Playoffs?

#### Introduction

Ben Lindbergh yesterday asked me an interesting question about baseball playoffs, specifically the playoffs that are best-of-five games. Since the best baseball teams make the playoffs, one would think that most of the playoffs would last 4 or 5 games. But Ben shared the following data about the lengths of five-game playoffs in baseball history:

| Number of Games | Frequency |
|---|---|
| 3 games | 48 |
| 4 games | 42 |
| 5 games | 42 |

We see from this historical data that the most common outcome is 3 games. Maybe these best-of-five playoffs aren’t as competitive as we think.

Let p denote the probability that the better team wins a single game and let’s assume that the outcomes of individual games are independent. What can we learn about the probability p on the basis of this data?
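To see what this model implies, here is a minimal Python sketch of the chance that a best-of-five series lasts 3, 4, or 5 games for a given p. It relies only on the independence assumption stated above; the function name `series_length_probs` is my own label for illustration.

```python
def series_length_probs(p):
    """Return (P(3 games), P(4 games), P(5 games)) for a best-of-five series.

    p is the chance the better team wins any single game; games are
    assumed independent with the same p throughout the series.
    """
    q = 1.0 - p
    # A sweep: either team wins three straight games.
    p3 = p**3 + q**3
    # Four games: the series winner takes exactly 2 of the first 3, then game 4.
    p4 = 3 * p**3 * q + 3 * q**3 * p
    # Five games: the first four games split 2-2, so game 5 decides it.
    p5 = 6 * p**2 * q**2
    return p3, p4, p5


for p in (0.5, 0.6, 0.7):
    print(p, [round(x, 3) for x in series_length_probs(p)])
```

With evenly matched teams (p = 0.5) the three probabilities are 0.25, 0.375, and 0.375, while the observed shares are roughly 0.36, 0.32, and 0.32. There are more sweeps in the data than evenly matched teams would produce.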

#### A Bayesian Analysis

This is a good example where I can illustrate Bayesian thinking. We start with a prior that illustrates my initial beliefs about baseball playoff competition. We next compute a likelihood that is the chance of observing the historical lengths of five-game playoffs as a function of the unknown probability p. Then we compute the posterior which combines the information in the prior with the information in the data.

#### My Prior

Personally I believe that five-game baseball playoffs in Major League Baseball are generally between teams of similar ability. So I think that p, the probability the stronger team wins a single game, is close to 0.5. Also I think that p is unlikely to be larger than 0.6. Based on these assumptions, I assume that p has a half-normal density, where the underlying normal has mean 0.5 and standard deviation 0.05. (By the way, p can’t be smaller than 0.5 since this represents the ability of the better team.)
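As a rough sketch, this prior can be written down in a few lines of Python. I am reading the two parameters as the mean (0.5) and standard deviation (0.05) of the underlying normal, and I ignore the upper cutoff at p = 1, which sits ten standard deviations away. The names `half_normal_prior` and `prior_prob_above` are just illustrative.

```python
import math

PRIOR_MEAN = 0.5   # center of the underlying normal
PRIOR_SD = 0.05    # assumption: the second parameter is a standard deviation


def half_normal_prior(p):
    """Half-normal density for p: a normal folded to keep only p >= 0.5."""
    if p < PRIOR_MEAN or p > 1.0:
        return 0.0
    z = (p - PRIOR_MEAN) / PRIOR_SD
    # Twice the normal density, because only the upper half is kept.
    return 2.0 * math.exp(-0.5 * z * z) / (PRIOR_SD * math.sqrt(2.0 * math.pi))


def prior_prob_above(x):
    """Prior probability that p exceeds x, for x >= 0.5."""
    z = (x - PRIOR_MEAN) / PRIOR_SD
    return 1.0 - math.erf(z / math.sqrt(2.0))


print(round(prior_prob_above(0.6), 3))
```

Under this reading the prior puts only about 4.6% of its probability above 0.6, which matches my belief that p is unlikely to be larger than 0.6.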

#### The Likelihood

Next, I observe some data. Ben told me the numbers of best-of-five playoffs that lasted 3 games, 4 games, and 5 games in MLB history were respectively 48, 42, and 42. The likelihood is the chance of observing these results if the probability the stronger team wins a single game is p. This calculation is a little tedious (I’ll spare you the details), but the likelihood is a fancy function of the single win probability p. Here is a graph of the likelihood with this data.

Interestingly, the data tells me that p is pretty high, in the 0.65 to 0.70 range. Since p is the probability the stronger team wins a single game, this seems to say that most five-game playoffs aren’t as competitive as we would like to think.
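For readers who do want the details, here is one way the likelihood could be computed in Python. This is a sketch under the independence assumption, treating the three counts as multinomial with series-length probabilities that depend on p. The grid search and the names `log_likelihood` and `p_hat` are my own choices for illustration, not necessarily how the graph was produced.

```python
import math

# Observed numbers of best-of-five series lasting 3, 4, and 5 games.
COUNTS = {3: 48, 4: 42, 5: 42}


def series_length_probs(p):
    """Return (P(3 games), P(4 games), P(5 games)) given single-game probability p."""
    q = 1.0 - p
    return (p**3 + q**3,
            3 * p**3 * q + 3 * q**3 * p,
            6 * p**2 * q**2)


def log_likelihood(p, counts=COUNTS):
    """Multinomial log-likelihood of the counts, dropping the constant term."""
    probs = dict(zip((3, 4, 5), series_length_probs(p)))
    return sum(n * math.log(probs[games]) for games, n in counts.items())


# Evaluate on a grid from 0.500 to 0.999 (p = 1 is excluded to avoid log(0)).
grid = [0.5 + 0.001 * i for i in range(500)]
p_hat = max(grid, key=log_likelihood)
print(f"likelihood peaks near p = {p_hat:.3f}")
```

The peak of this curve falls inside the 0.65 to 0.70 range mentioned above, although the likelihood is fairly flat, so the data alone do not pin p down very precisely.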

#### The Posterior

The prior reflects my beliefs about baseball competition before I observed any data. The posterior, my beliefs after observing data, combines the information in the prior and the likelihood. It is easy to compute the posterior: one just multiplies the prior and likelihood curves, rescales the product so that it is a proper density, and one gets the following curve.
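Written out, the multiplication looks like the following. This is my own reconstruction under two assumptions: the 0.05 in the prior is a standard deviation, and games are independent so that the series-length counts are multinomial. Here q = 1 - p and the symbols g and L are just labels for the prior and likelihood.

$$
g(p \mid \text{data}) \;\propto\; g(p)\, L(p), \qquad 0.5 \le p \le 1,
$$

$$
g(p) \;\propto\; \exp\!\left(-\frac{(p - 0.5)^2}{2\,(0.05)^2}\right), \qquad
L(p) \;\propto\; \left(p^3 + q^3\right)^{48} \left(3p^3 q + 3p q^3\right)^{42} \left(6p^2 q^2\right)^{42}.
$$

The three factors in L(p) are the chances of a 3-game, 4-game, and 5-game series, each raised to the number of times that length was observed.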

#### What Have We Learned?

Looking at the posterior curve, I see that p is most likely in the interval (0.5, 0.6). It can be calculated that the probability that p is in the interval (0.5, 0.628) is 90%. Note that the posterior is a compromise between my initial beliefs as reflected in the prior and the information contained in the data.
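The posterior summaries can be approximated on a grid. The Python sketch below assumes the prior's 0.05 is a standard deviation and that the counts are multinomial under independent games. The grid spacing and the helper names `posterior_cdf` and `posterior_quantile` are illustrative choices, so the results should be close to, but not necessarily identical to, the numbers quoted above.

```python
import math

COUNTS = {3: 48, 4: 42, 5: 42}      # series lasting 3, 4, and 5 games
PRIOR_MEAN, PRIOR_SD = 0.5, 0.05    # assumption: 0.05 is a standard deviation


def series_length_probs(p):
    """Return (P(3 games), P(4 games), P(5 games)) given single-game probability p."""
    q = 1.0 - p
    return (p**3 + q**3,
            3 * p**3 * q + 3 * q**3 * p,
            6 * p**2 * q**2)


def log_posterior(p):
    """Unnormalized log posterior: log prior plus log likelihood."""
    z = (p - PRIOR_MEAN) / PRIOR_SD
    log_prior = -0.5 * z * z
    probs = dict(zip((3, 4, 5), series_length_probs(p)))
    log_lik = sum(n * math.log(probs[games]) for games, n in COUNTS.items())
    return log_prior + log_lik


# Grid over [0.5, 1); p = 1 is left out to avoid log(0).
grid = [0.5 + 0.0005 * i for i in range(1000)]
log_post = [log_posterior(p) for p in grid]
peak = max(log_post)
weights = [math.exp(v - peak) for v in log_post]   # subtract the peak for stability
total = sum(weights)
posterior = [w / total for w in weights]           # grid probabilities summing to 1


def posterior_cdf(x):
    """Approximate posterior probability that p <= x."""
    return sum(w for p, w in zip(grid, posterior) if p <= x)


def posterior_quantile(prob):
    """Smallest grid value whose cumulative posterior probability reaches prob."""
    running = 0.0
    for p, w in zip(grid, posterior):
        running += w
        if running >= prob:
            return p
    return grid[-1]


print(f"P(p <= 0.6 | data) is about {posterior_cdf(0.6):.2f}")
print(f"90th percentile of p is about {posterior_quantile(0.90):.3f}")
```

The 90th percentile from this sketch should land near the 0.628 reported above. It sits well below the peak of the likelihood, which shows how strongly the prior pulls the estimate back toward 0.5.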