
What is the Chance a Division Winner Will Decline the Following Season?

Introduction

In a couple of months, I will be giving a talk at the New England Symposium on Statistics in Sports reviewing Carl Morris’ contributions to statistical thinking in sports. In one of his review papers, Morris provides a nice baseball illustration of a Bayesian model explaining the regression to the mean phenomenon, the idea that extreme performances by teams or individuals tend to move toward the average in the following season. Here I describe Morris’ example, use a Shiny app to illustrate the posterior and predictive calculations, and show that the results from this Bayesian model seem to agree with what happens in modern baseball competition.

The Model

Suppose a team wins their division in a particular season. What is the chance that they will decline in the following season? Let y_1 and y_2 denote respectively the winning fractions of the team in the two seasons. We assume that y_1 and y_2 are independent where y_j is normal with mean p and standard deviation \sqrt{V}. The proportion p is the ability of the team — if the team was able to play an infinite number of games, then p would represent its long-term winning fraction. Note that we’re assuming the team’s ability does not change from the first to second season. To complete this model, we put a prior on the team’s ability. We assume that p is normal with mean \mu and standard deviation \tau.
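To make the two-stage structure concrete, here is a minimal R simulation of a single team under this model. The variable names are my own, and the numerical settings anticipate the prior and sampling variance described in the next two sections.

    set.seed(123)
    mu  <- 0.50                  # prior mean of the ability p
    tau <- 0.05                  # prior standard deviation of p
    V   <- 0.5 * 0.5 / 162       # sampling variance over a 162-game season
    p <- rnorm(1, mu, tau)       # draw the team's ability from the prior
    y <- rnorm(2, p, sqrt(V))    # winning fractions in seasons 1 and 2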

Constructing the Prior

Morris’ best guess is that this particular division-winning team is average, which means that it would win half of its games in the long run. He also believes that it is unlikely that the team would win more than 60% of its games in the long run. So he assumes \mu = 0.50 and \tau = 0.05, that is, p \sim N(0.50, 0.05). The reader can check that the probability that p > 0.60 is 0.023, which confirms Morris’ belief that a true winning fraction for the team larger than 0.60 is rare.
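One can check this tail probability with a single line of R:

    # probability that p exceeds 0.60 under the N(0.50, 0.05) prior
    1 - pnorm(0.60, mean = 0.50, sd = 0.05)
    # [1] 0.02275013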

Bayesian Calculations

In this scenario, since teams are playing a 162-game season, a good approximation to the sampling standard deviation for an average team is \sqrt{V} = \sqrt{0.5 (0.5) / 162}. Since V is known, the Bayesian calculations for this normal sampling/normal prior model are well-known. We observe the team’s winning fraction for the first season y_1. The posterior density for the true winning fraction p given y_1 is N(\mu_1, \tau_1), where the mean and standard deviation are \mu_1 = (y_1 / V + \mu / \tau^2) / (1 / V + 1 / \tau^2) and \tau_1 = \sqrt{1 / (1 / V + 1 / \tau^2)}.
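A small R helper (my own function, not part of the Shiny app) carries out these posterior calculations, with Morris’ values as defaults:

    posterior <- function(y1, mu = 0.50, tau = 0.05, V = 0.25 / 162) {
      prec <- 1 / V + 1 / tau ^ 2               # posterior precision
      mu1  <- (y1 / V + mu / tau ^ 2) / prec    # posterior mean
      tau1 <- sqrt(1 / prec)                    # posterior standard deviation
      c(mean = mu1, sd = tau1)
    }
    posterior(0.59)   # approximately mean = 0.556, sd = 0.031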

Morris shows that the posterior mean of the true winning fraction is a compromise between the observed winning fraction y_1 and the prior mean \mu. One can rewrite the posterior mean as

\mu_1 = (1 - B) y_1 + B \mu

where B is the shrinkage factor

B = V / (V + \tau^2).
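With Morris’ values V = 0.25/162 and \tau = 0.05, and the observed fraction y_1 = 0.59 used in the app example below, one can verify in R that the shrinkage form matches the precision-weighted form of the posterior mean:

    V <- 0.25 / 162; tau <- 0.05
    B <- V / (V + tau ^ 2)                            # shrinkage factor, about 0.382
    y1 <- 0.59; mu <- 0.50
    (1 - B) * y1 + B * mu                             # shrinkage form, about 0.556
    (y1 / V + mu / tau ^ 2) / (1 / V + 1 / tau ^ 2)   # precision-weighted form, same value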

Actually, we are most interested in predicting the team’s winning fraction in the following season, y_2. So we compute the predictive density of y_2 given y_1.

Morris shows that this predictive density is normal with mean (1 - B) y_1 + B \mu and standard deviation \sqrt{V + V (1 - B)}.

What is the chance that the team declines (in winning fraction) the following season? To answer this question, we look at the predictive density of the change in winning fractions y_2 - y_1 given y_1, focusing on the probability P(y_2 - y_1 < 0). How much do we expect the team to decline? To answer this, we look at the mean of the predictive density of y_2 - y_1, which works out to B(\mu - y_1).
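A companion helper (again my own sketch) computes the predictive mean and standard deviation of y_2 - y_1 along with the probability of a decline; small differences from the app’s displayed values reflect rounding of the prior inputs.

    predict_change <- function(y1, mu = 0.50, tau = 0.05, V = 0.25 / 162) {
      B <- V / (V + tau ^ 2)
      m <- B * (mu - y1)               # predictive mean of y2 - y1
      s <- sqrt(V + V * (1 - B))       # predictive sd of y2 - y1
      c(mean = m, sd = s, prob_drop = pnorm(0, m, s))
    }
    predict_change(0.59)   # approximately mean = -0.034, sd = 0.050, prob_drop = 0.75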

Shiny App

I wrote a short Shiny app PredictWinningFraction() to illustrate these calculations. A snapshot of the app is displayed below. Since I think it is easier to think about quantiles, one input to this app is the pair of prior quartiles for p. In this example, note that the prior quartiles are set to 0.466 and 0.534, which corresponds to a mean of 0.50 and a standard deviation of 0.05 for Morris’ normal prior for this “average” team. A second input is the observed winning fraction for the first season y_1, which is 0.59 in this example. There are two graphs displayed in the app — the top graph shows the prior and posterior of the true winning fraction p and the bottom graph shows the predictive density of the change in winning fractions y_2 - y_1. The table below the graphs displays the summary calculations. We see the posterior of p given y_1 = 0.59 is N(0.556, 0.031). In this example, the shrinkage factor is

B = \frac{0.25 / 162}{0.25 / 162 + 0.05^2} = 0.382

which means the posterior mean of p is a weighted average of the prior mean \mu and the observed fraction y_1, with respective weights 0.38 and 0.62.

From the table, the predictive density of y_2 - y_1 is N(-0.034, 0.050). The probability of a drop in winning fraction for this particular team in the following season is 0.752, and the predicted change is -0.034, that is, a drop of 0.034 in winning fraction. One nice feature of this app is that you can modify the inputs — for example, you might adjust the prior quartiles if you believe this team is above-average in ability, and you can change the observed winning fraction of the team.

How Do the Division Winners Perform the Following Season?

Using the Lahman package, I collected, for each MLB division winner from 1969 through 2021, the team’s winning fraction and its winning fraction in the following season. For the 265 division winners, the proportion that had a lower winning fraction the following season was 0.751, and the average drop in winning fraction was 0.0435.
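Here is a sketch of how this data collection might be done with the Lahman package. The pipeline below is my reconstruction, not the exact script behind these numbers; matching teams across adjacent seasons by teamID and the handling of shortened seasons are simplifications, so the counts may differ slightly.

    library(Lahman)
    library(dplyr)

    # winning fraction for every team-season
    wpct <- Teams %>%
      mutate(WP = W / (W + L)) %>%
      select(yearID, teamID, DivWin, WP)

    # the same table shifted back one year, so season t + 1 lines up with season t
    next_wp <- wpct %>%
      mutate(yearID = yearID - 1) %>%
      select(yearID, teamID, WP_next = WP)

    # division winners from 1969 through 2021, paired with their following season
    winners <- wpct %>%
      filter(DivWin == "Y", yearID >= 1969, yearID <= 2021) %>%
      inner_join(next_wp, by = c("yearID", "teamID"))

    mean(winners$WP_next < winners$WP)   # proportion that declined
    mean(winners$WP - winners$WP_next)   # average size of the drop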

I next applied Morris’ model. Using each division winner’s winning fraction, I applied the Bayesian model with Morris’ average team prior and computed the probability of dropping in winning fraction in the following season and the mean drop size. I averaged these computations over the 265 division winners — I found the average probability of dropping in the next season to be 0.745 and the average drop size to be 0.0343. Note that the average drop probability 0.745 from this Bayesian model agrees closely with the actual observed drop fraction of 0.751, but the average drop size of 0.0343 from the model underestimates the observed average drop of 0.0435.
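For readers who want to reproduce this averaging, the predict_change() helper sketched earlier makes it a few lines, assuming the winners data frame from the previous sketch:

    preds <- t(sapply(winners$WP, predict_change))
    mean(preds[, "prob_drop"])   # average probability of a decline, about 0.745
    mean(-preds[, "mean"])       # average predicted drop size, about 0.034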

Comments

  • (Morris’ article). In the article, Carl Morris reviews the baseball examples (such as the famous Efron and Morris example of simultaneously estimating 18 batting averages) that illustrate multilevel modeling. He used this division winner example to demonstrate the notion of Bayesian shrinkage. It is a relatively simple example of a two-stage Bayesian model that uses data familiar to a baseball fan.
  • (Assumptions in the Bayesian model). There are two key assumptions in the Bayesian model. One is that the strength of the team doesn’t change between the two seasons, and a second is the prior belief that the division-winning team is average in ability. One could use the Shiny app with a prior reflecting the belief that the team is above-average, where the prior mean exceeds 0.50. But it is interesting that this average-team prior leads to the predictive calculation that 74.5% of the teams will experience a drop in the second season, which agrees closely with the observed drop percentage among the 265 division winners in baseball history.
  • (Shiny code?). The code for this Shiny app, PredictWinningFraction(), is available as part of my ShinyBaseball package. The app is self-contained, so the interested user can try it out by downloading the single app.R file, placing it in a new folder, opening app.R in RStudio, and running the app with the runApp() function.