# Bayesian Pythagorean Modeling – Part II

### Introduction

In Part I of this Bayesian modeling post, I introduced a Bayesian version of Bill James’ Pythagorean relationship. This model had two unknowns, the Pythagorean exponent k and the error standard deviation $\sigma$ and I described constructing a prior for (k, $\sigma$) that reflected one’s knowledge about the value of the power in James’ formula and how win/loss ratios can deviate from this formula. In this post, assuming we have a sampling model and a prior, I am going to discuss the mechanics of Bayesian learning about these parameters and predicting future response values.

Since one typically is interested in predicting wins instead of win/loss ratios, I am going to work with a slightly different version of the model where the number of team wins (in a 162-game season) follows a normal distribution where the mean is given by James’ formula, and $\sigma$ reflects the variation of the win totals about the mean.

For a prior, I will assume k is normal with mean 2 and standard deviation 0.5 and $\sigma$ is exponential with rate 1.

### Posterior

We now observe data, that is, the W and R/RA values for the 30 teams in the 2019 season, and by Bayes’ rule, we find the new or posterior probability distribution for (k, $\sigma$). The posterior density of (k, $\sigma$) is equal to (up to a proportionality constant) the product of the likelihood (the probability of the observed data viewed as a function of the parameters) and the prior. All types of inference are found by summarizing this bivariate probability distribution.

### Brute Force Calculation

A straightforward way to summarize the posterior is to find a grid of (k, $\sigma$) values that cover most of the probability. I have a R package LearnBayes that makes it easy to implement this brute-force method. One writes a function logpost2() that computes the logarithm of the posterior density of (k, $\sigma$). The arguments to this function are the vector of parameters theta and a data frame d that contains the data. Note that I’ve included both the likelihood and prior terms in this function.

logpost2 <- function(theta, d){
k <- theta[1]
sigma <- theta[2]
n_mean <- 162 * d$RR ^ k / (1 + d$RR ^ k)
sum(dnorm(d\$W, n_mean, sigma, log = TRUE)) +
dnorm(k, 2, 0.5, log = TRUE) +
dexp(sigma, 1, log = TRUE)
}


Next, by use of the gcontour() function, one finds a rectangle that covers the region where most of the probability falls. This function produces a contour plot where the contours are drawn at 10%, 1% and 0.1% of the largest posterior value and, by trial and error, I find a rectangle (specifically 1.6 < k < 2.3 and 2 < $\sigma$ < 6) that covers these three contour lines.

gcontour(logpost2, c(1.6, 2.3, 2, 6), d)


This contour graph is produced by computing the posterior density on a 50 by 50 grid. One way to summarize the posterior is to simulate a large number of values from the posterior computed on this grid and then summarize the simulated draws. I use the simcontour() function to do the simulation from this grid:

pts <- simcontour(logpost2, c(1.6, 2.3, 2, 6),
d, 5000)


Here I display 5000 simulated draws from the posterior on the contour plot.

If I am interested primarily in learning about the exponent value k, I collect those simulated values of k to perform inference. For example, here is a density estimate which represents the marginal posterior density of k. A 90% interval estimate for k (found from the simulated draws) is (1.81, 2.09).

### Simulation using Stan Software

Currently a popular and powerful way to perform Bayesian computations is to simulate from a Markov Chain that convergences in theory to the posterior distribution of interest. One particular type of simulation methodology is Hamiltonian MCMC sampling implemented by the Stan software. There are interfaces of Stan to other languages such as R or Python and the brms package provides a very attractive interface for fitting a large class of Bayesian models. I’ll illustrate using this package below.

First I specify the prior on (k, $\sigma$) by use of two applications of the prior() function.

prior1 <- c(prior(normal(2, 0.5), nlpar = "b1"),
prior(exponential(1), class = "sigma"))


Then I implement the Stan simulation by use of the brm() function in the brms package. Note that this function actually specifies this nonlinear model and the “family = gaussian” argument indicates that we are assuming normally distributed errors.

fit1 <- brm(bf(W ~ 0 +
162 * RR ^ b1 / (1 + RR ^ b1),
b1 ~ 1, nl = TRUE),
data = d, prior = prior1,
family = gaussian)


Below I show some output of this function. The right set of graphs show streams of simulated draws of the parameters k and $\sigma$ and the left set of graphs display density estimates of the simulated posterior draws of these parameters. One gets essentially the same estimates of the power parameter k as I got using the brute-force method. For example, the 90% interval estimate using Stan is (1.79, 2.12) which is very close to the interval (1.81, 2.09) using brute force. One advantage of the Stan software is that it can efficiently sample from very complicated Bayesian models with many parameters.

### Prediction of Wins

Actually, the objective here is not about learning about the parameters k and $\sigma$, but rather to predict the number of wins of a team who has a particular runs ratio.

We predict the number of wins by use of the (posterior) predictive distribution which can be simulated in a similar way that we simulated future data from the (prior) predictive distribution in Part I of my post. Specifically, we simulate a future number of wins by first simulating values of (k, $\sigma$) from the posterior distribution and then simulating a wins total from the normal sampling distribution using these simulated parameter values. Using the brms package this is done using the posterior_predict() function. The arguments are fit1 (the output of the posterior fitting) and a data frame containing the values of the runs ratio that we are interested in.

PP <- posterior_predict(fit1,
newdata = data.frame(
RR = seq(0.8, 1.2, by = 0.1)))


I use violin plots below to summarize the simulated number of wins from the predictive distributions for these five values of the runs ratio. For example, a team that scores the same number of runs as it allows (RR = 1) will likely win between 70 and 90 games in a 162-game season. The amount of variation in these win totals might be surprising — this tells us, that there is more to winning games than just having a good runs ratio.

### Summing Up

• All of the R code for this exercise is available on my Github gist site.
• In this study, we placed priors on the parameters and one general concern is the impact of these priors on the posterior analysis. This is easy to address by trying other priors and seeing if there is a change in the inferential summaries or predictions. If the predictions seem to depend on the choice of prior, then I’d think harder about the prior.
• When possible, it is good to check one’s work by trying different computational methods. We did this by showing both a brute-force method and a modern simulation approach — both methods gave very similar estimates at the parameter k and $\sigma$.
• The purpose of this two-part post was to illustrate Bayesian modeling for a small regression problem. But we can easily generalize this problem to a situation where there is a obvious grouping of the data, and one is interested in doing many regressions, one for each group. For example, suppose we are exploring this Pythagorean relationship for many seasons and we want to apply this regression model for each season. Or maybe we have data for different professional leagues, say minor league, MLB, Japanese professional baseball, etc. and we want to apply this model to each league. This motivates the consideration of Bayesian multilevel modeling (and fitting using Stan) which will be the subject for a future set of posts.