Monthly Archives: June, 2020

Simulating a 60-Game Season

Introduction

This is one of my favorite things to talk about: the nature of baseball competition. We’d like to think that the winner of the World Series is the “best team in baseball.” But if you think carefully about what this means, you’ll learn that the winner of the World Series is likely not the team with the most talent. Bill James talked about this topic at a meeting at my school in 1992 (here is a post with more detail) and I’ve written about this in different places.

Anyway, this topic is very relevant today since MLB is planning a 60-game season for 2020, which is 63% shorter than the regular 162-game season. That motivates this post, where I use a simulation to speculate about what a 60-game regular season will look like. I apply a reasonable model for baseball competition called a Bradley-Terry random effects model. By running this simulation for 1000 seasons, we can learn about the association of a team’s ability with its performance and also learn how many games a team will need to win to make the playoffs.

A Model for Baseball Competition

The Bradley-Terry model provides a popular and helpful way of modeling team competition. Suppose the thirty MLB teams have unknown ability values, call them A_1, …, A_30, that are distributed according to a normal curve with mean 0 and standard deviation S. If team C plays team D, then the probability that C defeats D is given by

p_{CD} = \frac{\exp(A_C - A_D)}{1 + \exp(A_C - A_D)}

The only unknown parameter in this model is S, which reflects the level of competition among the MLB teams: the larger S is, the more the teams differ in ability. If the teams all have the same ability, then S would be equal to 0. We can actually fit this model and estimate the parameter S using game results data from a previous season. For the work here, I am going to assume that S = 0.3, which seems to reflect the level of MLB competition in recent seasons.
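To make the formula concrete, here is a minimal Python sketch of the win probability (my own code for this post is in R; the function name win_prob is just a label chosen for this illustration). It evaluates the chance that a team one standard deviation above average, with ability 0.3, beats an exactly average team.

```python
import math

def win_prob(a_c: float, a_d: float) -> float:
    """Bradley-Terry probability that team C (ability a_c) beats team D (ability a_d)."""
    diff = a_c - a_d
    # Algebraically the same as exp(diff) / (1 + exp(diff)).
    return 1.0 / (1.0 + math.exp(-diff))

S = 0.3  # spread of team abilities assumed in this post

# A team one standard deviation above average against an average team.
p = win_prob(S, 0.0)
print(round(p, 3))       # 0.574
print(round(60 * p, 1))  # 34.5 expected wins if all 60 games were against average teams
```

So under S = 0.3, a clearly above-average team is only a modest favorite in any single game, which is why a short season leaves so much room for luck.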

Simulating a Season

Using this Bradley-Terry model, it is straightforward to use R to simulate a 2020 baseball season as follows.

  1. First I simulate 30 random numbers from a normal distribution with mean 0 and standard deviation 0.3 — these numbers will represent the abilities of the 30 teams. Note that team names are matched at random to these abilities, but I suppose we could attach names in a less-random fashion based on current Vegas odds.
  2. Then I simulate all of the game results in MLB’s proposed 60-game schedule. Each team plays 10 games against each of the 4 other teams in its own division (total of 40 games), and then 4 games against each of the 5 teams in the corresponding division of the other league (total of 20 games).
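The two steps above can be sketched in a few lines. This is a Python illustration using only the standard library, not the R code from my Gist; the names make_teams, make_schedule and simulate_season are hypothetical, and teams are anonymous (league, region, slot) placeholders rather than real clubs.

```python
import math
import random

LEAGUES = ("AL", "NL")
REGIONS = ("East", "Central", "West")
TEAMS_PER_DIVISION = 5

def make_teams():
    """Return 30 placeholder teams as (league, region, slot) tuples."""
    return [(lg, rg, k) for lg in LEAGUES for rg in REGIONS
            for k in range(TEAMS_PER_DIVISION)]

def make_schedule(teams):
    """List every game as a pair of teams under the schedule described above."""
    games = []
    for i, a in enumerate(teams):
        for b in teams[i + 1:]:
            if a[1] != b[1]:
                continue  # teams from different regions never meet
            # Same league and region: division rivals, 10 games.
            # Other league, same region: 4 games.
            n_games = 10 if a[0] == b[0] else 4
            games.extend([(a, b)] * n_games)
    return games

def simulate_season(rng, s=0.3):
    """Draw talents from N(0, s) and play out every scheduled game."""
    teams = make_teams()
    talent = {t: rng.gauss(0.0, s) for t in teams}
    wins = {t: 0 for t in teams}
    for a, b in make_schedule(teams):
        # Bradley-Terry probability that team a beats team b.
        p_a = 1.0 / (1.0 + math.exp(-(talent[a] - talent[b])))
        winner = a if rng.random() < p_a else b
        wins[winner] += 1
    return talent, wins

rng = random.Random(2020)  # fixed seed so the run is reproducible
talent, wins = simulate_season(rng)

# Show the five teams with the most wins, with their true talents.
for team in sorted(wins, key=wins.get, reverse=True)[:5]:
    print(team, wins[team], round(talent[team], 2))
```

Each team ends up with exactly 60 games, so the 30 teams play 900 games in total. Comparing the printed win totals with the talents gives a first feel for how loosely the two are tied together in one short season.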

Here are the final standings for one of the simulated 60-game seasons (again you can ignore the team names since the teams are randomly assigned to the team strengths):

These are the types of W/L records that one might see in a 2020 season. Several teams, like SEA and BAL in this particular simulation, will win close to 40 games. A number of teams will hover between 28 and 32 wins (11 teams in this particular simulation), which will lead to some drama towards the end of the season as they fight for the wild card berths. Look at the AL Central: all teams have between 29 and 33 wins. I suspect we’ll see a division like this in 2020.

Simulating Many Seasons

I wrote an R function one.simulation.20() that will simulate a single season of 60 games. The output of this function is the number of wins for each team, together with the division winners and the wild card teams for each league. In addition, we record the team talents (values of A_1, …, A_30) that were used in this simulation.
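As an illustration of the bookkeeping involved, here is one way to pick the division winners and wild card teams from a set of win totals. This is a Python sketch, not the logic inside one.simulation.20(); the function name playoff_outcomes is hypothetical, and it assumes two wild cards per league and random tie-breaking. The win totals are made up purely for the example.

```python
import random

def playoff_outcomes(wins, rng, n_wild=2):
    """Label each team 'division', 'wildcard', or 'none'.

    wins maps (league, division, team) keys to win totals.
    Ties are broken at random, which is an assumption of this sketch.
    """
    # Shuffle first so the stable sort below breaks ties randomly.
    order = list(wins)
    rng.shuffle(order)
    order.sort(key=lambda t: wins[t], reverse=True)

    outcome = {t: "none" for t in wins}
    seen_divisions = set()
    for t in order:  # the best record in each division wins it
        if (t[0], t[1]) not in seen_divisions:
            seen_divisions.add((t[0], t[1]))
            outcome[t] = "division"

    wild_count = {}
    for t in order:  # the next-best records in each league are wild cards
        if outcome[t] == "none" and wild_count.get(t[0], 0) < n_wild:
            outcome[t] = "wildcard"
            wild_count[t[0]] = wild_count.get(t[0], 0) + 1
    return outcome

# Made-up win totals for nine teams in one league, for illustration only.
toy_wins = {
    ("AL", "East", "A"): 38, ("AL", "East", "B"): 35, ("AL", "East", "C"): 27,
    ("AL", "Central", "D"): 33, ("AL", "Central", "E"): 31, ("AL", "Central", "F"): 29,
    ("AL", "West", "G"): 36, ("AL", "West", "H"): 34, ("AL", "West", "I"): 25,
}
result = playoff_outcomes(toy_wins, random.Random(1))
for team, label in result.items():
    print(team, toy_wins[team], label)
```

Note that in this toy league team D wins the Central with 33 wins while team E, only two wins behind, misses the playoffs entirely, and teams B and H get in as wild cards with better records than D.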

We repeat this process for 1000 seasons, collecting all of the variables for all seasons. This repeated-season data is helpful for understanding the relationship between a team’s ability and its performance during a season. These simulation results will also help us understand how likely it is for teams of different abilities to get into the playoffs. Last, we’ll use the simulation to see how many games a team needs to win to have a good chance of making the playoffs.

How Good Are the Playoff Teams?

Before the season begins, we know that a team’s ability is normally distributed with mean 0 and standard deviation 0.3. What have you learned about a team’s ability if they win their division? What have you learned about the team’s ability if they are one of the wild card teams? These questions are easy to answer from our simulation. We collect the talent values for all of the teams that win a division, and similarly collect the talent values for the wild card teams. Here I display density estimates of the talents for three groups — all teams, the wild-card teams and the division-winning teams.

Actually, this graph shows that you don’t really learn much about a team’s ability based on their performance in this 60-game season. Sure, a division-winner tends to have more talent than a wild-card team, but the difference between the “win division” density curve and the “wild card” density curve seems pretty small. Note also that it appears that below-average teams, that is teams with a negative talent value, are included in the teams making the playoffs.

What is the Chance that a Team of a Particular Ability Gets in the Playoffs?

Here we address a related question. Suppose you know a team’s talent value. What is the chance that a team with this talent will make the playoffs, either as a division-winner or a wild-card team? Here the outcome of a team’s season is ordinal: either it doesn’t make the playoffs, it is a wild-card team, or it is a division-winner. Using an ordinal regression model fit to our simulated data, I estimate the probabilities of the three outcomes as a function of the team’s talent. Below I show the estimated probability of (1) winning division, (2) wild-card, and (3) playoff (either winning division or wild card) as a function of the team talent.

Here are some interesting takeaways from this graph:

  • An average team, that is a team with a talent value of 0, has a 25% chance of making the playoffs.
  • The chance of being a wild card team is maximized for a team with talent value of 0.25.
  • A “top 10%” team has a talent value of 0.38. This team has approximately a 75% chance of making the playoffs.
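The “top 10%” talent value in the last bullet is just the 90th percentile of the assumed talent distribution, a normal curve with mean 0 and standard deviation 0.3. A short Python check using the standard library (the variable names are only for this illustration):

```python
from statistics import NormalDist

# Talent distribution assumed in this post: normal, mean 0, sd 0.3.
talent_dist = NormalDist(mu=0.0, sigma=0.3)

# The talent value exceeded by only 10% of teams.
top10 = talent_dist.inv_cdf(0.90)
print(round(top10, 2))  # 0.38
```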

How Many Games Should a Team Win?

Here is a question that many managers are currently thinking about. How many games do they have to win in a 60-game season to get into the playoffs? Again, this is easy to answer from our 1000 simulated seasons. We find the proportion of teams that (1) win their division, (2) are wild cards, and (3) make playoffs for each of the win totals of 25 through 45. Here’s the graph — some things we learn:

  • A team finishing with a 30-30 record only has about a 12% chance of making the playoffs.
  • But a 35-25 team will likely (with probability 86%) make the playoffs either as a division winner (43%) or a wild card (43%). In fact, the chance of being a wild card team is maximized by winning 35 games.
  • Winning a few more games beyond 35 really has a dramatic effect on the probability of making the playoffs. With 37 wins, the probability of making the playoffs is almost 100%.
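The proportions in these bullets come from a simple tabulation: group all simulated team-seasons by win total and compute the share of each outcome. Here is a minimal Python sketch of that tabulation, assuming each team-season has been stored as a (wins, outcome) pair; the function name playoff_rates and the tiny input are made up for illustration and are not simulation results.

```python
from collections import Counter, defaultdict

def playoff_rates(records):
    """Tabulate outcome proportions by win total.

    records: iterable of (wins, outcome) pairs, where outcome is
    'division', 'wildcard', or 'none'.
    Returns {wins: {'division': p, 'wildcard': p, 'playoff': p}}.
    """
    counts = defaultdict(Counter)
    for wins, outcome in records:
        counts[wins][outcome] += 1

    rates = {}
    for wins, c in sorted(counts.items()):
        n = sum(c.values())
        rates[wins] = {
            "division": c["division"] / n,
            "wildcard": c["wildcard"] / n,
            "playoff": (c["division"] + c["wildcard"]) / n,
        }
    return rates

# Tiny made-up input, only to show the shape of the data.
toy = [(35, "division"), (35, "wildcard"), (35, "none"), (35, "wildcard"),
       (30, "none"), (30, "none"), (30, "wildcard"), (30, "none")]
rates = playoff_rates(toy)
for wins, r in rates.items():
    print(wins, r)
```

With the real simulation, the input would hold 30 team-seasons for each of the 1000 simulated seasons, and the rates for win totals 25 through 45 would give the curves in the graph.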

Wrap-Up

  • Again, if you want to read more about the Bradley-Terry model, look at my older post.
  • I could have improved the Bradley-Terry model by adding a term to account for home-advantage, but I don’t think that would have had much effect on the findings from the simulation.
  • I have written a Markdown file, available on my GitHub Gist site, that does all of this R work. This Markdown file includes the function one.simulation.20(), which implements one 2020 season simulation, and the function print_standings(), which displays the division standings of the simulated results as shown above. The simulation function is similar to the function described in detail in Chapter 9 of Analyzing Baseball with R, where I was simulating the 1968 season.
  • I have only talked about simulating the 2020 regular season. Another round of uncertainty is added with the baseball playoffs. All of these playoff series are relatively short (especially the game between the two wild card teams in each league) and so the outcomes are pretty unpredictable. I thought it would be best here to focus on the regular season since we’ve never seen a 60-game season in MLB history.