In the early weeks of the 2018 baseball season, we’re observing some interesting team records. Looking at the standings after the games of April 14, I graph the number of wins of the 30 teams below.
- Several teams have remarkably good records: the Red Sox are 12-2, the Angels are 13-3, the Mets are 11-2, and the Diamondbacks are 11-3.
- In contrast, other teams are really struggling — the Reds are 2-12 and the Royals and Rays have won only 3 games.
- The Phillies are 8-5, which seems surprising for a rebuilding team.
To me, these extreme records seem a bit surprising — there seems to be more variation in win/loss records than one would expect after two weeks of the baseball season.
There are two reasons for variation in win/loss records. First, teams have different abilities and we are pretty sure about the “good” teams who are likely to make it to the World Series and the “bad” teams that will have poor records. Second, luck or chance variation plays a role in the game results and this type of variation can play havoc with win/loss records especially in the early part of the season when relatively few games are played.
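To get a feel for how large the luck component alone can be, here is a small Python sketch (my own illustration, not part of the R script mentioned in this post). It assumes every team is exactly average, so each game is a fair coin flip, and it assumes a round number of 14 games played. Under those assumptions the standard deviation of wins follows the binomial formula sqrt(n p (1 - p)).

```python
import math

def luck_only_sd(n_games, p=0.5):
    """SD of wins for a team that wins each of n_games independently with probability p."""
    return math.sqrt(n_games * p * (1 - p))

n_games = 14  # assumed round number, roughly two weeks of games
sd = luck_only_sd(n_games)

# How many luck-only SDs above a .500 record is a 12-2 start?
z_12_2 = (12 - n_games * 0.5) / sd

print(round(sd, 2), round(z_12_2, 2))
```

With 14 games, chance alone gives a standard deviation of a little under 2 wins, so a 12-2 start sits between 2 and 3 luck-only standard deviations above an even record.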
Here’s my approach for determining if the variation in W/L records in the 2018 season is extreme.
- I’ll use a random effects Bradley-Terry model to estimate the abilities of the 30 MLB teams.
- Using this model, I simulate game results using the actual MLB schedule of games played through April 14. These simulations incorporate both types of variation — the variation between team abilities and the coin-flipping “luck” variation.
- I use the standard deviation of the simulated team wins to measure the variation in team success. I construct the predictive distribution of this standard deviation from many simulations from my model.
- I compute the standard deviation of team wins from the 2018 season (through the games of April 14). I want to see if this observed standard deviation is extreme relative to the predictive distribution.
The Bradley-Terry Model
This is one of my favorite models for paired competitions like baseball. One measures a team’s strength by a parameter S. We assume that the 30 team strengths $S_1, \ldots, S_{30}$ come from a normal distribution with mean 0 and standard deviation $\sigma$. If team A plays team B, then the probability that team A wins is given by the logistic model
$$P(\text{team A wins}) = \frac{\exp(S_A - S_B)}{1 + \exp(S_A - S_B)}$$
Using data for a whole MLB season, one can estimate $\sigma$, which measures the variation in team strengths. In the simulation below, I use a value of $\sigma$ that seems representative of the estimates from recent seasons. (By the way, I discussed fitting this type of model using an R package in an earlier post.)
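As a quick illustration of the logistic win probability, here is a minimal Python sketch (not the R code from this post). The function name `win_prob` and the example strengths of 0.3 and -0.3 are my own choices for illustration, not estimated values.

```python
import math

def win_prob(s_a, s_b):
    """Bradley-Terry probability that team A (strength s_a) beats team B (strength s_b).

    Algebraically the same as exp(s_a - s_b) / (1 + exp(s_a - s_b)).
    """
    return 1.0 / (1.0 + math.exp(-(s_a - s_b)))

# Two equally strong teams: a fair coin flip.
p_even = win_prob(0.0, 0.0)

# Illustrative strengths only: a team at +0.3 against a team at -0.3.
p_strong = win_prob(0.3, -0.3)

print(p_even, round(p_strong, 3))
```

Only the difference in strengths matters, and the two teams' win probabilities always sum to 1.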
Simulating Game Results
Next I simulate game outcomes using the actual 2018 MLB schedule through the games of April 14. I start by simulating 30 team talents from the talent distribution and then simulate the game results. These are essentially weighted coin flips, with win probabilities given by the Bradley-Terry model. I then find the number of games won by each of the 30 teams and summarize the variation in team wins by a standard deviation.
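The single-season simulation can be sketched in Python as follows. This is a rough stand-in for the R script, under two loud assumptions: the schedule here is a made-up set of random pairings built by the hypothetical helper `toy_schedule` (not the actual MLB schedule), and `SIGMA = 0.3` is a placeholder rather than the value used in this post.

```python
import math
import random
import statistics

def toy_schedule(n_teams, n_rounds, rng):
    """Hypothetical schedule: each round pairs all teams off at random."""
    games = []
    for _ in range(n_rounds):
        teams = list(range(n_teams))
        rng.shuffle(teams)
        games.extend(zip(teams[0::2], teams[1::2]))
    return games

def simulate_season(schedule, n_teams, sigma, rng):
    """Simulate one set of game results and return each team's win total."""
    # Draw team strengths from the normal talent distribution.
    strengths = [rng.gauss(0.0, sigma) for _ in range(n_teams)]
    wins = [0] * n_teams
    for a, b in schedule:
        p_a = 1.0 / (1.0 + math.exp(-(strengths[a] - strengths[b])))
        # Weighted coin flip: team a wins with probability p_a.
        if rng.random() < p_a:
            wins[a] += 1
        else:
            wins[b] += 1
    return wins

SIGMA = 0.3  # placeholder value, an assumption
rng = random.Random(1)
schedule = toy_schedule(30, 14, rng)
wins = simulate_season(schedule, 30, SIGMA, rng)
sd_wins = statistics.stdev(wins)  # sample SD (n - 1 divisor), like R's sd()
print(wins, round(sd_wins, 2))
```

Swapping the toy schedule for the real list of games played would give the simulation described above.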
The method described above is for a single simulated season. I repeat this for 1000 seasons, collecting the standard deviations. Below I construct a histogram of the standard deviations from this predictive distribution from the Bradley-Terry model. The red line corresponds to the standard deviation of the team wins for the 2018 season through the April 14 games.
Note that this red line is in the right tail of the distribution. This means that a standard deviation this large is unlikely under the predictive distribution based on our Bradley-Terry model.
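One way to put a number on "in the right tail" is the fraction of simulated standard deviations that are at least as large as the observed one. The Python sketch below shows the idea, again as an illustration rather than the R script from this post. The schedule is a made-up set of random pairings, and both `SIGMA` and `OBSERVED_SD` are placeholders, not the values from the 2018 data.

```python
import math
import random
import statistics

def season_sd(schedule, n_teams, sigma, rng):
    """SD of team wins for one simulated season under the random effects model."""
    s = [rng.gauss(0.0, sigma) for _ in range(n_teams)]
    wins = [0] * n_teams
    for a, b in schedule:
        p_a = 1.0 / (1.0 + math.exp(s[b] - s[a]))
        wins[a if rng.random() < p_a else b] += 1
    return statistics.stdev(wins)

def tail_probability(sim_sds, observed_sd):
    """Share of simulated SDs that are at least as large as the observed SD."""
    return sum(sd >= observed_sd for sd in sim_sds) / len(sim_sds)

rng = random.Random(2018)

# Hypothetical schedule: 14 rounds of random pairings among 30 teams.
schedule = []
for _ in range(14):
    teams = list(range(30))
    rng.shuffle(teams)
    schedule.extend(zip(teams[0::2], teams[1::2]))

SIGMA = 0.3        # placeholder, an assumption
OBSERVED_SD = 3.0  # placeholder, NOT the actual 2018 value

sim_sds = [season_sd(schedule, 30, SIGMA, rng) for _ in range(1000)]
p_tail = tail_probability(sim_sds, OBSERVED_SD)
print(p_tail)
```

A small tail probability says the observed spread in wins is hard to reproduce from the model, which is the same message as the red line sitting far to the right in the histogram.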
What Have We Learned?
- In one sense, the win/loss records we see in the current season are extreme. There is more variation in the team wins (think the Reds and the Red Sox) than one would predict on the basis of the random effects Bradley-Terry model.
- Why do we care? I think Major League Baseball wants to see a competitive baseball season where many teams have an opportunity to get into the playoffs and reach or even win the World Series. If some teams are tanking or focusing on rebuilding, that may hurt the competitive nature of the sport. I’ve provided a little evidence to indicate that baseball is not as competitive as it has been in the past.
- By the way, this whole study got started when I started looking at Bill Petti’s changes to his baseballr package. Using this R package, there are more ways of scraping data from different sources. In particular, I used the standings_on_date_bref function to obtain the standings on April 15. Also, I used the team_results_bref function to obtain the actual MLB schedule for the games through April 14.
The R script, including a function to implement the predictive simulation, can be found on my GitHub Gist site.