As someone who does his academic research largely within Sports Economics, I find myself dealing with Competitive Balance (or Uncertainty of Outcome) a lot. So it’s hopefully no surprise that it sneaks into today’s short post. If it is, I hope it’s a good surprise.

One of the salient topics within Sports Economics is the measurement of balance in a league. Balance is generally thought to be a result of the distribution of talent in a sports league. To evaluate league policies, we need good measures of balance; otherwise we might reach the wrong conclusions about policy effectiveness. However, it turns out that measuring balance isn’t quite as easy as one might think.

There are challenges that come from changes in the number of games played within a season, unbalanced schedules, changes in the number of teams in a league, rule changes, and so on. There are also challenges in defining balance, and we’ve decided as a field that there should be more than one way to think about what balance or uncertainty means to a sports fan (is it game odds? playoff odds? dynasties?). Economists have come up with a number of different ways to try and account for these issues, often arising from measures in the industrial organization literature that evaluate competition and dispersion of output of firms. However, many of them are unsatisfying, and I am going to argue that two further issues need to be dealt with: the run scoring environment and strategic behavior.

One of the issues with measuring balance is that there’s simply uncertainty over outcomes because of general randomness. The Super Bowl that capped the 2007-08 season is a good example of this: you would be hard pressed to convince me that the Giants were a better team than the Patriots, who came into the game 18-0. We see the same thing happen in the NCAA March Madness tournament. Now, I’m not here to argue about who is the best team, but we need to think about these examples and simply recognize that any measure of balance is going to contain this uncertainty. We are trying to measure talent distributions with noisy outcomes.

A second issue here is knowing that any game outcome is a ranking of discrete results. So, while the Red Sox scored on average 3.91 runs last year, they can’t score that actual amount in any game. They’ll score either 0, or 1, or 2, or 3, or 4, or 5, and so on. Given these discrete outcomes, as the run scoring environment decreases, each run scored is that much more valuable. Ultimately, in an extremely low scoring environment, the randomness of outcomes could overtake some of the differences in talent across teams in relatively small samples as the variance in scoring exceeds the signal from the talent distribution. This balance between randomness and signal is an important consideration in tournament theory (which I’ll leave aside for a statistical programming blog).

**So this brings me to my question**: If the run scoring environment changes, and the variance in runs across games changes along with it, will this impact the apparent league balance even if we hold a **known** talent distribution constant? If so, does this increase the importance of strategic choices in deciding game outcomes?

This is nothing more than a cursory look at these ideas, so go easy on me.

For our purposes, we’ll need to make some strong assumptions. First, we’ll assume that all teams have average pitching. That way, we can abstract just to the batters’ relative performance in a game. Second, and incorrectly, I am going to assume that runs are scored according to a Poisson distribution. In reality, runs are scored according to a negative binomial distribution (or some variation of this). I am doing this largely to simplify our exercise. But my plan is to move on to something more realistic in the future. Lastly, we’re going to assume a two-team league. This again simplifies our outcome, and we’re interested simply in the winning percentages of the two teams as they play one another across a 162 game season. Again, we’re distorting reality here. It’s just a blog post and I’ve already talked for too long.

Let’s begin by estimating what happens in a high run scoring environment with two teams. What we’ll do is take a draw from a Poisson distribution for each team in each game, match the draws together, and give a win to the team with more runs. Here, both teams will average 6 runs. One thing to note about the Poisson distribution is that the mean and variance are equal, which means that as our average runs scored goes down, so does the variability in runs scored (we’ll deal with that later). We can generate random draws from this distribution using `rpois`.
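As a quick sanity check on the mean-equals-variance property, here is a minimal sketch that draws a large sample at a few run levels and compares the sample mean and variance. The seed, the sample size, and the `pois_summary` name are arbitrary choices for illustration, not part of the analysis in this post.

```r
# Illustrative check: for Poisson draws, the sample mean and the sample
# variance should both land close to lambda.
set.seed(2024)                     # arbitrary seed, for reproducibility only
lambdas <- c(2, 3, 6)              # run environments used in this post
draws <- lapply(lambdas, function(l) rpois(100000, l))

pois_summary <- data.frame(
  lambda   = lambdas,
  mean     = sapply(draws, mean),
  variance = sapply(draws, var)
)
print(pois_summary)
```

Both columns should sit near `lambda`, which is why moving from 6 runs to 2 runs per game shrinks the spread of scores along with the average.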

We’ll also start with giving a half win for ties.

```r
### start with equal teams in high run scoring environment
set.seed(12345)
Team1 <- rpois(162, 6)
Team2 <- rpois(162, 6)
matchups <- data.frame(cbind(Team1, Team2))

# 1 for a win, 0 for a loss, 0.5 to each team for a tie
matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.537037
sum(matchups$T1win)
# [1] 87
mean(matchups$T2win)
# [1] 0.462963
sum(matchups$T2win)
# [1] 75
sum(matchups$ties)
# [1] 18
```

Now, this is just a single season. You can see the randomness at work already. We’ve given these two teams identical skill levels, and yet we have Team 1 with a 0.537 winning percentage, or 87 wins in a 162 game season, where we give a half win to each team for each tie. There were 18 ties here, which added 9 wins to each team’s total. That means, ignoring ties, Team 1 won 78 games and Team 2 won only 66 games. That’s a large disparity due to randomness alone. Let’s run things again, this time starting with a different seed and see what happens.

```r
### use a new starting seed for random draws
set.seed(17235)
Team1 <- rpois(162, 6)
Team2 <- rpois(162, 6)
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.4753086
sum(matchups$T1win)
# [1] 77
mean(matchups$T2win)
# [1] 0.5246914
sum(matchups$T2win)
# [1] 85
sum(matchups$ties)
# [1] 12
```

You can see that, here, Team 2 came out on top. Perhaps we should instead just simulate lots of seasons. How about 100 seasons of 162 games each? We can do this by replacing our “162” with “16200” in the `rpois` function.

```r
### now do 100 seasons with original seed
set.seed(12345)
Team1 <- rpois(16200, 6)
Team2 <- rpois(16200, 6)
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

# dividing by 100 gives per-season averages
mean(matchups$T1win)
# [1] 0.5035185
sum(matchups$T1win)/100
# [1] 81.57
mean(matchups$T2win)
# [1] 0.4964815
sum(matchups$T2win)/100
# [1] 80.43
sum(matchups$ties)/100
# [1] 18.86
```

Now the two teams look much closer. Team 1 comes out on top, just barely, with an average of 81.57 wins per season. Team 2 is just below with 80.43 wins per season.
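Pooling 16,200 games gives a stable average, but it hides how much any single season can wander. The sketch below keeps the seasons separate by arranging the draws into one column per season, so we can look at the spread of Team 1’s win totals. The matrix layout and the variable names (`team1`, `team2`, `season_wins`) are my own choices for illustration.

```r
# Sketch: 100 separate seasons of 162 games between two equal teams.
set.seed(12345)
n_games   <- 162
n_seasons <- 100

# One row per game, one column per season; both teams average 6 runs.
team1 <- matrix(rpois(n_games * n_seasons, 6), nrow = n_games)
team2 <- matrix(rpois(n_games * n_seasons, 6), nrow = n_games)

# Game result from Team 1's side: 1 = win, 0.5 = tie, 0 = loss.
t1_result <- (team1 > team2) + 0.5 * (team1 == team2)

# Team 1's win total in each season.
season_wins <- colSums(t1_result)

summary(season_wins)   # range of season outcomes for identical teams
sd(season_wins)        # typical season-to-season swing in wins
```

The standard deviation here is a rough guide to how many wins of “imbalance” we should expect from luck alone, before any talent difference enters the picture.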

We should expect to see similar aggregate results as we reduce the run environment, likely with significantly more ties than in the larger run environment. And here lies the issue: if ties increase in a low-run environment, and baseball doesn’t allow ties, then strategic maneuvers will become more important. Let’s check with an extreme low-scoring environment.

```r
### reduce the run environment
set.seed(12345)
Team1 <- rpois(16200, 2)
Team2 <- rpois(16200, 2)
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.503858
sum(matchups$T1win)/100
# [1] 81.625
mean(matchups$T2win)
# [1] 0.496142
sum(matchups$T2win)/100
# [1] 80.375
sum(matchups$ties)/100
# [1] 34.17
```

As you can see, the number of ties has nearly doubled in our very low scoring environment. There are now an average of 34 ties per season. Basically, the discrete outcomes from our silly simulation resulted in more games where teams will need to depend on a bullpen in high leverage situations. In this case, strategic choices and relative bullpen strength are going to have a larger impact on our season outcome.
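Under the Poisson assumption, the tie rate doesn’t have to be simulated. A tie happens when both teams land on the same run total, so the probability is the sum over k of P(Team 1 scores k) times P(Team 2 scores k). Here is a minimal sketch of that calculation; `tie_prob` is a hypothetical helper name, and cutting the sum off at 100 runs is an assumption that is harmless at these scoring levels.

```r
# Hypothetical helper: exact probability that two independent Poisson run
# totals are equal. The sum is truncated at max_runs (an assumption; the
# probability mass beyond 100 runs is negligible for means this small).
tie_prob <- function(lambda1, lambda2, max_runs = 100) {
  k <- 0:max_runs
  sum(dpois(k, lambda1) * dpois(k, lambda2))
}

# Expected ties per 162-game season in each environment.
ties_high <- 162 * tie_prob(6, 6)   # both teams average 6 runs
ties_low  <- 162 * tie_prob(2, 2)   # both teams average 2 runs
c(high = ties_high, low = ties_low)
```

These should land close to the simulated 18.86 and 34.17 ties per season, which suggests the jump in ties is a property of the low scoring environment itself rather than a quirk of one seed.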

So what happens when we have teams of different strengths? Let’s look at one high scoring and one low scoring example, where the two teams are separated by one run in hitter skill levels.

```r
### high scoring, different strengths
set.seed(12345)
Team1 <- rpois(16200, 6)
Team2 <- rpois(16200, 5)
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.622037
sum(matchups$T1win)/100
# [1] 100.77
mean(matchups$T2win)
# [1] 0.377963
sum(matchups$T2win)/100
# [1] 61.23
sum(matchups$ties)/100
# [1] 19.2
```

You can see that being a full run better even in a high scoring environment is a big skill difference. Interestingly, though, we end up with about the same number of ties that we had when the teams were equal. Now how about a low scoring environment?

```r
### low scoring, different strengths
set.seed(12345)
Team1 <- rpois(16200, 3)
Team2 <- rpois(16200, 2)
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.6721914
sum(matchups$T1win)/100
# [1] 108.895
mean(matchups$T2win)
# [1] 0.3278086
sum(matchups$T2win)/100
# [1] 53.105
sum(matchups$ties)/100
# [1] 27.09
```

So, this time we see an even larger advantage for the better team: with the same one run difference in skill levels, Team 1 averages about 109 wins in the low scoring environment versus about 101 in the high scoring one. Why might this happen? Well, a single run is worth more in this low scoring environment than in the high scoring environment.

Of course, the big issue here is that we haven’t kept the relative strength of the two teams the same as in the high scoring environment. To do that, our “bad” team should average 5 divided by 6 multiplied by 3, or 2.5 runs. This relative size might matter for our outcomes, and we see the relative relationship hold to some extent for run scoring environments across baseball history (roughly a 35 to 45 percent difference between the best and worst teams). So let’s make things a bit more realistic.

```r
### proportional differences
set.seed(12345)
Team1 <- rpois(16200, 3)
Team2 <- rpois(16200, (5/6)*3)   # 2.5 runs: same 5-to-6 ratio as the high scoring case
matchups <- data.frame(cbind(Team1, Team2))

matchups$T1win <- ifelse(matchups$Team1 > matchups$Team2, 1, 0)
matchups$T1win <- ifelse(matchups$Team1 == matchups$Team2, .5, matchups$T1win)
matchups$T2win <- ifelse(matchups$Team2 > matchups$Team1, 1, 0)
matchups$T2win <- ifelse(matchups$Team2 == matchups$Team1, .5, matchups$T2win)
matchups$ties <- ifelse(matchups$Team1 == matchups$Team2, 1, 0)

mean(matchups$T1win)
# [1] 0.585
sum(matchups$T1win)/100
# [1] 94.77
mean(matchups$T2win)
# [1] 0.415
sum(matchups$T2win)/100
# [1] 67.23
sum(matchups$ties)/100
# [1] 27.92
```

Here, we in fact see a benefit to the weaker team. When the relative difference in skill levels stays the same across run scoring environments, there is an **improvement** in our balance measure in the lower scoring environment. Compared with the high scoring example, Team 1 drops from 100.77 to 94.77 wins per season and Team 2 rises from 61.23 to 67.23, a six win swing for each team.
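To separate the real effect from simulation noise, the three unequal matchups can also be worked out exactly under the same Poisson assumption. The sketch below sums, over every possible Team 1 score, the chance that Team 2 scores fewer runs, then adds half the tie probability. `win_share` and `scenarios` are hypothetical names, and the 100-run cutoff is an assumption.

```r
# Hypothetical helper: the stronger team's expected share of wins when
# ties count as half a win. Sums are truncated at max_runs (an assumption).
win_share <- function(lambda1, lambda2, max_runs = 100) {
  k  <- 0:max_runs
  p1 <- dpois(k, lambda1)                    # P(Team 1 scores exactly k)
  p_win <- sum(p1 * ppois(k - 1, lambda2))   # P(Team 2 scores fewer than k)
  p_tie <- sum(p1 * dpois(k, lambda2))       # P(both teams score k)
  p_win + 0.5 * p_tie
}

scenarios <- data.frame(
  label  = c("high, 6 vs 5", "low, 3 vs 2", "low, 3 vs 2.5"),
  strong = c(6, 3, 3),
  weak   = c(5, 2, (5 / 6) * 3)
)
# Expected wins for the stronger team over a 162-game season.
scenarios$expected_wins <- 162 * mapply(win_share, scenarios$strong, scenarios$weak)
print(scenarios)
```

If the simulated results are sound, these expected win totals should fall near 100.77, 108.895, and 94.77, with the same ordering across the three scenarios.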

Now, this **is** an extreme example. We could plot out the differences across a variety of run environments and relative skill levels to get a better idea of what is going on here with a full simulation. But the important thing to note is that while our relative skill levels are exactly the same, our measurement of balance has changed. That could lead us in the wrong direction when evaluating policy. It also could tell us that, with roughly 9 extra ties per year (27.92 versus 19.2), strategic considerations become more important in our low scoring environment. Imagine capitalizing on those ties. That’s the difference between 3rd or 4th place in your division, and getting home field advantage in the Division Series.
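As a first step toward that fuller picture, here is a small sketch that holds the 5-to-6 skill ratio fixed and sweeps the stronger team’s scoring average across several run environments. The grid of means in `strong_means` and the names `sweep_rows` and `env_sweep` are arbitrary choices for illustration, and each row is still a simulation, so it carries some noise.

```r
# Sketch: same relative skill gap, different run environments.
set.seed(12345)
n_games <- 16200                      # 100 seasons of 162 games per environment
ratio   <- 5 / 6                      # weaker team's scoring relative to the stronger team
strong_means <- c(2, 3, 4, 5, 6, 8)   # illustrative grid of run environments

sweep_rows <- lapply(strong_means, function(m) {
  strong <- rpois(n_games, m)
  weak   <- rpois(n_games, ratio * m)
  data.frame(
    strong_mean = m,
    weak_mean   = ratio * m,
    # per-season averages, with a tie counted as half a win
    strong_wins = 162 * mean((strong > weak) + 0.5 * (strong == weak)),
    ties        = 162 * mean(strong == weak)
  )
})
env_sweep <- do.call(rbind, sweep_rows)
print(env_sweep)
```

Reading down the table, the stronger team’s win total and the number of ties can be compared directly across environments, and passing `env_sweep` to `plot` would give the picture described above.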

So what really brought me to think about this possibility? Well, the 2014 Kansas City Royals are a pretty boring team: they had the 17th best Starting Pitching core based on WAR and scored only the 14th most runs in 2014 (9th in the AL alone). They also made the World Series. But they did so with things that were manageable strategically: they had the best Fielding team by far and the 5th best Bullpen based on WAR. They stole the most bases as well.

Could it be that the lower run scoring environment gave the Royals a slight advantage based on simply the randomness that arises from this lower scoring environment? Could they have taken advantage of this with team makeup by acting more strategically? Well, I don’t know. And we’ll need to look deeper to understand. But we’ve got a start here, with some useful tools for simulation in R.

Note that the `rpois` function is just one example of random draws in R. There are equivalents for many other distributions. While I think directly simulating entire games is more appropriate, the options in R do include the negative binomial distribution with `rnbinom`.
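For a sense of what swapping distributions would change, here is a minimal sketch comparing Poisson draws with negative binomial draws that share the same mean. The mean of 4 runs and the dispersion `size = 4` are arbitrary illustrative values, not estimates from real scoring data. In the `mu` parameterization of `rnbinom`, the variance is `mu + mu^2 / size`, so these settings give a variance of 8 against the Poisson’s 4.

```r
# Sketch: same average scoring, different variability.
set.seed(12345)
n    <- 16200
mu   <- 4     # illustrative runs per game (an assumption)
size <- 4     # illustrative dispersion (an assumption); variance = mu + mu^2 / size

pois_runs <- rpois(n, mu)
nb_runs   <- rnbinom(n, size = size, mu = mu)

c(pois_mean = mean(pois_runs), nb_mean = mean(nb_runs))
c(pois_var  = var(pois_runs),  nb_var  = var(nb_runs))
```

Because the negative binomial lets the variance move independently of the mean, it would allow the run environment and the noisiness of scoring to be varied separately, which the Poisson setup above cannot do.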

So, take this for what it is (strong assumptions, incorrect distributions, two team leagues). I present this as nothing more than a thinking exercise in measurement. Do we consider strategic plans as part of the distribution of talent? Perhaps, but this is quite a different way to think about it than what is normally thought of in economics, where we want to analyze the distribution of player talent using these aggregate measures, and infer how this may have been affected by league policies. So it is important to think about what these really mean from a philosophical standpoint and be careful whenever we imply that a league is more (less) balanced now than it was even 15 years ago.