Run Expectancies for Six Eras of MLB Baseball

Introduction

Tom Tango recently displayed tables of Runs Expectancy values over six eras of MLB baseball, from 1900 through the 2023 season. A few years ago, I introduced in this post a simple graphical method for summarizing a runs expectancy matrix. I thought it would be interesting to use Tom’s tables together with my method to explore changes in runs expectancy over MLB history.

The Runs Expectancy Graph

Here’s a brief description of my graphical method. We have a runs expectancy table that gives the mean runs in the remainder of the inning over the 24 states of an inning defined by runners on base and number of outs. Here is an example of this table.

For example, when there are runners on 1st and 2nd with 0 outs, the table shows that, on average, there are 1.55 runs scored in the remainder of the inning. We define a Bases Score by

Bases Score = Sum(bases occupied) + I(# of Runners > 1)
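As a minimal sketch, the Bases Score can be computed in R for all eight runner configurations. This assumes that Sum(bases occupied) means the sum of the occupied base numbers (1 for 1st, 2 for 2nd, 3 for 3rd), which is the reading that produces scores from 0 to 7. The function name bases_score and the columns on1, on2, on3 are illustrative names, not ones from the original analysis.

```r
# Hypothetical helper: on1, on2, on3 are logicals saying whether each base is occupied.
bases_score <- function(on1, on2, on3) {
  runners <- on1 + on2 + on3                     # number of runners on base
  1 * on1 + 2 * on2 + 3 * on3 + (runners > 1)    # sum of base numbers + indicator
}

# Enumerate the eight runner configurations and score each one.
states <- expand.grid(on1 = c(FALSE, TRUE),
                      on2 = c(FALSE, TRUE),
                      on3 = c(FALSE, TRUE))
states$Score <- with(states, bases_score(on1, on2, on3))
states[order(states$Score), ]
```

Under this reading each configuration gets a distinct score, so the score orders the eight runner states on a single axis.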

The I() indicator function means that we add a one to the bases score if the number of runners exceeds one. The values of Bases Score range from 0 to 7 over the eight possible runner scenarios. I graph the runs expectancy as a function of the bases score, using different colors for the three possible outs values. Here is a snapshot of the graph using data from the 2019 season:

Since the pattern of runs for each value of outs is pretty linear, it makes sense to summarize this table by the fitted slopes and intercepts of the three lines. Here are the slopes and intercepts using 2019 season data.

##   Outs Intercept Slope
## 1    0      0.64  0.24
## 2    1      0.35  0.18
## 3    2      0.13  0.08

The first intercept value, 0.64, represents the mean runs scored at the beginning of an inning with no outs and no runners on base. The slope value, 0.24, gives the increase in mean runs for each unit increase in the bases score. So if there is a single that puts a runner on 1st with no outs (a bases score of 1), the mean runs would increase to 0.64 + 0.24 = 0.88. Similarly, the slopes 0.18 and 0.08 represent the increase in mean runs for each unit change in the bases score for one and two outs, respectively.
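The arithmetic above can be wrapped in a small R sketch that evaluates the fitted lines at any state. The coefficients are the 2019 values from the table above. The names line_fits and expected_runs are illustrative and not part of the original analysis.

```r
# Fitted intercepts and slopes for the 2019 season (copied from the table above).
line_fits <- data.frame(Outs      = 0:2,
                        Intercept = c(0.64, 0.35, 0.13),
                        Slope     = c(0.24, 0.18, 0.08))

# Hypothetical helper: approximate mean runs from the line for a given outs value.
expected_runs <- function(outs, score) {
  row <- line_fits[match(outs, line_fits$Outs), ]
  row$Intercept + row$Slope * score
}

expected_runs(0, 0)  # start of an inning: the intercept, 0.64
expected_runs(0, 1)  # runner on 1st, no outs: 0.64 + 0.24 * 1
```

These are values on the fitted lines, so they approximate the entries of the runs expectancy table rather than reproduce them.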

Applying the Method to the Six Eras

I apply this method individually to the runs expectancy tables that Tom provides for the six MLB eras 1900-1920, 1921-1946, 1947-1968, 1969-1992, 1993-2009, and 2010-2023. The runs expectancy graphs are displayed below. The obvious takeaway is that the patterns in these graphs look very similar across eras. Of course, runs scored per game has gone through significant changes over MLB history, but the relationship between runs scored in the remainder of the inning and the inning states appears similar over these baseball eras.

Focusing on the Slopes

To look deeper, I look at the fitted slopes in these graphs. For each baseball era, there will be three fitted slopes showing how the mean runs will increase for a unit change in the bases score. I’ve collected these slopes — the following graphs show, for each outs value, how these slopes have changed over baseball eras.

This is more interesting. Generally these slopes tend to be smallest for the modern eras of baseball. There is a runs advantage to having more runners on base, since all of these slopes are positive. But the magnitude of this runs advantage tends to be smaller in the period 1947-2023 than it was in the period 1900-1946. Currently teams tend to produce most of their runs through home runs, and fewer runs are produced through so-called “small ball” tactics (advancing runners through sacrifices, stealing bases, etc.). So that might explain why the runs advantage of extra runners on base is smaller now than it was in the early years of MLB baseball.

Related Topics

  • The runs expectancy table is a fundamental idea in sabermetrics and many measures of performance are based on this table. Chapter 5 of the 3rd edition of our book describes the process of computing this matrix from Retrosheet data.
  • One application of runs expectancy is measuring the runs value of hits such as home runs. Tom presents tables showing how the value of a home run depends on the situation (runners on base and outs) and also on the baseball era.
  • I’ve written a number of posts on different applications of runs expectancy. On this page, I collect some of the posts that I’ve written on runs expectancy.

Adjustments to Line Fit

Tom Tango suggested several modifications to my line fits to the runs expectancy data. I will illustrate these modifications for the (bases score, runs) data for the modern era 2010-2023.

Adjustment 1: The (bases score, runs) = (0, 0.5) point is special since 0.5 corresponds to the mean runs scored in a half-inning in the modern era. It is desirable that my fit goes through the (0, 0.5) point. We can accomplish this using the lm() function with an offset() term in the model formula. The offset fixes the intercept at 0.5 and the "+ 0" removes the freely estimated intercept, so the line is forced to go through that point.

# offset() fixes the intercept at 0.5; "+ 0" drops the free intercept term
fit2 <- lm(Runs ~ offset(rep(.5, 8)) + 0 + Score,
           data = d10)
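To make the constraint concrete, here is a small sketch showing that the offset fit is the same as regressing Runs - 0.5 on Score through the origin. The Runs values below are made-up placeholders, not Tom's 2010-2023 table, and the data frame only mimics the Score and Runs columns that d10 is assumed to have.

```r
# PLACEHOLDER data: Runs values are invented for illustration only.
d10 <- data.frame(Score = 0:7,
                  Runs  = c(0.5, 0.9, 1.1, 1.4, 1.5, 1.8, 2.0, 2.3))

# Fit constrained to pass through (0, 0.5), as in the text.
fit2 <- lm(Runs ~ offset(rep(.5, 8)) + 0 + Score, data = d10)

# Equivalent formulation: shift the response and fit through the origin.
fit_shift <- lm(I(Runs - 0.5) ~ 0 + Score, data = d10)

# Closed-form least-squares slope for a line through (0, 0.5).
slope_closed <- with(d10, sum(Score * (Runs - 0.5)) / sum(Score^2))

c(offset = unname(coef(fit2)),
  shifted = unname(coef(fit_shift)),
  closed_form = slope_closed)

fitted(fit2)[1]  # fitted value at Score = 0 is the fixed intercept
```

All three slopes agree, and the fitted value at a bases score of 0 equals the fixed intercept, which is what the constraint is meant to guarantee.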

Adjustment 2: Also, since there are many more opportunities when, say, the bases are empty than when there are runners on base, it is desirable to weight the fit by the opportunities or frequencies at each inning state. Tom has a table of the opportunities for all situations. We implement a weighted least squares fit where the weights are given by the Opp variable.

# Same constrained fit, now weighted by the opportunities in each state
fit3 <- lm(Runs ~ offset(rep(.5, 8)) + 0 + Score,
           weights = Opp, data = d10)
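For reference, the slope of this constrained, weighted fit has a simple closed form. The symbols are my own shorthand, not notation from Tom's tables: s_i is the bases score, r_i the runs expectancy, and w_i the opportunities (Opp) for runner state i.

```latex
\hat{\beta}
  = \arg\min_{\beta} \sum_{i=1}^{8} w_i \,\bigl(r_i - 0.5 - \beta s_i\bigr)^2
  = \frac{\sum_{i=1}^{8} w_i \, s_i \,(r_i - 0.5)}{\sum_{i=1}^{8} w_i \, s_i^{2}}
```

Setting all w_i equal recovers the unweighted slope from Adjustment 1. The bases-empty state has s_i = 0, so it drops out of both sums: its influence comes entirely through fixing the intercept at 0.5, and the weights only change the relative influence of the seven states with runners on base.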

To see the impact of these adjustments on the fits, I display the basic fit and the adjusted fit below. The slope of the adjusted fit is larger than the slope of the basic fit.

Thanks, Tom, for the suggestions. I will at some point update my work with these modified fits.
