Monthly Archives: April, 2023

Cluster Luck Scoring in 2022 Season


Baseball fans are familiar with the clustering aspect of run scoring. To score runs, a team must cluster on-base events within particular innings. A team that gets nine singles, one in each inning, will typically not score at all. Instead, the team has to group, or cluster, these singles together to produce runs.
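To make the nine-singles example concrete, here is a toy Python calculation. It assumes a deliberately crude "station-to-station" rule (every single moves each runner up exactly one base, and outs are ignored), which is my simplification for illustration and not the run scoring model used in this post. The function name is hypothetical.

```python
def runs_station_to_station(innings):
    """Runs scored when every single moves each runner up exactly one base.

    `innings` is a list of single counts, one entry per inning.
    Under this toy rule the first three singles of an inning load the
    bases and every further single in that inning scores one run.
    """
    return sum(max(0, singles - 3) for singles in innings)


spread_out = [1] * 9        # one single in each of nine innings
clustered = [9] + [0] * 8   # all nine singles in a single inning

print(runs_station_to_station(spread_out))  # 0
print(runs_station_to_station(clustered))   # 6
```

The same nine singles produce either no runs or six runs, depending only on how they are grouped.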

Related to this clustering issue, I addressed the following question in a June 2016 post.

“Suppose a team scores 10 runs in a game. I would like to measure the so called “cluster luck” of this run scoring. That is, how many of these 10 runs are attributed to the fact that the team was able to cluster their on-base events?”

In that June 2016 post, I proposed a simulation-based approach to addressing this question. I proposed a relatively simple model for simulating the runs scored in a half-inning. This model assumes that the different on-base events occur in a random fashion throughout the game. Given a sequence of plate appearance events, I used tables of runner advancement probabilities to simulate runner advancement and runs scored. We can then compare the actual runs scored in a game by each team with the expected number of runs scored from the model. If the runs scored exceed the expected number, this indicates that the team effectively clustered their on-base events to score runs. On the other hand, if the runs scored are fewer than the expected number, this indicates that the team spread out their on-base events through the nine innings and didn’t cluster the hits and walks to score runs.

In this post, we’ll apply this method across all teams and games in the 2022 season. There are several questions of interest.

  • How good is my run scoring model? Specifically, does it predict accurately the actual runs scored in a game?
  • Are there teams that tend to score more or fewer runs than predicted from the model?
  • Can we pick out interesting outliers? That is, can we identify teams that scored many more runs than one would predict based solely on the number of on-base events? We also want to identify teams that scored far fewer runs than predicted from the model.

Simulating a Half-Inning of Baseball

I recently attended a Phillies/Reds game on April 16, 2023 where I observed the following PA events for the Phillies:

 [1] "HR"  "1B"  "BB"  "BB"  "1B"  "OUT" "1B"  "2B"  "1B"  "OUT" "1B" 
[12] "2B"  "OUT" "OUT" "2B"  "OUT" "OUT" "OUT" "OUT" "1B"  "BB"  "1B" 
[23] "1B"  "OUT" "2B"  "1B"  "BB"  "OUT" "BB"  "OUT" "OUT" "2B"  "OUT"
[34] "1B"  "OUT" "OUT" "OUT" "BB"  "OUT" "OUT" "OUT" "HR"  "OUT" "OUT"
[45] "OUT" "1B"  "OUT" "1B"  "OUT" "1B"  "1B"  "OUT" "OUT" "1B"  "1B" 
[56] "OUT"

The Phillies scored 14 runs in this game. The question is: how well did the Phillies cluster these on-base events in this particular game? I simulated the runs scored using the following model:

  • First, I randomly permuted these 56 plate appearance outcomes. (The model assumes that all possible permutations of the outcomes are equally likely.)
  • I divided these randomly ordered PAs into 9 innings and simulated the runs scored in each inning using runner advancement probability tables.
  • I repeated this process 100 times, collecting the runs scored in all 100 simulations.
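The three steps above can be sketched in Python as follows. This is only a rough illustration, not my actual R code: the advancement probabilities are made-up placeholders rather than the tables from the 2016 post, outs never advance runners, innings are assumed to end after every third out, and any on-base events that fall after the final out are still scored as a partial inning. All function and constant names are hypothetical.

```python
import random

# Hypothetical advancement probabilities (NOT the tables from the 2016 post).
P_SECOND_SCORES_ON_1B = 0.6   # runner on second scores on a single
P_FIRST_TO_THIRD_ON_1B = 0.3  # runner on first reaches third on a single
P_FIRST_SCORES_ON_2B = 0.4    # runner on first scores on a double


def advance(bases, event, rng):
    """Apply one on-base event; return (new_bases, runs_scored).

    `bases` is a tuple of three booleans for first, second and third.
    """
    first, second, third = bases
    if event == "HR":
        return (False, False, False), 1 + first + second + third
    if event == "3B":
        return (False, False, True), first + second + third
    if event == "BB":  # walk or HBP: only forced runners move up
        runs = int(first and second and third)
        return (True, first or second, (first and second) or third), runs
    if event == "2B":
        first_scores = first and rng.random() < P_FIRST_SCORES_ON_2B
        runs = second + third + first_scores
        return (False, True, first and not first_scores), runs
    # Single: runner on third scores; the others may take an extra base.
    second_scores = second and rng.random() < P_SECOND_SCORES_ON_1B
    runs = third + second_scores
    new_third = second and not second_scores
    first_to_third = (first and not new_third
                      and rng.random() < P_FIRST_TO_THIRD_ON_1B)
    return (True, first and not first_to_third, new_third or first_to_third), runs


def simulate_game(events, rng):
    """Runs from one random ordering of the plate appearance outcomes."""
    order = list(events)
    rng.shuffle(order)                      # step 1: random permutation
    bases, outs, runs = (False, False, False), 0, 0
    for event in order:
        if event == "OUT":
            outs += 1
            if outs % 3 == 0:               # step 2: third out ends the inning
                bases = (False, False, False)
        else:
            bases, scored = advance(bases, event, rng)
            runs += scored
    return runs


def expected_runs(events, n_sims=100, seed=None):
    """Step 3: average the simulated runs over n_sims permutations."""
    rng = random.Random(seed)
    return sum(simulate_game(events, rng) for _ in range(n_sims)) / n_sims


# The Phillies' 56 outcomes listed above, by count (order is permuted anyway).
phillies = ["1B"] * 16 + ["2B"] * 5 + ["HR"] * 2 + ["BB"] * 6 + ["OUT"] * 27
print(expected_runs(phillies, n_sims=100, seed=2023))
```

Because the probabilities here are placeholders, the printed mean should not be expected to reproduce the value from my model. The point is only the structure: permute, split into innings by outs, advance runners, and average.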

I described this run scoring simulation algorithm in a 2016 post.

From the model that assumes no real clustering of events, the mean runs scored was 14.77. The Phillies actually scored 14 runs in this particular game, which is 0.77 lower than expected from my model. So I didn’t see any unusual clustering of on-base events in this game.

Games in the 2022 Season – General Patterns

I collected the Retrosheet play-by-play files for the 2022 season. For each game, I collected the sequence of plate appearance outcomes (1B, 2B, 3B, HR, BB or HBP, OUT) and used my run scoring model to simulate the runs scored assuming no real clustering of events. For each team in each game, I computed the runs scored minus the expected runs scored from the simulations using my model.
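As a rough Python sketch of the extraction step (again, not my R code), one can map the numeric event codes that the Chadwick tools attach to parsed Retrosheet plays onto the six outcome categories. The codes 14 through 16 and 20 through 23 are the standard EVENT_CD values for walks, hit-by-pitches and hits. Treating every other batting event (including errors and fielder's choices) as an out is an assumption made here for illustration, and the BAT_TEAM key and boolean BAT_EVENT_FL values are hypothetical conveniences.

```python
# Chadwick/Retrosheet EVENT_CD values for on-base plate appearance outcomes.
ON_BASE_CODES = {
    14: "BB",  # walk
    15: "BB",  # intentional walk
    16: "BB",  # hit by pitch, grouped with walks
    20: "1B",
    21: "2B",
    22: "3B",
    23: "HR",
}
# Assumption: any other batting event (generic out, strikeout, error,
# fielder's choice, ...) is recorded as "OUT".


def pa_outcomes(plays):
    """Group PA outcomes by (game id, batting team), in play order.

    `plays` is an iterable of dicts with the hypothetical keys
    GAME_ID, BAT_TEAM, EVENT_CD and BAT_EVENT_FL (a boolean here).
    """
    games = {}
    for play in plays:
        if not play["BAT_EVENT_FL"]:  # skip steals, wild pitches, etc.
            continue
        key = (play["GAME_ID"], play["BAT_TEAM"])
        outcome = ON_BASE_CODES.get(play["EVENT_CD"], "OUT")
        games.setdefault(key, []).append(outcome)
    return games


# Toy plays for a made-up game id, only to show the shape of the result.
toy_plays = [
    {"GAME_ID": "GAME1", "BAT_TEAM": "AAA", "EVENT_CD": 20, "BAT_EVENT_FL": True},
    {"GAME_ID": "GAME1", "BAT_TEAM": "AAA", "EVENT_CD": 4, "BAT_EVENT_FL": False},
    {"GAME_ID": "GAME1", "BAT_TEAM": "AAA", "EVENT_CD": 3, "BAT_EVENT_FL": True},
    {"GAME_ID": "GAME1", "BAT_TEAM": "BBB", "EVENT_CD": 16, "BAT_EVENT_FL": True},
]
print(pa_outcomes(toy_plays))
```

Each resulting vector of outcomes is what would be fed to the run scoring simulation for that team and game.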

Below I display a graph of the residuals (Runs Minus Expected Runs) for all games plotted against the batting team. I overlay a vertical line at zero.

Generally, it is clear that my run scoring model underestimates the actual number of runs scored in these games. The mean residual is 0.82, which means that there are, on average, 0.82 more runs scored than predicted from the model. My model is based only on the hit and walk/HBP events, ignoring runner advancements due to errors, wild pitches and sacrifices, so it is not surprising that the model would underestimate the observed runs. I would also suspect some natural clustering of the on-base outcomes, since a team may be facing an unusually weak or strong pitcher in particular innings.

I don’t see any team effect in this graph. That is, I don’t see teams that are especially good or especially weak in clustering runs. But then again, perhaps this particular graph is not that helpful in picking out these unusual teams.

Unusual Run-Scoring Games

This graph does show some particular games where the Runs – Expected value is unusually large, either positive or negative. These are the games where the team scored significantly more or fewer runs than one would predict based on the “random” run scoring model.

Here are the top 5 games where the team scored many more runs than expected. On July 12, Oakland scored 14 runs from their 19 on-base events, but they were predicted from the “random” model to score only 4.97.

       Game_Id Team Runs Expected Outs On_Base X_Base Residual
1 TEX202207120  OAK   14     4.97   36      19      4     9.03
2 CHN202204230  CHN   21    13.51   24      28      5     7.49
3 SLN202210010  SLN   13     5.88   24      18      4     7.12
4 WAS202204200  ARI   11     3.98   27      14      3     7.02
5 CLE202204201  CLE   11     3.99   24      14      1     7.01

Let’s focus on the leader: Oakland scored 14 runs when the expected number of runs was only 4.97. The box score of the Texas/Oakland game on July 12 on Baseball Reference shows that this was an interesting scoring game for Oakland. In this game, Oakland used 7 on-base plays, including a home run at the end, to score 8 runs in the top of the 12th inning.

Here is the table at the low end: the five largest negative residuals. Looking at the first line, on June 23 the White Sox were shut out, but the random model expected them to score 3.11 runs from their 13 on-base events.

          Game_Id Team Runs Expected Outs On_Base X_Base Residual
4700 CHA202206230  CHA    0     3.11   27      13      4    -3.11
4701 ANA202209070  DET    5     8.28   27      16      7    -3.28
4702 MIA202208120  MIA    3     6.47   27      18      5    -3.47
4703 DET202205130  BAL    2     5.48   27      17      2    -3.48
4704 SFN202207290  SFN    2     6.11   27      16      6    -4.11
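The Residual column in both tables is simply Runs minus Expected. As a small Python check (not part of my R code), the following recomputes the residuals from the ten rows shown above and ranks them.

```python
# (Game_Id, Team, Runs, Expected) copied from the two tables above.
games = [
    ("TEX202207120", "OAK", 14, 4.97),
    ("CHN202204230", "CHN", 21, 13.51),
    ("SLN202210010", "SLN", 13, 5.88),
    ("WAS202204200", "ARI", 11, 3.98),
    ("CLE202204201", "CLE", 11, 3.99),
    ("CHA202206230", "CHA", 0, 3.11),
    ("ANA202209070", "DET", 5, 8.28),
    ("MIA202208120", "MIA", 3, 6.47),
    ("DET202205130", "BAL", 2, 5.48),
    ("SFN202207290", "SFN", 2, 6.11),
]

# Residual = actual runs minus expected runs, rounded as in the tables.
residuals = [(gid, team, round(runs - expected, 2))
             for gid, team, runs, expected in games]

# Sort from the largest positive residual to the largest negative one.
ranked = sorted(residuals, key=lambda row: row[2], reverse=True)

print(ranked[0])   # ('TEX202207120', 'OAK', 9.03)
print(ranked[-1])  # ('SFN202207290', 'SFN', -4.11)
```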


  • (General Approach). This post illustrates a general method of understanding variation in baseball outcomes. One proposes a simple probability model for generating outcomes, here the number of runs scored in a game. Then one uses a simulation to predict outcomes based on the model, and checks how closely the model predictions compare to the actual observed outcomes.
  • (Related Posts). This is actually part 3 of a three-part series on cluster luck scoring. Part 1 describes the run-scoring algorithm, Part 2 describes the general method of using this algorithm to measure cluster luck scoring, and this post extends this idea to explore the clustering of runs scored for all games in the 2022 season.
  • (Other Methods). Certainly there could be other methods of assessing so-called cluster luck scoring in baseball. This particular model appears to underestimate the actual game runs, but I think its predictions provide a reasonable baseline for identifying games where the team is unusually effective in producing runs.
  • (A Better Model?) One could improve this model by incorporating other runner advancement possibilities such as sacrifices, wild pitches and stolen bases. But as noted above, this relatively simple model seems to be helpful in identifying the cluster luck scoring games.
  • (R Code). All of my R code for this work is available on my GitHub Gist site. There are two main functions. The function generate_runs() takes a vector of observed PA outcomes as input and outputs the expected runs scored across 100 iterations. The function retro_game_work() takes some Retrosheet data and a game id of interest, extracts the plays for each team, and runs the generate_runs() simulation separately for each team. You will need Retrosheet play-by-play data for a particular season to use this function.