2022 is Tough on Fly Ball Hitters

Introduction

It has been well documented that home run hitting is down in the 2022 season. Generally, if you look at season-to-season statistics reported in Baseball-Reference, a number of offensive measures are down this season.

• Teams are only scoring 4.09 runs per game this season compared to 4.53 in 2021.
• Teams are averaging 0.92 home runs per game compared to an average of 1.22 in 2021.
• The overall slugging percentage in 2022 is .373 compared to .411 in 2021.
• The overall 2022 batting average is .233 which is even lower than the .237 AVG in the pitching-dominated 1968 season.

Zach Crizer of Yahoo Sports had an interesting article this week on the new dead-ball era, asking how the dead ball is hurting baseball offense. One figure in the article grabbed my attention: it plots slugging percentage as a function of launch angle and shows that SLG is drastically lower in 2022 for batted balls with launch angles between 26 and 30 degrees.

In this post, I am going to provide a statistical perspective on the dearth of offense in the 2022 season. I will fit an ordinal model to the in-play outcomes using 2021 season data. This will allow me to estimate the expected wOBA weight for any values of the launch variables. Then I will look at the 2022 season and compare the observed 2022 wOBA units with the predicted wOBA units based on the 2021 season model. By looking at the differences between the two seasons, we will see what types of batted balls are most penalized by the current dead-ball scenario.

Fitting an Ordinal Model to 2021 In-Play Outcomes

Looking at 2021 data, I focus on batted balls where the launch angle is between 0 and 50 degrees and the launch speed is between 90 and 115 mph. The outcome is the ordinal variable (out, single, double, triple, home run), which I code by the integers 1, 2, 3, 4, 5. I fit the ordinal logistic GAM model

logit(P(outcome >= j)) = a_j + s(launch_angle, launch_speed)

where s() is a smooth function of the launch variables. Using this fitted model, I can estimate the probabilities of all five outcomes for any values of launch angle and launch speed. Using these probabilities, I can estimate the wOBA numerator by using the wOBA weights:

wOBA numerator = 0.9 P(single) + 1.25 P(double) + 1.6 P(triple) + 2 P(home run)
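Here is a minimal sketch of this step. It assumes the `ocat` (ordered categorical) family in the `mgcv` package and uses simulated stand-in data, so the data frame `sim`, the simulation itself, and the choice `k = 20` are illustrative assumptions rather than the code used for this post.

```r
library(mgcv)  # provides gam() and the ocat() ordered-categorical family

set.seed(2022)
n <- 2000

# Simulated stand-in for in-play data (NOT real Statcast data)
sim <- data.frame(
  launch_angle = runif(n, 0, 50),
  launch_speed = runif(n, 90, 115)
)
# Latent "quality of contact" with logistic noise, cut into five ordered outcomes
latent <- -1 + 0.12 * (sim$launch_speed - 100) -
  0.004 * (sim$launch_angle - 28)^2 + rlogis(n)
# Outcome codes: 1 = out, 2 = single, 3 = double, 4 = triple, 5 = home run
sim$outcome <- findInterval(latent, c(0, 0.8, 1.4, 2.2)) + 1

# Ordinal logistic GAM with a joint smooth of the two launch variables
fit <- gam(outcome ~ s(launch_angle, launch_speed, k = 20),
           family = ocat(R = 5), data = sim, method = "REML")

# Estimated probabilities of the five outcomes at two chosen launch settings
new_bb <- data.frame(launch_angle = c(5, 28), launch_speed = c(92, 108))
probs <- predict(fit, newdata = new_bb, type = "response")

# wOBA weights from the formula above; an out contributes 0
woba_wts <- c(0, 0.9, 1.25, 1.6, 2)
e_woba <- as.vector(probs %*% woba_wts)
e_woba
```

Each row of `probs` holds the five outcome probabilities for one batted ball, so the matrix product with the weight vector gives the expected wOBA numerator for that ball.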

Predicting 2022 wOBA

For the 2022 data, we again focus on the batted balls where 0 < launch_angle < 50 degrees and 90 < launch_speed < 115 mph. We bin the launch variables into 5 x 5 = 25 bins. For each bin, we compute

• the sum of observed wOBA weights for the 2022 batted balls in the bin
• the sum of expected wOBA weights using the GAM model on the 2021 season data
• the change: the observed 2022 sum minus the expected sum from the 2021 model, so a negative value indicates a drop in 2022

Here is a sample of the output. Looking at the first row, there were 347 batted balls in the first bin, where the launch angle is between 0 and 10 degrees and the launch speed is between 90 and 95 mph. The sum of wOBA weights in that bin was 155, which we compare with the expected sum of wOBA weights of 168 from the model on the 2021 season data. Here the change is -12.8, which indicates a drop in the 2022 season.

```
  LA     LS           IP  wOBA E_wOBA Change
  <fct>  <fct>     <int> <dbl>  <dbl>  <dbl>
1 (0,10] (90,95]     347  155.  168.  -12.8
2 (0,10] (95,100]    472  252.  259.   -6.95
3 (0,10] (100,105]   568  320.  323.   -2.80
4 (0,10] (105,110]   354  228.  227.    1.08
5 (0,10] (110,115]    73   44    49.3  -5.25
```
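The binning and summing can be sketched in base R with `cut()` and `aggregate()`. The data frame `bb` below is simulated, and its `E_wOBA` column is a random placeholder standing in for the per-ball model predictions, so the numbers it produces are not the ones shown above.

```r
set.seed(1)
n <- 500

# Simulated stand-in for 2022 batted balls (NOT real data)
bb <- data.frame(
  launch_angle = runif(n, 0, 50),
  launch_speed = runif(n, 90, 115),
  # observed wOBA weight of each batted ball (0 for an out)
  wOBA = sample(c(0, 0.9, 1.25, 1.6, 2), n, replace = TRUE,
                prob = c(0.55, 0.20, 0.10, 0.02, 0.13)),
  # placeholder for the expected wOBA weight from the 2021 model
  E_wOBA = runif(n, 0.2, 1.2),
  IP = 1  # each row is one ball in play
)

# 5 x 5 grid: launch angle in steps of 10 degrees, launch speed in steps of 5 mph
bb$LA <- cut(bb$launch_angle, breaks = seq(0, 50, by = 10))
bb$LS <- cut(bb$launch_speed, breaks = seq(90, 115, by = 5))

# Sum the counts, observed weights and expected weights within each bin
S <- aggregate(cbind(IP, wOBA, E_wOBA) ~ LA + LS, data = bb, FUN = sum)
S$Change <- S$wOBA - S$E_wOBA

# Sort so launch speed varies within launch angle, as in the table above
S <- S[order(S$LA, S$LS), ]
head(S, 5)
```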

Looking at Changes – Z-Scores

Since the outcome is a weighted count of hits, a reasonable Z-score is

Z = (observed wOBA - expected wOBA) / sqrt(expected wOBA)

If this score falls outside of, say, (-3, 3), this indicates a significant change between the two seasons.
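As a quick check of this formula, the sketch below applies it to the five bins in the sample output. The helper name `z_score()` is my own, and since the inputs are the rounded values printed in the table, the results are only approximate. Dividing by the square root of the expected value treats the sum like a Poisson count, which is a rough approximation for a weighted sum.

```r
# Z-score comparing an observed sum with its expected value
z_score <- function(observed, expected) {
  (observed - expected) / sqrt(expected)
}

# Rounded values from the five sample rows shown earlier
obs_woba <- c(155, 252, 320, 228, 44)
exp_woba <- c(168, 259, 323, 227, 49.3)

z <- z_score(obs_woba, exp_woba)
round(z, 2)

# Flag bins with a change outside of (-3, 3)
flagged <- abs(z) > 3
flagged
```

For the first bin the score is close to -1, so none of these five low launch angle bins would be flagged as a significant change.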

Plotting Z-Scores

Here I display the Z-scores across all bins in the launch space. We see:

• Practically all the Z-scores are negative which is not surprising — offense is down in 2022 compared to 2021.
• The Z-scores for line drives (launch angles between 0 and 20 degrees) are relatively small which indicates that the wOBA measure for 2022 is similar to that for 2021 for these batted balls.
• I have highlighted the Z-scores smaller than -3. These all correspond to batted balls with launch angles between 20 and 40 degrees. The change in the bin with launch angle in (20, 30) and exit velocity in (100, 105) is most striking. The observed sum of wOBA weights is 481, which is much lower than the expected wOBA weight sum of 666; the corresponding Z-score is -7.14.

Takeaways

• Offense is depressed in the 2022 season, but the drop appears to be largest for fly balls with launch angles between 20 and 40 degrees. This is consistent with the conclusions of the Yahoo Sports article.
• As Alan Nathan would say, this is an expected outcome if increased drag is a characteristic of the 2022 baseball. Increased drag would have a bigger impact on fly balls than on line drives or ground balls.
• As I am writing this, the Phillies have improved their team for the 2022 season by adding sluggers Nick Castellanos and Kyle Schwarber. If these particular hitters tend to hit fly balls, this dead-ball effect would impact them more than line drive hitters such as Christian Yelich.

R Work

All of the R code for this work is available on my Github Gist site. There are two key functions. `data_work()` will retrieve Statcast in-play data for all Statcast seasons through 2022 from my Github site. The function `offense_loss()` will perform the GAM fitting, computation of the observed and expected wOBA weights, and graphics work using the `ggplot2` package.

Assuming the files `data_work.R` and `offense_loss.R` that define these functions are in the current working directory, the following script was used to predict offense for the 2022 season using the model on 2021 data.

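```
# Load the two helper functions (files assumed to be in the working directory)
source("data_work.R")
source("offense_loss.R")
sc_ip <- data_work()   # Statcast in-play data
LA <- c(0, 50, 10)     # launch angle: apparently lower limit, upper limit, bin width
LS <- c(90, 115, 5)    # launch speed: same layout
season1 <- 2021        # season used to fit the model
season2 <- 2022        # season to be predicted
out <- offense_loss(sc_ip, season1, season2, LA, LS)
```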

The output `out` is a list with two components: `S` is a data frame with the computations for all bins, and `the_plot` is the `ggplot2` graphic shown above.

It would be interesting to rerun the `offense_loss()` function for other choices of `season1` and `season2`.

Related Posts

• I describe the decrease in 2022 home run hitting in this recent post.
• I illustrate using a GAM ordinal model in this post, where I relate the barrel region to the space where the expected wOBA exceeds 0.8.