Monthly Archives: September, 2020

Is Pitching 75% of Baseball?

Introduction

Baseball legend Connie Mack supposedly said that pitching was 75% of baseball. I’m not sure how Mack arrived at the number 75%, but a general problem of interest is to understand the batter/pitcher dynamics. When one looks at a plate appearance, how much of the outcome is determined by the hitter and how much is determined by the pitcher?

Noah Thurm recently wrote an interesting article on FanGraphs on understanding the attributes of successful pitchers from Statcast data. Here’s the opening three sentences of this article:

“Successful pitchers limit damage by minimizing the quality of contact they allow. How they can best do that remains up for debate, as pitchers tend to focus on some combination of deception, movement, and location to try and miss barrels. I propose that the most important pitcher-influenced variable to quality of contact is Launch Angle, and understanding and influencing it ought to be a priority for all pitchers.”

He provides evidence in this article that pitchers have relatively little control over the exit velocity of batted balls; exit velocity is largely controlled by the hitter. But the pitcher has more control than the hitter over the launch angle, and the better pitchers are the ones who can successfully influence this measurement.

Thurm’s use of the phrase “most important pitcher-influenced variable” raises a general question. Given a particular outcome of a plate appearance, a batted ball, or a pitch, how much of the variability of that outcome is due to the batter and how much to the pitcher? Here I will describe how to answer this question using a basic statistical model that separates the variation in a response into components. This will allow us to compare the sizes of two sources of variability: one due to the pitcher and one due to the batter. Using this method, we will compare these sources of variability across many outcomes of a plate appearance, a batted ball, or a pitch.

A Non-Nested Random Effects Model

Let y denote some outcome of a plate appearance. If y is binary (0 or 1) and p_{ij} is the probability of success when batter i faces pitcher j, then a “non-nested” random effects model has the form

\log \frac{p_{ij}}{1-p_{ij}} = \mu + \beta_i + \gamma_j

where \beta_i is the effect of batter i and \gamma_j is the effect of pitcher j. The model is “non-nested” because batters and pitchers are crossed: each batter faces many pitchers and each pitcher faces many batters. We assume that the batter effects \beta_1, ..., \beta_I come from a normal distribution with mean 0 and standard deviation \sigma_B, and the pitcher effects \gamma_1, ..., \gamma_J are a sample from a normal distribution with mean 0 and standard deviation \sigma_P.

When we fit this model to data, we focus on the estimates of \sigma_B and \sigma_P, which tell us how much variability in the outcomes is due to the batters and to the pitchers, respectively. We can compare these two standard deviation estimates by computing the ratio R = \sigma_B / \sigma_P. If R > 1, the batters contribute more to the variation in the outcome; if instead R < 1, the pitchers contribute more.
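To get a feel for what these standard deviations mean, it can help to move from the log-odds scale back to probabilities. The sketch below uses made-up values of \mu, \sigma_B and \sigma_P (assumptions chosen for illustration, not estimates from any data) and compares the outcome probability for an average matchup with the probability when either the batter or the pitcher is one standard deviation above average.

```r
# Illustrative values only; these are NOT estimates from real data.
mu      <- -1.2   # hypothetical overall log-odds of the outcome
sigma_B <- 0.5    # hypothetical batter standard deviation
sigma_P <- 0.3    # hypothetical pitcher standard deviation

# Probability for an average batter facing an average pitcher
p_avg <- plogis(mu)

# Probability when the batter (or the pitcher) effect is one SD above average
p_batter_plus  <- plogis(mu + sigma_B)
p_pitcher_plus <- plogis(mu + sigma_P)

# Ratio of batter SD to pitcher SD
R <- sigma_B / sigma_P

round(c(average          = p_avg,
        batter_plus_1sd  = p_batter_plus,
        pitcher_plus_1sd = p_pitcher_plus,
        ratio_R          = R), 3)
```

With R > 1 in this made-up setting, moving one standard deviation among batters shifts the probability more than moving one standard deviation among pitchers.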

Here’s a quick example of fitting this model in R using the glmer() function in the lme4 package. The data frame d contains the Retrosheet play-by-play data for the 2019 season, SO is an indicator of a strikeout, and BAT_ID and PIT_ID are codes for the batter and pitcher, respectively. I use the VarCorr() function to extract the standard deviation estimates and then compute the ratio of standard deviations R.

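library(lme4)

# Crossed random intercepts for pitcher and batter, logistic link
fit <- glmer(SO ~ (1 | PIT_ID) + (1 | BAT_ID),
             data = d,
             family = binomial)

# Estimated standard deviations of the two sets of random effects
(sds <- VarCorr(fit))
 Groups Name        Std.Dev.
 BAT_ID (Intercept) 0.5016
 PIT_ID (Intercept) 0.3118

# Ratio of the batter SD to the pitcher SD
(R <- 0.5016 / 0.3118)
[1] 1.608724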
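Rather than retyping the printed standard deviations, one can pull them out of the VarCorr() object directly. Since the Retrosheet data frame d is not reproduced here, the sketch below simulates a fake data set from the model with hypothetical "true" values (all of the numbers and the names sim, fit_sim and R_hat are assumptions made for this illustration) and then checks that the fitted ratio lands near the ratio used to generate the data.

```r
library(lme4)

set.seed(2019)
n_bat <- 150; n_pit <- 150; n_pa <- 20000   # hypothetical sizes

# Hypothetical "true" values, used only to generate fake data
mu <- -1.2; sigma_B <- 0.5; sigma_P <- 0.3
beta  <- rnorm(n_bat, 0, sigma_B)   # batter effects
gamma <- rnorm(n_pit, 0, sigma_P)   # pitcher effects

# Random batter/pitcher matchups and simulated 0/1 outcomes
sim <- data.frame(
  BAT_ID = sample(n_bat, n_pa, replace = TRUE),
  PIT_ID = sample(n_pit, n_pa, replace = TRUE)
)
sim$SO <- rbinom(n_pa, 1,
                 plogis(mu + beta[sim$BAT_ID] + gamma[sim$PIT_ID]))
sim$BAT_ID <- factor(sim$BAT_ID)
sim$PIT_ID <- factor(sim$PIT_ID)

fit_sim <- glmer(SO ~ (1 | PIT_ID) + (1 | BAT_ID),
                 data = sim, family = binomial)

# as.data.frame() on a VarCorr object gives one row per variance
# component, with the standard deviation in the sdcor column
vc  <- as.data.frame(VarCorr(fit_sim))
sds <- setNames(vc$sdcor, vc$grp)

# Estimated ratio of batter SD to pitcher SD
R_hat <- unname(sds["BAT_ID"] / sds["PIT_ID"])
R_hat
```

Because the data are simulated with \sigma_B / \sigma_P = 0.5 / 0.3, an estimate of R in the neighborhood of 1.67 would suggest the fitting and extraction steps are doing what we expect.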

In the old days of baseball, the pitchers were supposed to throw underhand — essentially they were just putting the ball in play. In this scenario, one would think that the batters controlled the variability in the outcomes of plate appearances. If we fit this model to batting outcomes from this old-time era, we should find that R > 1. Of course, baseball has changed since those early years and pitchers are now trying to get the batters out, so it is less clear if the batters or the pitchers are currently contributing more to the variability of the PA outcome.

Outcomes of a Plate Appearance

Using Retrosheet play-by-play from the 2019 season, I considered the following outcomes of a plate appearance:

  • Strikeout (SO)
  • Walk (non-intentional) (BB)
  • Hit by pitch (HBP)
  • Home run (HR)
  • Hit on ball in-play (HIP)
  • Error (E)

For each outcome, I used the glmer() function from the lme4 package to fit this random effects model with the outcome indicator as the response. From the standard deviation estimates, I computed the ratio of estimated standard deviations R.
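One way to organize this repeated fitting is a small helper that takes the name of a 0/1 outcome column and returns R. This is only a sketch: the helper name ratio_R, the toy data frame, and every number used to simulate it are assumptions for illustration, and with the real data one would pass the Retrosheet data frame and its own indicator columns instead.

```r
library(lme4)

# Fit the crossed random effects model for one 0/1 outcome column and
# return the ratio of the batter SD to the pitcher SD.
ratio_R <- function(outcome, data) {
  f   <- as.formula(paste(outcome, "~ (1 | PIT_ID) + (1 | BAT_ID)"))
  fit <- glmer(f, data = data, family = binomial)
  vc  <- as.data.frame(VarCorr(fit))
  sds <- setNames(vc$sdcor, vc$grp)
  unname(sds["BAT_ID"] / sds["PIT_ID"])
}

# Toy data with two made-up outcomes (not real Retrosheet data)
set.seed(1)
n_id <- 100; n_pa <- 20000
toy <- data.frame(BAT_ID = factor(sample(n_id, n_pa, replace = TRUE)),
                  PIT_ID = factor(sample(n_id, n_pa, replace = TRUE)))
b <- rnorm(n_id)   # standardized batter effects
g <- rnorm(n_id)   # standardized pitcher effects
bi <- as.integer(toy$BAT_ID); pi_ <- as.integer(toy$PIT_ID)

# Outcome "SO": batter SD 0.5, pitcher SD 0.3 (hypothetical)
toy$SO <- rbinom(n_pa, 1, plogis(-1.2 + 0.5 * b[bi] + 0.3 * g[pi_]))
# Outcome "BB": batter SD 0.3, pitcher SD 0.45 (hypothetical)
toy$BB <- rbinom(n_pa, 1, plogis(-2.4 + 0.3 * b[bi] + 0.45 * g[pi_]))

# One ratio per outcome
ratios <- sapply(c(SO = "SO", BB = "BB"), ratio_R, data = toy)
ratios
```

In this toy setup the two outcomes are simulated independently of each other, which real plate appearance outcomes are not; the point is only the loop over outcome columns.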

Let’s think aloud about what we might expect to find for these different PA outcomes. Hitting home runs would seem to depend more on the hitter than the pitcher, so I would expect R > 1. It is less clear for the other outcomes. Thinking about HBP, does a pitcher hit a batter with an errant pitch, or does the batter position himself about the plate so he is more likely to get hit? Certain players like Ron Hunt and Chase Utley were noted for high HBP rates. What about an error? A fielding error would seem to depend mainly on the defense, so neither the batter nor the pitcher would have much control over the likelihood of an error. So for the error outcome, I would anticipate that R is close to 1.

Outcomes of a Pitch or Batted Ball

A similar type of analysis can be done for outcomes of a pitch or a batted ball. We’ll consider three outcomes:

  • Miss on a swing (Miss)
  • Exit velocity of a batted ball
  • Launch angle of a batted ball

(By the way, since both exit velocity and launch angle are continuous measurements, I will use a different random effects model where the response is normally distributed, but I’ll still be able to get standard deviation estimates and a value of R.)
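For a continuous response, a natural analogue is a normal model with the same two crossed random effects plus a residual error term, which can be fit with lmer() from lme4. The following is a minimal sketch under that assumption; the simulated data frame bb, the response y, and all of the numbers below are made up for illustration and are not Statcast values.

```r
library(lme4)

set.seed(42)
n_id <- 100; n_bb <- 20000   # hypothetical numbers of players and batted balls

bb <- data.frame(BAT_ID = factor(sample(n_id, n_bb, replace = TRUE)),
                 PIT_ID = factor(sample(n_id, n_bb, replace = TRUE)))

b <- rnorm(n_id, 0, 3)   # made-up batter effects (SD 3)
g <- rnorm(n_id, 0, 1)   # made-up pitcher effects (SD 1)

# Continuous response: overall mean + batter + pitcher + normal noise
bb$y <- 88 + b[as.integer(bb$BAT_ID)] + g[as.integer(bb$PIT_ID)] +
  rnorm(n_bb, 0, 14)

# Normal-response model with crossed random intercepts
fit_norm <- lmer(y ~ (1 | PIT_ID) + (1 | BAT_ID), data = bb)

# The variance components now include a "Residual" row as well
vc  <- as.data.frame(VarCorr(fit_norm))
sds <- setNames(vc$sdcor, vc$grp)

# Ratio of batter SD to pitcher SD, as in the binary case
R_norm <- unname(sds["BAT_ID"] / sds["PIT_ID"])
R_norm
```

The ratio R is computed from the batter and pitcher rows only; the residual standard deviation captures the ball-to-ball variation that neither player effect explains.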

Recall that Thurm stated in his FanGraphs article that a pitcher has much more control over launch angle than over exit velocity. So I would anticipate the value of R to be greater for exit velocity than for launch angle.

Results

I fit these models for the six PA outcomes and the three pitch outcomes described above. The figure below graphs and labels the values of the ratio R. Remember that small values of R indicate that more of the variability in the outcome is contributed by pitchers, and large values of R indicate that more of the variability is due to batters.

We see that …

  • As expected, the ratio is close to 1 for errors.
  • Of all outcomes, the value of R is smallest for launch angle, which means that pitchers contribute relatively more to the variability of launch angle than to the variability of any other outcome considered here.
  • Batters contribute more of the variability for exit velocities and home run rates. Since high exit velocities contribute to home runs, this is not surprising.
  • It is interesting that walks have an R value close to one, which means that pitchers and batters contribute roughly equal components of the variability in walks. Walk rates depend on both the batter’s plate discipline and inaccurate pitches thrown by the pitcher.

Takeaways

  • This statistical approach clearly distinguishes exit velocity from launch angle — the variability in exit velocity is pretty much about the batter, but the pitcher has a lot of impact on the variability in launch angle. This supports the main point of the FanGraphs article.
  • This analysis affects how one constructs a leaderboard. If the variability in a particular outcome is primarily due to variation in the hitters, then one would make a top-10 list for the hitters, not the pitchers. Based on this analysis, perhaps we should make up a top-10 list of the pitchers who allow the smallest percentage of optimal launch angles on balls put into play. (Maybe that is something I can talk about in a future post.)
  • I’ve blogged about this type of model before in the context of home run hitting. In that post, I illustrate several types of random effects models. Not only does this model provide estimates of the variability among pitchers and among hitters, it also provides random effects estimates of the player abilities that are useful in predicting future performance.