A New “Swing and Miss” Stat

Watching the 2015 playoffs, one has to be impressed (from the pitcher’s perspective) with the number of pitches that are swung at and missed. That raises the question: what is a reasonable way to measure a pitcher’s ability to induce a miss on a batting swing? One could simply report the fraction of swings that are missed. For example, the TBS announcer reported that batters missed 31% of Clayton Kershaw’s pitches, which seems like a large percentage. But that simple fraction ignores several things:

  • Any given pitcher has specific pitches (I’m thinking of off-speed pitches) that are more likely to result in a swing and miss. So maybe we should focus on the misses on particular types of pitches.
  • Also it would seem that a batter’s propensity to miss a pitch would depend on the pitch’s location. Many of the misses that I saw in last night’s game between the Mets and Dodgers were sliders and changeups that were thrown near the ground.

This brief study will attempt to add some extra insight to the basic “fraction of swings that miss” statistic and compare four of the great playoff starters using this statistic.

  1. From the pitchFX data, I collect the pitcher name, the pitch type, the description of the pitch outcome (variable des) and the pitch location (variables px and pz).
  2. For a particular pitcher and particular pitch, I restrict attention to only the batter swings. For this filtered data, I record two new variables:

    • Outcome = 1 if the batter misses the pitch, and Outcome = 0 otherwise
    • Distance = the distance (in feet) of the pitch from the center of the strike zone, computed from px and pz
  3. For this data (specific pitcher and pitch), I fit the logistic model
    p = \frac{\exp(a + b \times DISTANCE)}{1 + \exp(a + b \times DISTANCE)}
    Here p is the probability the batter misses the pitch, and the parameters a and b give the intercept and slope of the logistic fit.
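
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the post: the strike-zone center (px = 0, pz = 2.5 feet), the exact `des` labels, the helper names `prepare` and `fit_logistic`, and the toy pitches are all hypothetical. The fit uses plain Newton-Raphson so that nothing beyond the standard library is needed.

```python
import math

# Assumed strike-zone center in feet: px = 0 is the middle of the plate,
# pz = 2.5 is a hypothetical mid-zone height (not a value from the post).
ZONE_CENTER = (0.0, 2.5)

# Assumed labels for the `des` variable; check them against your own data.
SWING_MISS = {"Swinging Strike", "Swinging Strike (Blocked)"}
SWING_CONTACT = {"Foul", "Foul Tip", "In play, out(s)",
                 "In play, no out", "In play, run(s)"}


def prepare(pitches):
    """Step 2: keep swings only and return (distance, outcome) pairs."""
    rows = []
    for des, px, pz in pitches:
        if des not in SWING_MISS and des not in SWING_CONTACT:
            continue  # the batter did not swing (ball, called strike, ...)
        distance = math.hypot(px - ZONE_CENTER[0], pz - ZONE_CENTER[1])
        rows.append((distance, 1 if des in SWING_MISS else 0))
    return rows


def fit_logistic(rows, iterations=25):
    """Step 3: fit p = exp(a + b*d) / (1 + exp(a + b*d)) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, y in rows:
            p = 1.0 / (1.0 + math.exp(-(a + b * d)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * d      # score for the slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * d
            h11 += w * d * d
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Made-up pitches (des, px, pz) for one pitcher and one pitch type.
toy_pitches = [
    ("Swinging Strike", 0.1, 2.4),
    ("Foul", 0.2, 2.6),
    ("In play, out(s)", -0.3, 2.2),
    ("Called Strike", 0.0, 2.5),
    ("Swinging Strike", 0.9, 1.6),
    ("Foul", -1.0, 1.8),
    ("Swinging Strike", 1.2, 1.2),
    ("Ball", 2.0, 0.5),
    ("Swinging Strike (Blocked)", 0.8, 0.6),
    ("In play, no out", 0.5, 2.9),
]

rows = prepare(toy_pitches)
a_hat, b_hat = fit_logistic(rows)
print(f"swings = {len(rows)}, intercept = {a_hat:.2f}, slope = {b_hat:.2f}")
```

With real data one would more likely use a packaged logistic regression routine; the hand-written loop is here only to make the model explicit.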

I fit this logistic model to four playoff pitchers (Clayton Kershaw, Zack Greinke, Jake Arrieta, and Dallas Keuchel) using a large set of 2015 pitchFX data. For a fair comparison, I am looking at only sliders thrown by the four pitchers. The following table shows (for the sliders) the number of swings, the number of misses, and estimates of the quantities a and b from the logistic model.

            Clayton Kershaw Zack Greinke Jake Arrieta Dallas Keuchel
N_Swings             415.00       288.00       476.00         256.00
N_Misses             186.00       103.00       116.00          97.00
(Intercept)           -2.72        -3.77        -3.08          -4.80
Distance               2.45         2.78         2.13           3.04
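
As a quick check on the table, the raw “fraction of swings that miss” can be computed directly from the two count rows. The snippet below is a small sketch that only copies the counts from the table; the variable names are mine.

```python
# Counts copied from the table above (sliders only).
swings = {"Clayton Kershaw": 415, "Zack Greinke": 288,
          "Jake Arrieta": 476, "Dallas Keuchel": 256}
misses = {"Clayton Kershaw": 186, "Zack Greinke": 103,
          "Jake Arrieta": 116, "Dallas Keuchel": 97}

# Raw miss rate = misses / swings, ignoring pitch location.
miss_rate = {name: misses[name] / swings[name] for name in swings}

for name, rate in miss_rate.items():
    print(f"{name:16s} {rate:.3f}")
```

The raw rates come out to about 0.448 for Kershaw, 0.358 for Greinke, 0.244 for Arrieta, and 0.379 for Keuchel. These numbers pool pitches at all locations, which is exactly the effect the Distance term in the logistic model is meant to separate out.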

In the figure below, I summarize these four logistic fits by plotting the probability of a miss as a function of the distance from the center of the strike zone.


Here’s how to interpret the logistic fit for a specific pitcher.

  1. If we plug in the value DISTANCE = 0 into our logistic fit, we get an estimate of the logit of the probability a batter misses a slider thrown in the middle of the plate. One has to convert the logit to a probability. For Kershaw, the intercept estimate is -2.72. So the probability a batter misses a slider in the middle of the plate is p = exp(-2.72) / (1 + exp(-2.72)) = 0.06. This is a poorly located slider — batters tend not to miss these pitches.
  2. It is more helpful to look at the fits for pitches thrown away from the middle of the plate. For example, if a pitch is a foot away from the center, DISTANCE = 1, and the probability of missing is estimated by exp(-2.72 + 2.45) / (1 + exp(-2.72 + 2.45)) = 0.43.
  3. The value of the slope tells us the importance of pitch location. For Kershaw, the slope estimate is 2.45. There is a useful “divide by four” interpretation for a slope in a logistic model: the slope divided by 4 is the largest possible change in probability per unit change in the predictor, reached where the probability is near 0.5. So for every additional foot the pitch is from the center of the strike zone, the probability the batter misses the pitch increases by at most about 2.45 / 4 = 0.61. This is a large change in probability, emphasizing the importance of throwing the pitch in a good location.
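
The arithmetic in these three points can be reproduced with a short script. This is a sketch using only Kershaw’s estimates from the table; the function name `miss_probability` is my own. It also compares the “divide by four” figure with the actual change in fitted probability over the first foot.

```python
import math


def miss_probability(a, b, distance):
    """Convert the logit a + b * distance into a probability."""
    logit = a + b * distance
    return 1.0 / (1.0 + math.exp(-logit))


# Kershaw's slider estimates from the table.
a, b = -2.72, 2.45

p_center = miss_probability(a, b, 0.0)    # middle of the plate
p_one_foot = miss_probability(a, b, 1.0)  # one foot from the center

# "Divide by four": the steepest slope of the logistic curve is b / 4,
# which occurs where the fitted probability equals 0.5.
max_slope = b / 4
steepest_distance = -a / b

print(f"p(distance = 0) = {p_center:.2f}")
print(f"p(distance = 1) = {p_one_foot:.2f}")
print(f"change over the first foot = {p_one_foot - p_center:.2f}")
print(f"b / 4 = {max_slope:.2f}, reached near {steepest_distance:.2f} feet")
```

The change over the first foot (roughly 0.37) is smaller than b / 4 because the curve is still flat near the center; the slope/4 figure is best read as an upper bound on the change per foot.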

Comparing the four pitchers …

  1. Clearly Kershaw has the best slider: at any distance out to about three feet from the middle of the plate, his fitted miss probability is the highest of the four. Keuchel has the worst slider for getting the batter to swing and miss, at least for pitches within roughly 1.9 feet of the center, where his fitted curve is the lowest.
  2. There is an interesting comparison between Greinke and Arrieta. The two pitchers are similar with respect to pitches thrown within a foot of the center of the strike zone. But Greinke appears to be better at getting the batter to miss pitches thrown, say, two feet away from the middle of the plate. (Greinke’s “slope” is greater than Arrieta’s “slope”.)
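
These comparisons can be checked numerically from the table alone. The sketch below evaluates each fitted curve on a small grid of distances (the grid is my own choice) and solves for the distance at which the Greinke and Arrieta curves cross, by setting the two logits equal.

```python
import math

# (intercept, slope) estimates copied from the table.
fits = {
    "Clayton Kershaw": (-2.72, 2.45),
    "Zack Greinke": (-3.77, 2.78),
    "Jake Arrieta": (-3.08, 2.13),
    "Dallas Keuchel": (-4.80, 3.04),
}


def miss_probability(a, b, distance):
    """Fitted probability of a miss at a given distance (feet)."""
    return 1.0 / (1.0 + math.exp(-(a + b * distance)))


# Fitted miss probabilities on an illustrative grid of distances.
grid = (0.0, 0.5, 1.0, 1.5, 2.0)
table = {name: [miss_probability(a, b, d) for d in grid]
         for name, (a, b) in fits.items()}

for name, probs in table.items():
    print(f"{name:16s}", "  ".join(f"{p:.2f}" for p in probs))

# Two logistic curves cross where a1 + b1 * d = a2 + b2 * d.
a_g, b_g = fits["Zack Greinke"]
a_a, b_a = fits["Jake Arrieta"]
crossover = (a_a - a_g) / (b_g - b_a)
print(f"Greinke and Arrieta cross near {crossover:.2f} feet")
```

Under these estimates the two curves cross a little past one foot, which matches the observation that the pitchers look similar inside a foot while Greinke pulls ahead farther out.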

As I am sure someone has said, the three most important aspects of pitching are location, location, and location, and this brief analysis illustrates that truth. There is a lot more that can be said using this new pitching measure, and I have a group of Bowling Green State University students in the ACTION program exploring swings and misses in various ways.
