Monthly Archives: January, 2023

Ability and Luck in FanGraphs Pitching Statistics


If one looks at the Pitcher Leaders section of the FanGraphs site, one sees a remarkable number of pitching measures. There are the familiar ones in the “Dashboard” such as the strikeouts, walks and home runs allowed per nine innings. Under the “Plate Discipline” tab, one sees various percentage rates of swing and contact in and out of the zone, the “Pitch Value” tab gives summary pitch values for various pitch types, and the “Batted Ball” tab describes percentages of batted ball types (ground ball, line drive, fly balls and popups), direction, and hardness of contact.

All of these pitching performance measures are interesting at face value, but it is unclear how useful they are for understanding pitcher abilities. That is, which particular measures are helpful in predicting performance in future seasons? This question is relevant in this hot stove season when teams are deciding which free agents to add to their rosters.

Jay Bennett and I talked about the general issue of distinguishing performance and ability in our book Curve Ball. This topic was especially relevant in the Situational Statistics chapter. We were exploring the situational batting measures of Scott Rolen (newly inducted this year into the Hall of Fame) and tried to make sense of how Rolen hit in home vs away games, against pitchers of the same or opposite arm, on different counts, and so on.

In this situational stats chapter we distinguished between situations that

  • have no “real” effect (the player abilities are the same in either situation)
  • represent a bias (there could be an advantage to playing at home, but the advantage is the same across all hitters)
  • represent abilities (here batters have different abilities to take advantage of the particular situation)

If a situational measure, say the difference between home and away performance, is ability-driven, then it will be predictive of future performance. Other situational measures are more luck driven, or due to other variables besides the batter abilities. One way of checking the ability characteristic of a measure is to construct a scatterplot of the situational player measures for two consecutive seasons. If there is a significant positive association in this graph, the situation is indeed ability driven. We used this method to demonstrate in Curve Ball that home/away effects for one season are not predictive of home/away effects for the following season.
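To make the scatterplot idea concrete, here is a small simulation sketch in Python. It is not taken from Curve Ball; the number of players, the number of trials per season, and the true rates are all hypothetical values chosen for illustration. When true rates differ across players, the observed rates in two seasons are strongly correlated; when every player shares the same true rate, the correlation is close to zero.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def observed_rate(p, n, rng):
    """Observed rate for one season: successes in n Bernoulli(p) trials, divided by n."""
    return sum(rng.random() < p for _ in range(n)) / n

def two_season_correlation(true_rates, n, rng):
    """Simulate two seasons from the same true rates and correlate the observed rates."""
    season1 = [observed_rate(p, n, rng) for p in true_rates]
    season2 = [observed_rate(p, n, rng) for p in true_rates]
    return pearson(season1, season2)

rng = random.Random(2023)
n_players, n_trials = 300, 500  # hypothetical sizes, not from the post

# Ability-driven measure: true rates vary from player to player (clamped to (0, 1)).
ability_rates = [min(max(rng.gauss(0.22, 0.04), 0.01), 0.99) for _ in range(n_players)]
# Luck-driven measure: every player has the same true rate.
luck_rates = [0.03] * n_players

r_ability = two_season_correlation(ability_rates, n_trials, rng)
r_luck = two_season_correlation(luck_rates, n_trials, rng)
print("ability-driven r:", round(r_ability, 2))
print("luck-driven r:   ", round(r_luck, 2))
```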

In this post, I illustrate the use of a Shiny app to explore scatterplots of pitching measures for consecutive seasons. By repeating and summarizing the relationships over many pairs of seasons, we get a sense of the “ability/luck” aspects of different measures. Then I will illustrate the use of a random effects model for several pitching measures to confirm the findings.

The Dashboard Pitching Stats

I begin by collecting FanGraphs pitching leaders data, from the default Dashboard, for seasons 2013 through 2022. I only include pitchers who pitched at least 100 innings for a given season. One nice feature of the FanGraphs site is that one can separate out the season information in the download.

Here is a snapshot of a Shiny app constructing a scatterplot of a particular pitching measure for the 2021 and 2022 seasons. One chooses the measure of interest (among the FanGraphs measures) on the left. Besides the scatterplot, the value of the correlation is displayed. Also in the app one can brush the scatterplot to see the identities of interesting points that are extreme or deviate from the general pattern. Here we see that BB/9 is an ability measure since the correlation between values for the 2021 and 2022 seasons is 0.61. By use of brushing, we highlight three pitchers (Eovaldi, Scherzer, and Kershaw) who had low walk rates both seasons.
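The calculation behind this display can be sketched in a few lines of Python. This is not the code of the Shiny app; the column names ("Name", "Season", "IP", "BB/9") are assumptions about how a downloaded table might be labeled, and the rows below are made-up toy values, not real pitchers. The function keeps pitchers who meet the innings minimum in both seasons and correlates their values.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def season_pair_correlation(rows, measure, season1, season2, min_ip=100.0):
    """Correlate a measure across two seasons for pitchers qualifying in both.

    rows: list of dicts with (assumed) keys 'Name', 'Season', 'IP' and the measure.
    Returns (correlation, sorted list of pitchers used).
    """
    def qualified(season):
        return {r["Name"]: float(r[measure]) for r in rows
                if int(r["Season"]) == season and float(r["IP"]) >= min_ip}

    first, second = qualified(season1), qualified(season2)
    common = sorted(first.keys() & second.keys())
    x = [first[name] for name in common]
    y = [second[name] for name in common]
    return pearson(x, y), common

# Toy rows with invented values, for illustration only.
toy_rows = [
    {"Name": "A", "Season": 2021, "IP": 180.0, "BB/9": 2.0},
    {"Name": "A", "Season": 2022, "IP": 175.0, "BB/9": 2.2},
    {"Name": "B", "Season": 2021, "IP": 150.0, "BB/9": 3.0},
    {"Name": "B", "Season": 2022, "IP": 160.0, "BB/9": 2.8},
    {"Name": "C", "Season": 2021, "IP": 120.0, "BB/9": 4.0},
    {"Name": "C", "Season": 2022, "IP": 130.0, "BB/9": 4.1},
    {"Name": "D", "Season": 2021, "IP": 140.0, "BB/9": 3.5},
    {"Name": "D", "Season": 2022, "IP": 60.0, "BB/9": 1.5},  # below the innings minimum
]

r, used = season_pair_correlation(toy_rows, "BB/9", 2021, 2022)
print(used, round(r, 2))
```

With real data it would be safer to match pitchers on a player identifier rather than on the name, since names are not guaranteed to be unique.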

We see from the display below that other measures are less ability-driven, such as the home run rate HR/9, where the correlation between rates for the two seasons is only 0.28.

You can play with this particular Shiny app by going to the site

Many Pairs of Seasons

These correlation values will vary depending on the choice of consecutive season pairs. So we consider scatterplots over the multiple pairs of seasons 2013-2014, 2014-2015, …, 2021-2022. (We skip the short 2020 season.) Also we consider all 12 measures on the FanGraphs Dashboard for pitching leaders.

Below I display dotplots of the season-to-season correlation values for each of the 12 pitching measures, where the measures are arranged by the median correlation. The measures at the top with large correlations, such as fastball velocity (vFA), ground ball percentage (GB%), and strikeout rate (K/9), are ability driven. For these measures, much of the variability is due to the pitchers. In contrast, measures with small correlations such as the home run rate among flyballs (HR/FB), runners left on base percentage (LOB%), and batting average on balls in play (BABIP) are less reflective of pitcher abilities. We know that pitchers have relatively little control over the outcome of an in-play event, so it is not surprising that BABIP has low correlations.
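The bookkeeping for this summary can be sketched in Python as follows. Two things here are assumptions: I read "skip the short 2020 season" as dropping every pair that touches 2020 (one could instead pair 2019 with 2021), and the correlation values in the example are placeholders, not the values shown in the dotplots.

```python
from statistics import median

def consecutive_pairs(first, last, skip=()):
    """Consecutive (s, s + 1) season pairs, dropping any pair that touches a skipped season."""
    return [(s, s + 1) for s in range(first, last)
            if s not in skip and s + 1 not in skip]

def rank_by_median(correlations):
    """correlations: {measure: [r for each season pair]}.

    Returns (measure, median r) tuples sorted from largest to smallest median.
    """
    summary = [(m, median(rs)) for m, rs in correlations.items()]
    return sorted(summary, key=lambda item: item[1], reverse=True)

pairs = consecutive_pairs(2013, 2022, skip={2020})
print(pairs)

# Placeholder correlations for two hypothetical measures (not from the post).
toy_correlations = {
    "measure_b": [0.1, 0.3, 0.2],
    "measure_a": [0.7, 0.8, 0.75],
}
ranking = rank_by_median(toy_correlations)
print(ranking)
```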

Random Effects

Another way to distinguish the ability/luck characteristics of different measures is to fit a random effects model.

We describe the model for strikeout data. For the jth pitcher, we record the number of strikeouts y_j among n_j batters faced where we assume y_j is binomial(n_j, p_j). The strikeout probabilities of N pitchers p_1, ..., p_N follow a Beta distribution with unknown parameters \eta and K. The parameter \eta represents an average value and K is a precision that determines the spread of the probabilities.
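As a sketch of what this model implies for the observed counts, the Python code below computes the beta-binomial log-likelihood. It assumes the Beta distribution has shape parameters K\eta and K(1 - \eta), which is a common way to write a Beta with mean \eta and precision K; the post does not spell out the parameterization or the fitting method, so treat this as one possible reading. Maximizing log_likelihood over (\eta, K) would be one way to estimate the two parameters.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_choose(n, y):
    """Log of the binomial coefficient C(n, y)."""
    return lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)

def beta_binomial_logpmf(y, n, eta, K):
    """Log P(y | n, eta, K) after integrating p over Beta(K*eta, K*(1 - eta)).

    The shape parameters a = K*eta and b = K*(1 - eta) are an assumed parameterization.
    """
    a, b = K * eta, K * (1 - eta)
    return log_choose(n, y) + log_beta(a + y, b + n - y) - log_beta(a, b)

def log_likelihood(data, eta, K):
    """Sum of log-probabilities over pitchers; data is a list of (y_j, n_j) pairs."""
    return sum(beta_binomial_logpmf(y, n, eta, K) for y, n in data)

# Sanity check: the probabilities over y = 0, ..., n sum to one.
total = sum(exp(beta_binomial_logpmf(y, 20, 0.2, 50.0)) for y in range(21))
print(round(total, 6))
```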

We fit this random effects model separately to the strikeout rates, the walk rates and the home run rates of the pitchers in the 2022 season.

The table gives the estimated values of \eta and K for each set of rates.

  Rate_Type   Eta       K
1 Strikeout 0.222  81.491
2      Walk 0.072 245.029
3  Home Run 0.029 902.315

What do we learn?

  • Strikeout rates reflect the abilities of the players. The small value of K indicates that pitchers have different true strikeout rates.
  • In contrast, the large value of K for home runs indicates that pitchers’ true home run rates are pretty similar.
  • The ability dimension of walks lies between the ability dimensions of strikeouts and home runs.

These random effects model fits agree with the season-to-season correlation values in ranking the different pitching measures on the ability/luck scale. In addition, using these fits it is straightforward to predict a pitcher’s rates (for strikeouts, walks or home runs) in the following season.
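One way such a prediction could work is sketched below in Python. It assumes the Beta distribution is parameterized with shape parameters K\eta and K(1 - \eta), so that the posterior mean of a pitcher's rate is (y + K\eta) / (n + K): a weighted average of the observed rate y/n and the overall average \eta, with weight n / (n + K) on the observed rate. The \eta and K values come from the table above; the 700 batters faced and 210 strikeouts are hypothetical.

```python
def shrinkage_weight(n, K):
    """Weight placed on the pitcher's own observed rate."""
    return n / (n + K)

def predicted_rate(y, n, eta, K):
    """Posterior mean of the true rate under an assumed Beta(K*eta, K*(1 - eta)) prior."""
    return (y + K * eta) / (n + K)

# Estimates (eta, K) from the table of 2022 fits.
fits = {
    "Strikeout": (0.222, 81.491),
    "Walk": (0.072, 245.029),
    "Home Run": (0.029, 902.315),
}

n = 700  # hypothetical number of batters faced
weights = {name: shrinkage_weight(n, K) for name, (eta, K) in fits.items()}
for name, w in weights.items():
    print(name, round(w, 3))

# Hypothetical pitcher: 210 strikeouts among 700 batters faced (observed rate 0.300).
eta_k, K_k = fits["Strikeout"]
pred = predicted_rate(210, n, eta_k, K_k)
print(round(pred, 3))
```

The smaller K is, the more weight the pitcher's own rate receives, which matches the ordering of strikeouts, walks, and home runs on the ability/luck scale.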

Closing Comments

  • Part of this post was motivated by the relative ease of obtaining the FanGraphs data. Any table can be downloaded as a csv file on the push of a button.
  • The use of the word “luck” may be misunderstood: readers might think that I am saying that the pitcher performances are lucky. When I say that pitcher home run rates are luck-driven, this just means that the variability in home run rates is due to other factors, like the hitters or the team defense, and does not reflect the pitchers’ abilities.
  • I’ve written a lot on the use of random effects models in baseball problems. See the section on Multilevel Modeling in my Baseball Research page for more examples from my blog posts.