NL Cy Young Voting, ERA and FIP Measures
2022 NL Cy Young Voting
Last week, Sandy Alcantara won the NL Cy Young Award as the best pitcher in the National League for the 2022 season. It was a unanimous decision: he received all 30 first-place votes from the baseball writers. The voting system awards seven points for a first-place vote, four points for second place, three points for third place, two points for fourth place and one point for fifth place. Under this system, the top five pitchers in the voting (MLB article) were Sandy Alcantara, Max Fried, Julio Urías, Aaron Nola and Zac Gallen. Which pitching measures were relevant in this ranking of pitchers? Let’s look at some measures (W, L, IP, ERA, FIP, WAR) from the FanGraphs leaderboard.
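As a quick check of the point system described above, here is a minimal Python sketch. The function name `total_points` and the ballot representation are my own illustrative choices; the only ballot data used is Alcantara's 30 first-place votes, which is stated above.

```python
# Points awarded per ballot position (1st through 5th), as described in the post.
POINTS = {1: 7, 2: 4, 3: 3, 4: 2, 5: 1}

def total_points(votes_by_place):
    """Sum voting points given a mapping {ballot place: number of votes}."""
    return sum(POINTS[place] * n for place, n in votes_by_place.items())

# Alcantara received all 30 first-place votes.
alcantara_points = total_points({1: 30})
print(alcantara_points)  # 210
```

A unanimous winner therefore finishes with 210 points, the maximum possible with 30 ballots.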
   Name                W     L    IP   ERA   FIP   WAR
   <chr>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Aaron Nola         11    13  205   3.25  2.58   6.3
 2 Carlos Rodon       14     8  178   2.88  2.25   6.2
 3 Sandy Alcantara    14     9  228.  2.28  2.99   5.7
 4 Max Fried          14     7  185.  2.48  2.7    5
 5 Corbin Burnes      12     8  202   2.94  3.14   4.6
 6 Zac Gallen         12     4  184   2.54  3.05   4.3
 7 Yu Darvish         16     8  194.  3.1   3.31   4.2
 8 Logan Webb         15     9  192.  2.9   3.04   4.2
 9 Tyler Anderson     15     5  178.  2.57  3.31   4
10 Jose Quintana       6     7  165.  2.93  2.99   4
11 Joe Musgrove       10     7  181   2.93  3.59   3.5
12 Merrill Kelly      13     8  200.  3.37  3.65   3.3
13 Julio Urias        17     7  175   2.16  3.71   3.2
I believe most voters looked primarily at a pitcher’s ERA in their decision-making. Julio Urias’ 2.16 and Sandy Alcantara’s 2.28 are the two lowest ERAs in this group, but Urias pitched only 175 innings compared to 228 for Alcantara. I think Alcantara’s unanimous selection for the Cy Young was likely due to his low ERA over a large number of innings pitched.
But there is an issue with the use of ERA in measuring pitching performance. The number of earned runs allowed is really a function of both the pitcher and the team’s defense. There has been a recent effort among sabermetricians to construct alternative pitching measures that focus on outcomes like walks, strikeouts and home runs allowed in order to remove the effect of the defense. This has led to the use of the FIP (fielding independent pitching) measure defined in the FanGraphs sabermetrics library.
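For readers who want to see the computation, here is a small Python sketch of the FIP formula as I understand it from the FanGraphs library: 13 times home runs, plus 3 times walks and hit batters, minus 2 times strikeouts, all divided by innings pitched, plus a league constant. The function name `fip` and the default constant of 3.10 are assumptions for illustration; the actual constant is set each season so that league FIP matches league ERA, so it should be looked up for the season of interest.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding independent pitching.

    hr, bb, hbp, so: home runs, walks, hit batters and strikeouts allowed.
    ip: innings pitched as a true decimal (one third of an inning is 1/3, not 0.1).
    constant: season-specific FIP constant; 3.10 is only a placeholder here.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant

# Hypothetical pitching line, not taken from any real pitcher.
print(round(fip(hr=20, bb=50, hbp=5, so=200, ip=200.0), 3))  # 3.225
```

Note that nothing about balls in play enters the formula, which is the whole point of the measure.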
What are the advantages of FIP over traditional pitching measures like ERA? The general motivation behind FIP is the observation that pitchers really have little control over the results of balls put into play. So we remove those balls in play from the measure and focus on the outcomes over which the pitcher has the most control, such as strikeouts, walks and home runs allowed. It is generally believed that FIP is a more stable measure than ERA and can actually be a better predictor of next season’s ERA than the current season’s ERA.
Using the FIP measure, the top two pitchers in our group were Carlos Rodon (2.25) and Aaron Nola (2.58). But Nola was ranked only 4th in the Cy Young voting and Rodon was 6th. Note also that Nola and Rodon were ranked first and second with respect to the WAR measure in the FanGraphs leaderboard.
In this post, I will provide evidence to show that FIP is indeed a more stable or consistent measure than ERA.
Aaron Nola’s Performance Across Seasons
We begin by graphing Nola’s ERA and FIP measures over the eight seasons that he has pitched in the Major Leagues. Looking across seasons, Nola’s ERA and FIP values generally average about 3.3-3.5. But it is pretty clear that Nola’s FIP values are more stable than his ERA values across seasons. A quick calculation shows that the standard deviation of his FIP values is 0.499, compared with a standard deviation of 0.778 for his ERA values. It is interesting that Nola’s ERA was much higher than his FIP in the 2016 and 2021 seasons, which might be attributed to the Phillies’ poor defense during these particular seasons.
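The stability comparison here is just a comparison of two sample standard deviations. Here is a minimal Python sketch of that calculation. The season values below are made-up placeholders, not Nola's actual numbers, and `season_sd` is a name I chose for illustration.

```python
from statistics import stdev

def season_sd(values):
    """Sample standard deviation of a pitcher's season-level values."""
    return stdev(values)

# Placeholder season values for illustration only (NOT Nola's real data).
era_by_season = [3.6, 4.8, 3.5, 2.4, 3.9]
fip_by_season = [3.4, 3.1, 3.3, 3.0, 3.5]

# The measure with the smaller standard deviation is the more stable one.
print(season_sd(era_by_season) > season_sd(fip_by_season))  # True
```

With a pitcher's real season ERA and FIP values in place of the placeholders, the same two calls would give the kind of comparison reported above.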
Pitching Career Trajectories of FIP and ERA
If FIP is indeed a more stable measure of pitching performance than ERA, one would expect this to be evident in the careers of some of the great pitchers in MLB history. Generally, a pitcher's career trajectory has a standard shape: he improves through mid-career and then declines in performance until retirement. One can represent this trajectory by a least-squares quadratic fit, and we look at these trajectories using both ERA and FIP as performance measures. By looking at the deviations (residuals) from the fit, we can see if the residuals from a fit using FIP are indeed smaller (that is, more stable) than the residuals from a fit using ERA.
I focus on all pitchers in MLB history whose career midyear was between 1970 and 1990 and who pitched at least 3000 innings. There were 29 pitchers satisfying these criteria. Below I display each pitcher’s ERA as a function of his age with smoothing quadratic fits on top. The shaded regions show the standard errors of the fit for each trajectory. I repeat this exercise using the FIP measure.
In the Aaron Nola example, I measured stability by looking at the standard deviation of the season ERA or season FIP values and saw that the FIP values had the smaller standard deviation. When we are fitting a pitcher’s season-to-season measures by use of a quadratic curve, one can measure stability by use of the residual standard deviation $s$. (Note that $s$ is the estimate of $\sigma$ in the model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$, where $x$ is the pitcher's age and $\epsilon$ is normally distributed with mean 0 and standard deviation $\sigma$.)
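To make the residual standard deviation concrete, here is a self-contained Python sketch that fits a least-squares quadratic in age and returns the square root of the residual sum of squares divided by n - 3. This is not the code used for the post; the function names are my own, and the fit is done by solving the normal equations directly so that no external libraries are needed.

```python
import math

def solve_linear(a, b):
    """Solve a small system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def quadratic_residual_sd(ages, values):
    """Residual SD from a least-squares quadratic fit of values on age."""
    n = len(ages)
    if n < 4:
        raise ValueError("need at least 4 seasons to estimate the residual SD")
    center = sum(ages) / n
    u = [float(a) - center for a in ages]  # centered ages, for numerical stability
    # Normal equations for the design columns [1, u, u^2].
    power_sums = [sum(ui ** k for ui in u) for k in range(5)]
    xtx = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    xty = [sum((ui ** i) * yi for ui, yi in zip(u, values)) for i in range(3)]
    b0, b1, b2 = solve_linear(xtx, xty)
    residuals = [yi - (b0 + b1 * ui + b2 * ui * ui) for ui, yi in zip(u, values)]
    sse = sum(r * r for r in residuals)
    return math.sqrt(sse / (n - 3))  # three fitted coefficients
```

Calling `quadratic_residual_sd(ages, era_values)` and `quadratic_residual_sd(ages, fip_values)` for the same pitcher gives the two numbers being compared. Centering the ages does not change the fitted curve or the residuals; it only keeps the arithmetic well behaved.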
For each of the 29 pitchers, I collect the residual standard deviations for the quadratic fits to the season ERA and FIP values. I show a scatterplot of the ERA standard deviations and the FIP standard deviations with a line y = x drawn on top. Most of the points fall below the line, indicating that the residual standard deviation tends to be smaller for the fit to the FIPs than for the fit to the ERAs. In other words, the season FIP values for a particular pitcher tend to be more stable across seasons than the corresponding season ERA values.
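The visual statement that most points fall below the line y = x can also be summarized by a single number. The sketch below assumes the ERA standard deviations are on the horizontal axis and the FIP standard deviations on the vertical axis. The function name and the input values are hypothetical placeholders, not the values for the 29 pitchers.

```python
def share_below_identity(era_sds, fip_sds):
    """Fraction of pitchers whose FIP residual SD is smaller than their ERA residual SD."""
    pairs = list(zip(era_sds, fip_sds))
    below = sum(1 for era_sd, fip_sd in pairs if fip_sd < era_sd)
    return below / len(pairs)

# Placeholder residual SDs for three imaginary pitchers.
print(share_below_identity([0.5, 0.6, 0.4], [0.3, 0.7, 0.2]))
```

A share well above one half would correspond to the pattern seen in the scatterplot.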
- Stability. We have provided evidence in this post that the FIP measure is more stable than ERA in the sense that a pitcher’s FIP will be more consistent across seasons than ERA.
- Is FIP a better measure than ERA? The FIP measure is a function only of SO, BB, HBP and HR. So FIP would not include, say, hard-hit balls that lead to base hits and runs, so it is somewhat incomplete. The ERA measure focuses on runs allowed, but this depends on both the pitcher and the defense. FIP reflects only outcomes affected by the pitcher.
- Improving ERA. It would be interesting to develop a runs measure that adjusts for the defense. Would it be possible to compute the earned runs allowed per 9 innings with an average defense? Maybe this has already been done.
- Data? All of this work has been done using a season-by-season dataset extracted from the FanGraphs site.