Last week, I discussed the recent “hot hand” article published on fivethirtyeight.com by Arthur and Matthews (I’ll abbreviate their names as AM). I had two main issues with the article. First, I don’t think fastball velocity is a good proxy for pitcher performance, although I understand that there is some association between fastball velocity and other pitcher measures. Second, I don’t believe pitchers have “hot” and “cold” states; pitching can’t be conveniently classified into two states. (By the way, I remain unsure about the details of the hidden Markov model AM applied, since there is no draft article available with a description of the actual model. Posting R code is not sufficient, but posting an R Markdown file would be an improvement.)
Anyway, I thought it would be helpful to step back and illustrate an exploratory study of Cole Hamels’ 2015 season. This will give some background to AM’s work and suggest some other directions to study. (I’m a Phillies fan and Hamels was traded from the Phils to the Rangers during the 2015 season, so that’s why I am considering this particular pitcher and season.)
Where’s the data?
Using the baseballr package, it is easy to pick up the pitch-by-pitch data for Hamels for the 2015 season.
library(baseballr)
hamels1 <- scrape_statcast_savant_pitcher("2015-03-25", "2015-10-05", 430935)
The data frame hamels1 contains variables on the pitches thrown and also provides information on the pitch result and batting data (launch speed and angle) if the ball is put into play.
What pitches does Hamels throw?
AM only considered fastballs, but the graph below shows that Hamels actually throws five pitches: a changeup (CH), curve ball (CB), cutter (FC), four-seam fastball (FF), and two-seam fastball (FT).
This graph displays parallel boxplots of the release speeds of these different pitch types. It appears that Hamels’ fastest pitch, on average, is the two-seam fastball (FT), and we will focus on that pitch.
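For readers who want to draw this kind of display themselves, here is a minimal sketch of parallel boxplots by pitch type. It runs on simulated data, not Hamels’ actual pitches: the speed centers are arbitrary values chosen only so the plot has some shape, and I am assuming the Statcast data frame carries the columns pitch_type and release_speed.

```r
# Simulated stand-in for the Statcast data (all numbers are made up).
# Assumption: the real data frame has columns pitch_type and release_speed.
set.seed(1)
types <- c("CH", "CB", "FC", "FF", "FT")
centers <- c(CH = 84, CB = 78, FC = 89, FF = 92, FT = 93)  # arbitrary values

pitch_type <- sample(types, 500, replace = TRUE)
release_speed <- rnorm(500, mean = centers[pitch_type], sd = 1.2)
pitches <- data.frame(pitch_type, release_speed)

# Parallel boxplots of release speed, one box per pitch type
boxplot(release_speed ~ pitch_type, data = pitches,
        xlab = "Pitch type", ylab = "Release speed (mph)")

# Median speed by pitch type, fastest first
median_speed <- tapply(pitches$release_speed, pitches$pitch_type, median)
sort(median_speed, decreasing = TRUE)
```

With the real data, one would replace the simulated pitches data frame by hamels1 and leave the plotting call unchanged.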
How do the speeds of the two-seamers change across games?
Since AM are interested in how fastball speeds vary across games, here is a boxplot of the FT speeds for each game. Two things stand out in this graph: (1) speeds vary both between games and within games, and (2) there are many outliers, mostly on the low end. (These could be measurement errors.)
Without actually checking the cause of the outliers, it seemed easiest to just remove them using John Tukey’s rule of thumb for detecting outliers (flag any value more than 1.5 interquartile ranges below the lower quartile or above the upper quartile). Here is a new graph of the pitch speeds with the outliers removed.
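As an illustration of Tukey’s rule, here is a small sketch. The helper name is_tukey_outlier and the speeds are hypothetical, and I apply the rule to a single vector of speeds; whether to apply it to the whole season at once or separately within each game is a choice the analyst has to make.

```r
# Tukey's rule of thumb: flag values more than k * IQR beyond the quartiles.
# is_tukey_outlier is a hypothetical helper name, not from any package.
is_tukey_outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  step <- k * (q[2] - q[1])
  x < q[1] - step | x > q[2] + step
}

# Made-up speeds with one suspiciously slow reading
speeds <- c(91.8, 92.4, 92.9, 93.1, 93.3, 93.6, 94.0, 85.2)
flagged <- is_tukey_outlier(speeds)
clean <- speeds[!flagged]
clean
```

Here only the 85.2 mph reading falls outside the fences, so it is the only value dropped.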
Fit a Random Effects Model
At this point it seemed helpful to fit a random effects model. Basically this model says that the speeds for a particular game, say j, are normal with mean mu_j and constant standard deviation sigma, and then the game means mu_1, …, mu_J are normally distributed with standard deviation sigma2. By fitting this model, one gets estimates of the standard deviations sigma (within games) and sigma2 (between games). Comparing these two estimates tells us how much of the total variation in pitch speeds is due to variation within games and how much is due to variation between games. For this “clean” data, it turns out that the estimates of sigma and sigma2 are similar in size. Since the total variance is the sum sigma^2 + sigma2^2, about half of the total variation in Hamels’ fastball velocities is due to the variability between games.
Okay, I agree with AM that there is substantial variation in pitch speeds between games. But hot and cold states? There is little evidence of this based on this analysis for one pitcher for one season.
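To make the variance decomposition concrete, here is a sketch that estimates the two standard deviations on simulated data. This is not necessarily how the model above was fit: I use the classical one-way ANOVA (method of moments) estimates in base R so the example needs no extra packages, whereas a mixed-model fit (for example with the lme4 package) would be a common alternative. The number of games, pitch counts, and true standard deviations are all made up.

```r
# Simulate pitch speeds from the random effects model (all values made up).
set.seed(2015)
J <- 30                                        # hypothetical number of starts
pitches_per_game <- sample(25:45, J, replace = TRUE)
mu_j <- rnorm(J, mean = 92, sd = 0.8)          # game means, between-game sd 0.8
game <- factor(rep(seq_len(J), times = pitches_per_game))
speed <- rnorm(length(game), mean = mu_j[game], sd = 0.8)  # within-game sd 0.8

# One-way ANOVA mean squares
fit <- anova(lm(speed ~ game))
msb <- fit["game", "Mean Sq"]                  # between-game mean square
msw <- fit["Residuals", "Mean Sq"]             # within-game mean square

# Method-of-moments variance components (allowing unequal game sizes)
n_j <- as.vector(table(game))
N <- sum(n_j)
n0 <- (N - sum(n_j^2) / N) / (J - 1)
var_within <- msw
var_between <- max((msb - msw) / n0, 0)
share_between <- var_between / (var_between + var_within)

c(sigma = sqrt(var_within), sigma2 = sqrt(var_between),
  share_between = share_between)
```

Because the simulation uses equal within-game and between-game standard deviations, the estimated between-game share should land somewhere near one half, which is the situation described above.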
Good and bad games
Actually, a more interesting question is how Hamels’ pitching differs between games where he did “well” and games where he pitched “poorly”. So I looked at the game logs from Baseball Reference. Focusing on earned runs allowed, I say that Hamels had a good game if he allowed 3 or fewer ER and a bad game if he allowed 4 or more ER. (It is interesting to note that Hamels’ games were generally great or poor; there were few games in the middle.)
So the question is: how did the fastball speeds compare between Hamels’ good and bad games? I’ve graphed the median fastball (FT) speed for all games, coloring the point by overall outcome of the game (good or bad). Okay, I see two bad games in the middle of the season where his median fastball speed was low, but otherwise, the median fastball speed does not appear to be a good predictor of success.
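Here is a sketch of how one might build this kind of graph: label each game good or bad from its earned runs, compute the median FT speed per game, and merge the two. The dates, earned runs, and speeds below are invented for illustration, and the column names game_date, pitch_type, and release_speed are my assumption about the Statcast data.

```r
# Made-up game log: one row per start, classified by earned runs (ER)
set.seed(7)
games <- data.frame(game_date = as.Date("2015-04-06") + 5 * (0:5),
                    ER = c(1, 5, 0, 4, 2, 3))
games$result <- ifelse(games$ER <= 3, "good", "bad")

# Made-up pitch-level data: 20 two-seamers per start
pitches <- data.frame(game_date = rep(games$game_date, each = 20),
                      pitch_type = "FT",
                      release_speed = rnorm(120, mean = 92, sd = 1))

# Median FT speed per game, joined to the good/bad label
ft <- subset(pitches, pitch_type == "FT")
med <- aggregate(release_speed ~ game_date, data = ft, FUN = median)
med <- merge(med, games, by = "game_date")

plot(med$game_date, med$release_speed, pch = 19,
     col = ifelse(med$result == "good", "blue", "red"),
     xlab = "Game date", ylab = "Median FT speed (mph)")
med
```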
Pitch distribution across games
For Hamels’ 2015 season, FT speed does not appear very useful for explaining the difference between good and bad games. So I started looking at other variables. Here are bar plots of the frequencies of pitch types for each game, coloring each plot by the game result (good or bad). It is interesting that Hamels’ pitch selection varies across games. Specifically, look at the use of his two-seamers and four-seamers (the two right-most bars in each plot): in some games Hamels favors his two-seamer, and in other games he balances the use of two-seam and four-seam fastballs.
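The pitch mix by game can be tabulated in a couple of lines. This sketch uses three simulated games with pitch types drawn at random, so the proportions mean nothing in themselves; the column names are again my assumption about the Statcast data.

```r
# Simulated pitch types for three hypothetical games
set.seed(3)
pitches <- data.frame(
  game_date = rep(c("game 1", "game 2", "game 3"), each = 100),
  pitch_type = sample(c("CH", "CB", "FC", "FF", "FT"), 300, replace = TRUE)
)

# Counts of each pitch type by game, then row proportions (each game sums to 1)
counts <- table(pitches$game_date, pitches$pitch_type)
mix <- prop.table(counts, margin = 1)
round(mix, 2)

# One group of bars per game
barplot(t(mix), beside = TRUE, legend.text = TRUE,
        ylab = "Fraction of pitches")
```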
Let’s try an alternative measure
Thinking about better measures, let’s focus on the outcome of the pitch. Since both called strikes and swinging strikes are desirable, I graph the fraction of swinging and called strikes for each game, coloring the point by game outcome. It looks like this strike fraction is a reasonable indicator of success. For example, the four games where this fraction was under 22% were all “bad” games. Generally, the fraction of strikes for good games is higher than for bad games, but there are obvious exceptions such as Hamels’ one bad game where the strike fraction exceeded 40%. (Of course, a bad game for Hamels may be the result of only a few poorly placed pitches.)
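Here is a sketch of the strike fraction computation on a tiny made-up example. I am assuming the pitch result is stored in a description column with values such as called_strike and swinging_strike, and I also count swinging_strike_blocked as a swinging strike; both the column name and that choice are assumptions, not necessarily the exact definition used above.

```r
# Tiny made-up pitch log for two hypothetical games (10 pitches each)
pitches <- data.frame(
  game_date = rep(c("game 1", "game 2"), each = 10),
  description = c(
    "called_strike", "ball", "swinging_strike", "foul", "ball",
    "hit_into_play", "ball", "called_strike", "ball", "foul",
    "ball", "ball", "foul", "ball", "hit_into_play",
    "swinging_strike", "ball", "ball", "foul", "ball")
)

# Outcomes counted as called or swinging strikes (assumed coding)
strike_codes <- c("called_strike", "swinging_strike", "swinging_strike_blocked")
pitches$is_strike <- pitches$description %in% strike_codes

# Fraction of all pitches in each game that were called or swinging strikes
strike_frac <- tapply(pitches$is_strike, pitches$game_date, mean)
strike_frac
```

Note that fouls and balls put into play are deliberately left out of the numerator, so this is not the usual strike percentage.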
Of course, this is a limited study — one pitcher for one season — but here is a summary with suggested directions for future study.
- Pitchers like Hamels use a variety of pitches and obviously one is missing a lot of information by focusing only on fastball speeds.
- There is evidence of substantial variation in Hamels’ fastball speeds both between and within games. After removing the outliers, we found that about half of the total variation in Hamels’ FT speeds was due to variation between games. But I don’t see “hot” and “cold” states. Instead I see a general increase in Hamels’ pitch speeds during the 2015 season.
- There are more interesting outcomes than fastball speeds. One can easily divide Hamels’ games into “good” and “bad” ones and then the question is — what inputs help to explain why Hamels had games of the two types?
- To me, there are two key outcomes of a pitch: the pitch location and the batting outcome. Above we considered the fraction of called or swinging strikes, which I think is more helpful than fastball speed in understanding pitcher quality. I would like to see the development of good measures of location. For example, if we knew the location of the catcher’s mitt (does Statcast measure this?), then one could measure the deviation of the actual pitch location from the target. I would imagine that MLB teams are currently using a number of measures to better understand the variation in pitching “quality”. Fastball speed is only one part of the puzzle.