#### Introduction

Recently I got interested in spin rate, one of the newer Statcast measurements of a pitcher. A recent article by DriveLine Baseball provides some information about spin rates, including a discussion of the underlying physics and an explanation of why spin rate is worth studying. This article motivated me to explore spin rates. I want to get a handle on the variation in spin rates, see which pitchers have high and low rates, and explore the importance of this variable (together with pitch velocity) in the pitcher-batter matchup.

#### Spin Rates of Fastballs

To begin, I decided to focus on a particular pitch, the four-seam fastball, coded as FF in the Statcast dataset. Here is a histogram of the spin rate (variable release_spin_rate) of the 26,000 four-seamers thrown in the 2018 season. We see a large variation in the spin rates — most of them fall between 2000 and 2500 rpm (revolutions per minute).
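As a rough sketch of the filtering and binning behind a histogram like this one, here is a small Python example. The records are made up for illustration; only the field names `pitch_type` and `release_spin_rate` come from the Statcast data described above, and the function name and the 100 rpm bin width are my own assumptions.

```python
from collections import Counter

BIN_WIDTH = 100  # rpm per bin; an arbitrary choice for this sketch

# Made-up pitch records; only the field names pitch_type and
# release_spin_rate come from the Statcast data described above.
pitches = [
    {"pitch_type": "FF", "release_spin_rate": 2150},
    {"pitch_type": "FF", "release_spin_rate": 2310},
    {"pitch_type": "SL", "release_spin_rate": 2480},
    {"pitch_type": "FF", "release_spin_rate": 2390},
    {"pitch_type": "FF", "release_spin_rate": 1980},
    {"pitch_type": "FF", "release_spin_rate": None},  # missing measurement
    {"pitch_type": "CU", "release_spin_rate": 2700},
    {"pitch_type": "FF", "release_spin_rate": 2265},
]


def spin_histogram(records, pitch_type="FF", bin_width=BIN_WIDTH):
    """Count pitches of one type in spin-rate bins of bin_width rpm."""
    counts = Counter()
    for rec in records:
        spin = rec["release_spin_rate"]
        # Skip other pitch types and pitches with no spin measurement.
        if rec["pitch_type"] != pitch_type or spin is None:
            continue
        # Lower edge of the bin that contains this spin rate.
        lower = int(spin // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


ff_hist = spin_histogram(pitches)
for lower, n in ff_hist.items():
    print(f"{lower}-{lower + BIN_WIDTH - 1} rpm: {'#' * n}")
```

With real data the same function would be applied to every pitch of the season, and the bin counts passed to whatever plotting tool you prefer.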

#### Comparing 12 Pitchers

To check for variation in spin rates between pitchers, I decided to look at the 2018 pitchers who threw more than 1500 four-seamers; there were 12 pitchers in this group. Below I construct parallel boxplots of the spin rates. A couple of things to notice: there is variation in the spin rates within pitchers, but there is also sizeable variation between pitchers. Justin Verlander has the greatest median spin rate and Reynaldo Lopez has the smallest median spin rate in this group.

Below I construct a scatterplot of the median spin rate and the median velocity for our group of 12 pitchers. In this group, Luis Severino has the fastest four-seamer, on average. Justin Verlander has an “average” 94 mph fastball but he has the highest average spin in this group.
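The per-pitcher summaries behind the boxplots and the scatterplot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pitcher labels and numbers are invented, the function name is hypothetical, and the cutoff is lowered from 1500 so that the toy data set produces output.

```python
from collections import defaultdict
from statistics import median

# Made-up four-seam records: (pitcher, spin in rpm, velocity in mph).
ff_pitches = [
    ("Pitcher A", 2550, 94.8), ("Pitcher A", 2610, 95.1), ("Pitcher A", 2580, 94.5),
    ("Pitcher B", 2100, 95.9), ("Pitcher B", 2060, 96.3), ("Pitcher B", 2140, 95.6),
    ("Pitcher C", 2300, 92.0),  # too few pitches, filtered out below
]


def pitcher_medians(records, min_pitches):
    """Median spin and velocity for pitchers with more than min_pitches pitches."""
    by_pitcher = defaultdict(list)
    for name, spin, velo in records:
        by_pitcher[name].append((spin, velo))

    summary = {}
    for name, rows in by_pitcher.items():
        if len(rows) > min_pitches:
            summary[name] = {
                "median_spin": median(s for s, _ in rows),
                "median_velo": median(v for _, v in rows),
            }
    return summary


# The post uses a cutoff of 1500 four-seamers; 2 is used here only
# because the toy data set is tiny.
medians = pitcher_medians(ff_pitches, min_pitches=2)
for name, stats in medians.items():
    print(name, stats["median_spin"], stats["median_velo"])
```

Each entry of `medians` corresponds to one point on a scatterplot of median velocity against median spin.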

#### Modeling a Miss from Spin Rate and Velocity

The DriveLine Baseball article indicates that a desirable fastball (from the pitcher’s perspective) should be fast but also should have some spin. This motivates fitting a regression model for predicting pitcher success using velocity and spin. Consider all of the batter swings during the 2018 season and let the response variable be Miss (1) or Not-Miss (0). I fit the generalized additive model:

logit Prob(Miss) = s(Spin_Rate, Velocity)

where logit is the function logit(x) = log(x) – log(1 – x) and s() is some smooth function of the two variables. To show this fit, I construct a scatterplot of 1000 fastballs and color each point by the fitted probability of a miss. The fitted miss probability is high in the upper-right section of the plot, corresponding to high spin rates and high velocities. This means that a pitcher can compensate for an average fastball speed with a high spin rate. It seems that speed and spin both play important roles in getting the batter to miss on a swing.
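To make the logit scale concrete, here is a minimal Python sketch of the logit function and its inverse, which converts a fitted value on the log-odds scale back to a miss probability. It does not fit the generalized additive model itself; the function names and the example probability of 0.20 are my own choices for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p) - math.log(1 - p)


def inv_logit(eta):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + math.exp(-eta))


# A miss probability of 0.20 corresponds to odds of 1 to 4,
# so its logit is log(0.25).
eta = logit(0.20)
print(round(eta, 3))
print(round(inv_logit(eta), 3))
```

The smooth function s() is estimated on the logit scale, so a fitted value has to pass through the inverse logit before it can be read as a probability of a miss.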

Let’s revisit our 12 pitchers. On the same scatterplot I have added the actual miss fraction on swings. Note that the highest miss fractions, 25% and 27% respectively, belong to Verlander and Scherzer, who both have high spin rates. In contrast, some of the low miss fractions of 14% belong to Lopez and Gausman, who have low spin rates. Severino, despite having the fastest fastball, only induces a 17% miss rate.

#### Do Speed and Spin Explain Miss Rates?

Hopefully, I have convinced you that spin rates are relevant in the sense that batters are more likely to miss fastballs with good spin. But I think that velocity and spin are an incomplete description of a pitcher’s success. Here’s what I did to reach this conclusion:

- For each pitcher, I used the GAM model to predict the probability of a miss for each pitch using the velocity and spin rate measurements.
- By summing these predictions over all fastballs, I get an expected number of misses predicted from the fitted model — call this number E.
- We also have the actual number of misses for that pitcher — call this number O.
- To see how close the observed value is to the expected number, I compute the Pearson residual R = (O – E) / sqrt(E).
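The steps above can be sketched in Python as follows. The fitted probabilities and the observed count are invented for illustration, and the function name is hypothetical; in practice the probabilities would come from the fitted model for one pitcher.

```python
import math


def pearson_residual(miss_probs, observed_misses):
    """Return (E, R), where E is the sum of the predicted miss
    probabilities and R = (O - E) / sqrt(E)."""
    expected = math.fsum(miss_probs)
    residual = (observed_misses - expected) / math.sqrt(expected)
    return expected, residual


# Hypothetical fitted probabilities for one pitcher:
# 400 pitches at 0.20 and 100 pitches at 0.25.
probs = [0.20] * 400 + [0.25] * 100

# Hypothetical observed number of misses O for the same pitcher.
E, R = pearson_residual(probs, observed_misses=126)
print(f"E = {E:.1f}, R = {R:.2f}")
```

A positive R means the pitcher got more misses than the model expected from velocity and spin alone, and a negative R means fewer.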

Below I display the residuals for our 12 pitchers. If the model is reasonable, then I’d anticipate most of the residuals to fall between -2 and 2. Instead we see some large residuals; in fact, seven of the residuals are larger than 2 in absolute value. For example, Max Scherzer’s residual is close to 4. This means that Max’s success in getting batters to swing and miss is due to variables other than velocity and spin. Maybe it is his pitching motion, maybe it is his ability to throw pitches on the outside of the zone. I don’t know the reason, but it is something besides speed and spin. There are also some large negative residuals. Luis Severino creates fewer misses from his fastballs than one would predict on the basis of his velocity and spin rate. Maybe Severino tends to pitch to the middle of the zone (I don’t know), but it would be interesting to explore why he has such a large residual.

#### Further Exploration

I just started my exploration of spin rates, focusing on four-seam fastballs and a small group of pitchers. For the interested reader, here are some ideas for further exploration:

- Look at other pitch types, especially those of the off-speed variety. Identify the pitchers with the greatest spin rates for each pitch type.
- Here we saw that the miss rate on a swing depends on the spin rate. Can you find other advantages of high spin on a pitch?
- Is it possible to be an outstanding pitcher without a high spin rate?

I also wonder if there’s a significant effect on the miss rate from the repertoire of effective pitches of the pitcher, the circumstances of the at bat, and sequencing of those pitches in the at bat in addition to the speed, spin and location.

Really interesting stuff. I would like to expand on some of the stuff you did here. Do you have a link to this git repo?

Mark, sorry I don’t believe I put this particular code on my git repo.