The baseball world is still talking about the 2017 Astros. It is pretty clear that they were stealing signs using technology. What isn’t as clear is how this sign stealing affected the Astros’ performance in the 2017 season. Jayson Stark and Eno Sarris just wrote an interesting article, “Does electronic sign stealing work? The Astros’ numbers are eye-popping,” in The Athletic, which suggests that the Astros did benefit from the sign stealing. Their article motivated me to do my own exploration, checking some of the statements they make. Along the way, I’ll describe the data sets and tools that I’ll be using.
Strikeout Rates for Successive Seasons
Stark and Sarris talk about the “dramatic plunge” in Houston’s strikeout rate in the 2017 season. To check this, using Lahman’s database (the Lahman R package) and Fangraphs (for team stats for the 2019 season), I collected the strikeout rates for all teams for the 2015 through 2019 seasons. I’m focusing on the changes in the strikeout rates (defined by SO / (AB + BB)) from one season to the next. For a particular comparison, say 2015 to 2016, I plot the team average strikeout rate against the increase in strikeout rate from one season to the next. For each comparison, I draw a horizontal blue line at zero, which indicates no change in the rate between the two seasons. Teams show both positive and negative changes; most of the season-to-season changes in SO rates fall between -3 and +3 percentage points. But the 2017 Houston team appears unusual: its strikeout rate dropped 6 percentage points from 2016 to 2017. It is actually more common for a team to increase its SO rate by 5-6 points in a season than to decrease it by that much. Strikeout rates have been increasing across baseball over this period, so a majority of the points fall above the blue line.
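As a rough illustration of the calculation described above, here is a minimal Python sketch (my own analysis was done in R, so this is not the code I ran). The TeamSeason record, the team labels, and the counts are all made up for illustration, and I am assuming that the “team average” is the mean of the two seasons’ rates.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """One team's batting totals for one season (hypothetical record layout)."""
    team: str
    season: int
    so: int  # strikeouts
    ab: int  # at-bats
    bb: int  # walks


def so_rate(row):
    """Strikeout rate as defined in the text: SO / (AB + BB)."""
    return row.so / (row.ab + row.bb)


def season_changes(rows, first, second):
    """Return {team: (average rate, change in rate)} between two seasons."""
    rates = {(r.team, r.season): so_rate(r) for r in rows}
    out = {}
    for (team, season), rate in rates.items():
        if season == first and (team, second) in rates:
            later = rates[(team, second)]
            # Assumption: "team average" means the mean of the two seasons.
            out[team] = ((rate + later) / 2, later - rate)
    return out


# Made-up counts for two fictional teams, NOT real team totals.
rows = [
    TeamSeason("AAA", 2016, 1400, 5500, 500),
    TeamSeason("AAA", 2017, 1100, 5500, 500),
    TeamSeason("BBB", 2016, 1200, 5400, 600),
    TeamSeason("BBB", 2017, 1320, 5400, 600),
]

changes = season_changes(rows, 2016, 2017)
for team, (avg, diff) in sorted(changes.items()):
    print(f"{team}: average rate {avg:.3f}, change {diff:+.3f}")
```

A negative change is a point below the zero line in the plot, and a positive change is a point above it.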
Swing Rates of Astros Regulars
To get a better understanding of the SO rate drop, let’s look deeper. A batter has to first decide whether to swing or not. Next, if he swings, he wants to make contact. Last, if he makes contact with the pitch, the batter wants to hit the ball at a hard enough speed and suitable launch and spray angles to get a good batting outcome. Let’s focus on the group of 2017 Astros regulars Jose Altuve, Carlos Correa, Josh Reddick, Marwin Gonzalez, Yuli Gurriel, Alex Bregman, George Springer, and Carlos Beltran.
Using Statcast data scraped with the baseballr package, I collected pitch-by-pitch data for this group of players. Using functions from my CalledStrike package, I constructed filled contour graphs of the swing rates for this group. Looking carefully at the display below, it appears that the likely swing region (the red region) tends to get smaller as one moves from 2016 to 2017 to 2018. For whatever reason, these batters are getting more disciplined in their hitting over this period. In 2019 the swing region seems to expand again.
These observations can be supported by summary measures. Below I show the number of pitches, number of swings, and swing rate for this group of 8 Astros for the four seasons. From 2016 to 2018, the swing rate dropped from 46.9% to 43.9% and then rose to 45.3% for the 2019 season.
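For readers who want to reproduce this kind of summary, here is a minimal Python sketch of the tally (not the R code behind my table). It assumes each pitch is reduced to a (season, description) pair, where description follows Statcast’s pitch-result labels. Which labels count as a swing is my assumption here, and the records are toy values.

```python
from collections import defaultdict

# Assumption: these Statcast `description` values are treated as swings.
SWING_DESCRIPTIONS = {
    "swinging_strike",
    "swinging_strike_blocked",
    "foul",
    "foul_tip",
    "hit_into_play",
}


def swing_rates(pitches):
    """Map season -> (pitches, swings, swing rate) from (season, description) pairs."""
    totals = defaultdict(lambda: [0, 0])  # season -> [pitch count, swing count]
    for season, description in pitches:
        totals[season][0] += 1
        if description in SWING_DESCRIPTIONS:
            totals[season][1] += 1
    return {s: (n, k, k / n) for s, (n, k) in totals.items()}


# Toy records for illustration, NOT real Statcast rows.
pitches = [
    (2016, "ball"), (2016, "foul"),
    (2016, "swinging_strike"), (2016, "called_strike"),
    (2017, "ball"), (2017, "called_strike"),
    (2017, "hit_into_play"), (2017, "ball"),
]

rates = swing_rates(pitches)
for season in sorted(rates):
    n, k, rate = rates[season]
    print(f"{season}: {k} swings on {n} pitches ({rate:.1%})")
```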
Incidentally, the Athletic article focused on the breaking balls that were below the bottom of the strike zone. If I focus on the curveballs, changeups and sliders that fall below the zone, here are the chase percentages for the four seasons for our group of 8 hitters. (My values appear inconsistent with those reported in the Athletic article for all Astros: they report a chase rate of 25-27 percent in 2017, which is lower than my value of 33.7 percent. But we still see a significant drop between 2016 and 2017.)
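One plausible reason for a discrepancy like this is the exact definition of a chase, so it helps to write the filter down. The Python sketch below is only an illustration of one definition, not the code I ran. It assumes the pitch-type codes CU, CH and SL, and it treats a pitch as below the zone when its height at the plate is less than the bottom of the batter’s zone. The Pitch record and its values are hypothetical.

```python
from dataclasses import dataclass

# Assumption: pitch-type codes for curveball, changeup and slider.
CHASE_PITCH_TYPES = {"CU", "CH", "SL"}


@dataclass
class Pitch:
    """Hypothetical record holding only the fields this filter needs."""
    pitch_type: str
    plate_z: float  # pitch height as it crosses the plate (feet)
    sz_bot: float   # bottom of the batter's strike zone (feet)
    swung: bool     # True if the batter swung


def chase_rate(pitches):
    """Share of below-the-zone CU/CH/SL pitches that drew a swing."""
    below = [
        p for p in pitches
        if p.pitch_type in CHASE_PITCH_TYPES and p.plate_z < p.sz_bot
    ]
    if not below:
        return None  # no qualifying pitches, so the rate is undefined
    return sum(p.swung for p in below) / len(below)


# Toy pitches for illustration, NOT real Statcast rows.
sample = [
    Pitch("SL", 1.2, 1.6, True),   # below the zone, chased
    Pitch("CU", 1.0, 1.6, False),  # below the zone, taken
    Pitch("CH", 1.4, 1.6, False),  # below the zone, taken
    Pitch("SL", 0.9, 1.5, False),  # below the zone, taken
    Pitch("FF", 1.1, 1.6, True),   # fastball: excluded by pitch type
    Pitch("SL", 2.4, 1.6, True),   # in the zone: excluded by location
]

rate = chase_rate(sample)
print(f"chase rate: {rate:.1%}")
```

Changing the set of pitch types or the boundary used for “below the zone” changes the denominator, which is enough to move the rate by several points.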
Next, let’s focus on the swings and explore the miss rates. We show filled contours of these miss rates for this group of 8 Astros for the four seasons. The blue region corresponds to the region where the miss rate is under 20% (which means that the contact rate is higher than 80% in this region). Note that the blue region is larger in 2017 than in 2016. So not only are these Astros swinging at fewer pitches, but they are also more likely to make contact on the pitches they do swing at.
Again this observation is supported by summaries: we see below that the contact rate on swings increased significantly from 79.6% to 82.5% from 2016 to 2017. From 2017 to 2019, this contact rate stayed pretty constant.
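The contact rate is just one minus the miss rate, computed over swings only. Here is a minimal Python sketch of that calculation (again an illustration, not my R code). The grouping of Statcast description labels into misses and contact is my assumption, in particular counting foul tips as contact, and the input list is a toy example.

```python
# Assumption: how Statcast `description` values split into misses and contact.
MISS = {"swinging_strike", "swinging_strike_blocked"}
CONTACT = {"foul", "foul_tip", "hit_into_play"}


def contact_rate(descriptions):
    """Contact rate among swings; pitches that were not swung at are ignored."""
    misses = sum(d in MISS for d in descriptions)
    contacts = sum(d in CONTACT for d in descriptions)
    swings = misses + contacts
    if swings == 0:
        return None  # no swings, so the rate is undefined
    return contacts / swings


# Toy pitch outcomes for illustration, NOT real Statcast rows.
toy = ["foul", "swinging_strike", "hit_into_play",
       "ball", "foul", "called_strike"]

toy_rate = contact_rate(toy)
print(f"contact rate on swings: {toy_rate:.1%}")
```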
Given that a batter makes contact with the ball, he generally wants to hit it hard. Below I show contours of the launch speeds on batted balls for this group of 8 Astros. Here I don’t see any obvious changes in the patterns in the launch speeds from 2016 to 2017.
Back to the Article — Some Closing Remarks
- Did the Astros’ hitting change during the 2017 season?
If you read the Athletic article, they clearly demonstrate some unusual changes in the Astros’ hitting in the 2017 season, at both the group and individual player levels. They focus on the changes at home and on the road. If the Astros’ cheating was restricted to home games, then one might see differences in their home and away performances. (I do plan on checking out the home and away rates in a future analysis.)
- Did the sign-stealing cause the change in Astros hitting?
Although there are some interesting patterns, there are other possible reasons for the changes in hitting. These other reasons include the change in team rosters, changes in coaching strategies, and so on. All of the possible explanations are confounded with the sign-stealing effect and one can’t say that one particular explanation, say the sign-stealing, is causing the change. The Athletic article does stress that one cannot say that the sign stealing causes the improvement in hitting.
- There is more to be learned.
Although the findings in the Athletic article are interesting, I feel that there is more to be learned about this issue. Since I had the contour graphs from the CalledStrike package at my disposal, it was easy to do this particular analysis. I think modern batters are changing their approaches at the plate due to coaching, and I think one can usually detect these changes by means of currently available data.