An Invitation to Do a Streaky Analysis
Last week I took a blogging break due to our school’s spring break. I was recently rereading the article “Catching Up with Baseball’s Hilbert Problems” by Dan Turkenkopf from the Baseball Prospectus book Extra Innings: More Baseball Between the Numbers. In the last part of the article, the section titled “The Road Ahead”, Dan lists seven questions worthy of further study, including this one (number 6), which I quote:
“6. Do slumps and hot streaks correlate with speed-off-bat? We know that batters often string together stretches of especially good or poor performance. To date, these are largely considered non-predictive and the result of random fluctuations (especially the hot streaks). Is how the batter strikes the ball more indicative of being locked in than the outcome of the at-bat? Or is “locked in” just a way to explain a string of successes, without any predictive value either?”
Since I have Statcast hitting data from the 2017 season readily available, it seemed reasonable to use Dan’s query as a springboard to explore this data for streaks and slumps.
I’m interested in looking for streaky patterns in a batter’s launch speed over the season. I start by breaking the 2017 season into 13 two-week periods, recorded in a new variable called biweek. In passing, I was curious how the average launch speed and average launch angle changed over the season (see the graphs below). The average launch speed started high, decreased during the season, and then gradually increased towards the end of the season. In contrast, the average launch angle started low but increased during the season. (Is there a plausible explanation for these patterns?)
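As a rough illustration of the biweek summary, here is a minimal Python sketch (the post itself was done in R). The post does not say how biweek was computed, so the function biweek_of below is an assumption: it counts 14-day blocks from January 1. The dates and launch speeds are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def biweek_of(game_date):
    """Assumed definition (not from the post): 14-day blocks counted from January 1."""
    day_of_year = game_date.timetuple().tm_yday
    return (day_of_year - 1) // 14 + 1


def mean_by_biweek(records):
    """records: iterable of (game_date, launch_speed) pairs.

    Returns a dict mapping each biweek number to the average launch speed.
    """
    totals = defaultdict(lambda: [0.0, 0])  # running sum and count per biweek
    for game_date, launch_speed in records:
        bucket = totals[biweek_of(game_date)]
        bucket[0] += launch_speed
        bucket[1] += 1
    return {bw: total / n for bw, (total, n) in sorted(totals.items())}


# Made-up balls in play, not real Statcast values.
sample = [
    (date(2017, 4, 3), 90.0),
    (date(2017, 4, 5), 86.0),
    (date(2017, 4, 15), 92.0),
    (date(2017, 4, 20), 88.0),
]
biweek_means = mean_by_biweek(sample)
print(biweek_means)
```

The same grouping applied to launch angle would give the second graph.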
I focused on all players in the 2017 season who put at least 400 balls in play. For each player I fit the linear model
launch_speed ~ biweek
where biweek is a factor input. One output of this regression is an R^2 value, which measures the fraction of the total variation in a batter’s launch speeds for balls in play that is explained by the biweek variable. If a player is a streaky hitter, then I’d expect his launch speeds to vary from one biweek to the next, and that variation would be picked up by the R^2 value. (I’m not really interested in testing here, so I’m not looking at the p-value, but that could also be used as our measure.)
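To make the R^2 measure concrete, here is a small Python sketch (not the R code used for the post). It relies on a standard fact: when the only input is a factor, the least-squares fitted value for each observation is its group mean, so R^2 equals one minus the within-biweek sum of squares divided by the total sum of squares. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def r_squared_by_factor(groups, values):
    """R^2 from regressing values on a single categorical factor.

    The fitted value for each observation is its group mean, so
    R^2 = 1 - SS_within / SS_total.
    """
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("all values are identical, so R^2 is undefined")

    ss_within = 0.0
    for group_values in by_group.values():
        group_mean = sum(group_values) / len(group_values)
        ss_within += sum((v - group_mean) ** 2 for v in group_values)

    return 1.0 - ss_within / ss_total


# Toy data: two biweeks with two balls in play each (made-up speeds).
biweeks = [7, 7, 8, 8]
speeds = [88.0, 92.0, 94.0, 98.0]
r2 = r_squared_by_factor(biweeks, speeds)
print(round(r2, 4))
```

In this toy example the biweek means are 90 and 96, the within-biweek sum of squares is 16, and the total is 52, so R^2 is 36/52. Real launch speed data is far noisier, which is why the R^2 values in the post are so much smaller.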
Below I display the R^2 values for all the 2017 hitters. Here we are not interested in the actual sizes of the R^2 values, but rather in using this measure to set apart the streaky hitters.
Streaky and Consistent Hitters
The streaky players should be the ones where a relatively large share of the variation in launch speed values is explained by the biweek number, so the streaky players are the ones with the largest R^2 values. In our graph, four players stand out with R^2 values greater than 0.06. We display boxplots of launch speed by biweek number for these four players. We see some interesting streakiness. For example, Corey Seager struggled in launch speed early in the season, did better in a four-week period in the middle, and was inconsistent towards the end of the season. Justin Turner’s streakiness is less obvious from the graph, but he struggled in biweek 10. Jake Lamb had strong launch speed values in biweeks 7, 10 and 16, but the values were much smaller for other biweeks.
Our R^2 graph can also be used to identify hitters whose launch speeds were consistent over the season. We look at the six players whose R^2 value is smaller than 0.015; boxplots of their launch speeds are displayed below. It is interesting how consistent Mookie Betts was in biweeks 7 through 15, and he did a bit better at the end of the season.
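The two cutoffs used above can be written as a simple labeling rule. This is only a sketch in Python: the cutoffs 0.06 and 0.015 come from the post, while the label "typical" for everyone in between, the player names, and the R^2 values are made up for illustration.

```python
STREAKY_CUTOFF = 0.06      # from the post: streaky hitters have R^2 above this
CONSISTENT_CUTOFF = 0.015  # from the post: consistent hitters have R^2 below this


def label_hitter(r_squared):
    """Label a hitter from his launch_speed ~ biweek R^2 value."""
    if r_squared > STREAKY_CUTOFF:
        return "streaky"
    if r_squared < CONSISTENT_CUTOFF:
        return "consistent"
    return "typical"  # hypothetical label for everyone in between


# Hypothetical players and R^2 values, not results from the post.
example_r2 = {"Player A": 0.071, "Player B": 0.032, "Player C": 0.011}
labels = {name: label_hitter(value) for name, value in example_r2.items()}
print(labels)
```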
Relationship Between Launch Speed and Batting Average
Generally one believes that higher launch speeds correspond to higher batting averages. Let’s revisit our four streaky hitters and look for an association between their average launch speed and average in-play AVG (each observation is a particular biweek). As expected, we see a positive association, but there are some interesting exceptions to the general pattern (for example, Justin Turner’s biweek when he displayed his smallest average launch speed but highest in-play AVG). Perhaps this variation is due to the small sample sizes for the individual biweeks.
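One way to put a number on this association is a correlation across a player’s biweeks. The Python sketch below assumes that in-play AVG means the fraction of balls in play that went for hits; the post does not spell out its definition. The records are made up, and the post itself only looks at scatterplots.

```python
from collections import defaultdict
from math import sqrt


def biweek_summaries(balls_in_play):
    """balls_in_play: iterable of (biweek, launch_speed, is_hit).

    Returns {biweek: (average launch speed, in-play AVG)}, where in-play AVG
    is assumed to be hits divided by balls in play.
    """
    acc = defaultdict(lambda: [0.0, 0, 0])  # speed sum, hits, count
    for biweek, speed, is_hit in balls_in_play:
        row = acc[biweek]
        row[0] += speed
        row[1] += int(is_hit)
        row[2] += 1
    return {bw: (s / n, h / n) for bw, (s, h, n) in sorted(acc.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up balls in play for one hypothetical hitter.
records = [
    (7, 85.0, False), (7, 87.0, False),
    (8, 90.0, True), (8, 90.0, False),
    (9, 94.0, True), (9, 96.0, True),
]
summaries = biweek_summaries(records)
avg_speeds = [speed for speed, _ in summaries.values()]
in_play_avgs = [avg for _, avg in summaries.values()]
r = pearson(avg_speeds, in_play_avgs)
print(summaries, round(r, 3))
```

With only a dozen or so biweeks per player, and few balls in play in each one, a correlation like this will be noisy, which fits the small-sample caveat above.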
- I think Dan Turkenkopf’s query about streakiness in speed-off-bat data is interesting and deserves further exploration, or at least more exploration than I have shown in this post.
- Since batting average is very “luck-driven”, I think streaky patterns in batting average are similar to the patterns one observes in coin tossing, and they are not predictive of future streaky performance. In contrast, I think measurements off the bat are more ability-driven, and they may be better measurements for use in streakiness studies.
- I’ve only focused on launch speeds here, but certainly launch angles are also relevant. Maybe a measurement of performance that combines launch angle and launch speed would be better for a streakiness study.
- An R script for reproducing all of this work can be found here. The Statcast data was conveniently downloaded using the
I used the broom package for performing all of the regressions and storing the R^2 values in a data frame.
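The coin-tossing comparison in the comments above can be illustrated with a short simulation. This Python sketch is not part of the original analysis: it treats each ball in play as an independent toss with a fixed chance of a hit and records the longest run of hits in each simulated season. The 400 balls in play echo the cutoff used in the post; the 0.3 hit probability, the number of simulated seasons, and the seed are arbitrary assumptions.

```python
import random


def longest_run(outcomes):
    """Length of the longest run of consecutive successes (truthy values)."""
    best = current = 0
    for outcome in outcomes:
        current = current + 1 if outcome else 0
        best = max(best, current)
    return best


def simulate_longest_runs(n_trials, p_success, n_seasons, seed=2017):
    """Longest success run in each of n_seasons independent toss sequences."""
    rng = random.Random(seed)
    return [
        longest_run(rng.random() < p_success for _ in range(n_trials))
        for _ in range(n_seasons)
    ]


# Assumed settings: 400 balls in play, 0.3 chance of a hit on each one.
runs = simulate_longest_runs(n_trials=400, p_success=0.3, n_seasons=1000)
print(min(runs), max(runs))
```

The spread of longest runs across simulated seasons shows how streaky a hitter can look when nothing but chance is at work.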