Mookie’s Streak
There was a noteworthy baseball streak recently: Mookie Betts went 129 plate appearances without striking out. This motivated me to add some new functionality to my BayesTestStreak package. I’ll illustrate some of the functions here, focusing on no-strikeout streaks for all regular players in the 2016 season.
Installing the Package
You can install the current version of BayesTestStreak from GitHub:
library(devtools)
install_github("bayesball/BayesTestStreak")
Obtaining Some Streak Data
Some variables from the 2016 Retrosheet play-by-play dataset are included in the package as the data frame pbp2016. You can use the streak_data function to get the streak data (a vector of successes and failures) for a specific player. You specify how to define a success (either “H”, “HR”, “OB”, or “SO”) and whether you want all plate appearances or just official at-bats. Here I first use the find_id function to find the Retrosheet player code for Mookie.
library(BayesTestStreak)
find_id("Mookie Betts")
[1] "bettm001"
y <- streak_data("bettm001", pbp2016, "SO", AB=FALSE)
The variable y is just a vector of 0’s and 1’s, where 0 is a non-strikeout and 1 is a strikeout. To check that I have the right data, I confirm that Mookie had 80 strikeouts (and 650 non-strikeouts) in the 2016 season.
table(y)
y
  0   1
650  80
Some Graphs of This Data
This package provides several ways of graphing this data. The plot_streak_data function provides a simple line graph where the PA locations of strikeouts are indicated by vertical lines. We see a long white space at the end of the 2016 season, corresponding to a long run of non-strikeouts.
plot_streak_data(y)
To see short-term patterns in the strikeout rate, one can use the mavg_plot function, which provides a moving average plot. The inputs are the streak data (vector of 0’s and 1’s) and the window length (here I use 50 PA). The areas show where the moving strikeout rate departs from Mookie’s overall 2016 strikeout rate. Here we see that Mookie had a rash of strikeouts early in the season but a strong non-strikeout streak at the end of the 2016 season.
mavg_plot(y, 50)
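To make the moving average idea concrete, here is a minimal base R sketch. It is not the code behind mavg_plot; the helper name moving_rate and the toy vector are made up for illustration. It computes the strikeout rate in every window of width consecutive PA.

```r
# Hypothetical helper (not part of BayesTestStreak): rate of 1's in
# every window of `width` consecutive plate appearances.
moving_rate <- function(y, width) {
  stopifnot(width >= 1, width <= length(y))
  cs <- c(0, cumsum(y))                      # cs[k + 1] = sum of first k values
  n_windows <- length(y) - width + 1
  starts <- seq_len(n_windows)
  (cs[starts + width] - cs[starts]) / width  # window sums divided by width
}

# Toy 0/1 sequence (1 = strikeout), invented for illustration
y_toy <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 1)
moving_rate(y_toy, 5)
# [1] 0.6 0.4 0.4 0.4 0.2 0.2
```

Plotting these values against the window position, with a horizontal line at mean(y_toy), gives the same kind of picture: stretches above the line are strikeout-heavy, stretches below it are contact-heavy.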
Looking at Streaks
The media is interested in the lengths of the runs of non-strikeouts; I call these spacings. One can compute all of the spacings with the find.spacings function. We see that the gaps between consecutive strikeouts are 0, 4, 3, 0, 5, etc. We see the long gap of 78 at the conclusion of the 2016 season. The I variable indicates with a 0 that the last spacing of 78 was not ended by a strikeout; in fact we know Mookie continued with 129 - 78 = 51 more non-strikeouts at the beginning of the 2017 season.
find.spacings(y)
$y
 [1]  0  4  3  0  5  0  2  0  5  1  0 12  3  2  4  1  7  2 21  9
[21]  0 11 10  3  8 16  1 14 13  2  1  3 12 18  2 22  5 10  8  0
[41] 10 15  0  2  5  2  6 14  1  3  2 33  7  7  1 14  4  2 27 16
[61]  1  1  6  2  5 16 13 29  5  9  4  1 14 18  3  0  3  1 27  3
[81] 78
$I
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[31] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
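Here is a base R sketch of how spacings like these can be computed. It follows the convention the output above suggests: each spacing counts the 0’s before the next 1, and a trailing run of 0’s gets I = 0 because no strikeout closed it. The helper spacings_sketch is hypothetical and is not the find.spacings code; in particular, adding no extra entry when the sequence ends in a strikeout is my assumption.

```r
# Hypothetical helper (not the package's find.spacings)
spacings_sketch <- function(y) {
  so_pos <- which(y == 1)                    # PA positions of the strikeouts
  # Number of 0's before the first strikeout and between consecutive ones
  gaps <- diff(c(0, so_pos)) - 1
  tail_len <- length(y) - max(c(0, so_pos))  # 0's after the last strikeout
  I <- rep(1, length(gaps))
  if (tail_len > 0) {                        # unfinished run: mark with I = 0
    gaps <- c(gaps, tail_len)
    I <- c(I, 0)
  }
  list(y = gaps, I = I)
}

# Toy sequence, invented for illustration (1 = strikeout)
y_toy <- c(1, 0, 0, 1, 0, 1, 0, 0, 0)
spacings_sketch(y_toy)
# $y
# [1] 0 2 1 3
# $I
# [1] 1 1 1 0
```

A quick sanity check is that the spacings add up to the number of non-strikeouts: the toy sequence has six 0’s, and the 81 spacings above sum to Mookie’s 650 non-strikeouts.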
Looking at Longest Streak of Non-Strikeouts for All Players
Suppose I’m interested in looking at the longest run of non-strikeouts for all players in the 2016 season with at least 300 PA’s. Here is what I do in the R script below.
- I find the player codes for the players in the 2016 season with at least 300 PAs.
- Using the map function from the purrr package, I find the streak data for each player. This list of vectors is stored in the variable out.
- I write a function longest_ofer that computes the longest streak for a given player. Then, using the map_dbl function, I apply this function to all the vectors in out.
- Similarly, by a second application of map_dbl, I find the strikeout rate for all players.
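library(dplyr)
library(purrr)
# Players with at least 300 PA in 2016
summarize(group_by(pbp2016, BAT_ID), N = sum(BAT_EVENT_FL)) %>%
  filter(N >= 300) %>%
  select(BAT_ID) -> S300
# Streak data (0/1 strikeout vector) for each of these players
out <- map(S300$BAT_ID, streak_data, pbp2016, "SO", AB=FALSE)
# Longest run of non-strikeouts for one player
longest_ofer <- function(y){
  max(find.spacings(y)$y)
}
L_ofer <- map_dbl(out, longest_ofer)
# Strikeout rate for each player
Rate <- map_dbl(out, mean)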
Last, I construct a scatterplot of the strikeout rate and longest non-strikeout streak for all players. (I am not showing the code here.) I label the players with longest streaks exceeding 50. We see a strong association between a player’s strikeout rate and his longest “ofer” in his SO/not-SO sequence.
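As a rough illustration of why this association is expected, here is a sketch on simulated hitters rather than the 2016 data. The helper longest_zero_run, the range of rates, and the 600 PA per hitter are all assumptions made up for the example; it is not the code behind the plot described above.

```r
# Longest run of 0's (non-strikeouts) in a 0/1 vector, via run-length encoding
longest_zero_run <- function(y) {
  r <- rle(y)
  zero_runs <- r$lengths[r$values == 0]
  if (length(zero_runs) == 0) 0 else max(zero_runs)
}

set.seed(2016)                              # arbitrary seed, for reproducibility
rates <- seq(0.08, 0.35, length.out = 60)   # made-up range of strikeout rates
sims <- lapply(rates, function(p) rbinom(600, 1, p))  # 600 simulated PA each
sim_rate <- sapply(sims, mean)
sim_longest <- sapply(sims, longest_zero_run)

plot(sim_rate, sim_longest,
     xlab = "Strikeout rate", ylab = "Longest non-strikeout streak",
     main = "Simulated hitters (not the 2016 data)")
```

Even when every plate appearance is an independent coin flip, hitters with low strikeout rates tend to produce longer runs of 0’s, so a long streak by itself says more about contact ability than about streakiness.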
What Have We Learned?
So we see there were six players with non-strikeout streaks exceeding 50 for the 2016 season. What does it mean to have a long non-strikeout streak? It can mean several things. Obviously, all of these players are tough to strike out and have a talent for making contact with the ball. But are these players particularly streaky in their pattern of strikeouts? That is, are their patterns of 0’s and 1’s unusual given their general strikeout ability and number of PAs? We know Betts had the longest non-strikeout streak, but we don’t know if he was the streakiest of the 2016 players with respect to strikeouts.
Actually, there is statistical evidence to suggest that the most remarkable streak among these six players belonged to Adam Eaton, not Mookie Betts. Betts had the longest non-strikeout streak, but Eaton’s pattern of strikeouts was the most unusual given his overall strikeout rate and number of PAs. The statistical evidence is based on a permutation test that can be implemented using the permutation.test function in the BayesTestStreak package. I applied a permutation test in an earlier post on assessing situational hitting.
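The idea of a permutation test can be sketched in a few lines of base R. This is not the code behind permutation.test; the function names, the clumpiness statistic (sum of squared lengths of the runs of 0’s), and the toy sequence are assumptions made for illustration. Shuffling the sequence keeps the player’s strikeout rate and number of PA fixed, so the shuffled sequences show how clumpy the pattern would look by chance alone.

```r
# Clumpiness statistic (an assumed choice): sum of squared lengths
# of the runs of 0's; long runs inflate it.
sum_sq_zero_runs <- function(y) {
  r <- rle(y)
  sum(r$lengths[r$values == 0]^2)
}

# Hypothetical helper: permutation p-value for the observed clumpiness
permutation_p_value <- function(y, n_perm = 1000) {
  observed <- sum_sq_zero_runs(y)
  # sample(y) shuffles the order but keeps the counts of 0's and 1's fixed
  permuted <- replicate(n_perm, sum_sq_zero_runs(sample(y)))
  mean(permuted >= observed)   # share of shuffles at least as clumpy
}

set.seed(1)                    # arbitrary seed, for reproducibility
# Toy sequence: strikeouts packed into the first half, then 20 straight 0's
y_toy <- c(rep(c(1, 0), 10), rep(0, 20))
permutation_p_value(y_toy)
```

A small p-value says the observed pattern is clumpier than most random orderings of the same 0’s and 1’s. That is a different question from who had the longest streak, which is how a player with a shorter streak can still have the more unusual pattern.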