As you probably know, Mike Trout was chosen as the American League MVP in 2016. Trout has had a remarkable five full seasons in the big leagues: in two seasons he was chosen MVP, and in the other three he finished runner-up in the MVP voting. (I wish Trout played for an East Coast team, so I would have more opportunities to see him play. Mike Trout was a Phillies fan growing up.)
Obviously, Trout has had great season totals in WAR, home runs, etc. But we’ll explore Trout’s hitting from a different perspective: how streaky has Trout been in his pattern of hitting in the 2012 through 2016 seasons?
I’ll give you some background on my work on streakiness. (Here’s a link to a Chance article on streakiness in home run hitting.)
- Suppose you look at a player’s sequence of official at-bats and record each AB as a hit (1) or an out (0).
- Record the values of the spacings, the number of outs between each pair of consecutive hits. Many spacing values close to 0 (hits bunched together) or some large spacing values (long droughts) indicate some streakiness in the 0/1 sequence.
- Of course, we’ll observe some unusual spacing values, but the relevant question is: are these values different from what one would observe if Trout were truly consistent? (Truly consistent means that Trout’s probability of success remains constant throughout the season.)
- I have proposed various measures of streakiness, but one simple measure is the sum of squares of the spacing values. A large value of this sum indicates some streakiness.
- One can test a hypothesis of consistency by a permutation test — here is how you do this.
- Randomly permute the sequence of 0’s and 1’s.
- Compute the sum of squared spacings, call this S.
- Repeat the previous two steps many times. This gives the distribution of S one would see under a consistent model.
- Find the p-value = the proportion of simulated values of S that are at least as large as the observed value of S from Trout’s data.
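As a rough illustration of the steps above, here is a minimal Python sketch of the spacings, the sum of squared spacings, and the permutation test. The function names are made up for this example and are not the BayesTestStreak API (that package is in R). The sketch also assumes that outs before the first hit and after the last hit are ignored, since the description above only mentions outs between consecutive hits.

```python
import random


def spacings(seq):
    """Number of 0s between each pair of consecutive 1s in a 0/1 sequence.

    Assumption: 0s before the first 1 and after the last 1 are ignored.
    """
    positions = [i for i, x in enumerate(seq) if x == 1]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def sum_squared_spacings(seq):
    """Streakiness statistic S: the sum of the squared spacings."""
    return sum(s * s for s in spacings(seq))


def permutation_p_value(seq, n_perm=2000, seed=0):
    """Proportion of random permutations whose S is at least the observed S."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum_squared_spacings(seq)
    shuffled = list(seq)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly permute the 0s and 1s
        if sum_squared_spacings(shuffled) >= observed:
            count += 1
    return count / n_perm


# Made-up sequences, not Trout's data.
streaky = [1] * 10 + [0] * 30 + [1] * 10  # hits bunched around one long drought
steady = [1, 0] * 30                      # a hit every other at-bat

p_streaky = permutation_p_value(streaky)
p_steady = permutation_p_value(steady)
```

In this toy example the bunched sequence should give a p-value near 0 and the evenly spaced one a p-value near 1, which matches the reading used below: small values suggest streakiness, and values close to 1 suggest a pattern more regular than chance.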
For each of Trout’s seasons (2012 through 2016), I conducted this test of consistency of his hit/out data — a small p-value indicates some evidence for “significant” streakiness.
But this streakiness measure can depend on what you consider a success. So I repeated this for
(a) home run data — either Trout gets a HR (1) or not a HR (0)
(b) strikeout data — either Trout strikes out (1) or doesn’t strike out (0)
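To make the idea of changing the definition of success concrete, here is a small Python sketch that recodes one list of at-bat outcomes into three 0/1 sequences. The outcome labels ("1B", "SO", and so on) and the helper name `encode` are hypothetical, chosen only for this example. Real play-by-play data such as Retrosheet uses its own event codes, and the at-bats shown are invented.

```python
# Hypothetical outcome labels for official at-bats (not Retrosheet codes).
HIT_OUTCOMES = {"1B", "2B", "3B", "HR"}


def encode(outcomes, is_success):
    """Turn a list of outcome labels into a 0/1 sequence."""
    return [1 if is_success(o) else 0 for o in outcomes]


# Invented at-bats, for illustration only.
at_bats = ["SO", "1B", "OUT", "HR", "OUT", "SO", "2B"]

hit_seq = encode(at_bats, lambda o: o in HIT_OUTCOMES)  # hit vs. out
hr_seq = encode(at_bats, lambda o: o == "HR")           # HR vs. not a HR
so_seq = encode(at_bats, lambda o: o == "SO")           # strikeout vs. not
```

Each of the three sequences can then be fed to the same streakiness test, which is why the choice of success matters for the result.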
So I am doing this test of streakiness a total of 15 times — for each season for each type of data (hit/out, HR or not, SO or not). Here is a graph summarizing the p-values I found (I have colored the points blue for interesting values where the p-value is small which indicates some streakiness).
What do we see?
- For Hit/Out data, Trout has been remarkably consistent in his pattern of hitting. All of the p-values fall between 0.5 and 1. (A value close to 1 indicates an unusual pattern, more consistent than one would anticipate if one were flipping a coin with a constant chance of success.)
- For HR data, Trout has been remarkably consistent: the spacings between home runs have been consistent, especially for 2012, 2014, and 2016. There is some evidence for streakiness in 2015. On closer inspection, I see that Trout had a “0 for 95” homerless streak in 2015.
- For SO data, Trout has exhibited some streakiness in 2012 and 2016. Looking further at his 2016 data, I see a number of large spacings (11, 14, 10, 14, 14, 15, 17, 16). This is actually good — this means that he had a number of stretches where he did not have a strikeout. I haven’t looked at this carefully, but this seems remarkable for a power hitter.
In my research, I have explored streakiness in baseball history, and I have explored remarkable consistency in historical batting performance. (For example, Henry Aaron had a very consistent pattern of hitting home runs in his career.) I think a player’s hitting ability can be explained partly by his pattern of streaks and slumps. I don’t know a lot about Trout’s approach to hitting, but his manager should be happy that he does not appear to exhibit streaks of poor performance.
Added note (11-22-16):
I have added some R code on my gist site which (1) reads in the 2015 Retrosheet play-by-play data, (2) extracts Trout’s sequence of successes and failures using H, HR, or SO as the definition of success, and (3) uses functions from the BayesTestStreak package to find the spacings and do a permutation test of randomness.
Do you happen to have the Rscript for this particular exercise or will most of this be discussed in chapter 10? I’m fairly new and I’m only on career trajectories, but I’d love to explore this for what has seemed like 2 years of streaks for some of my beloved metsies.
Sam, similar code is described in Chapter 10. There is an R package, BayesTestStreak, that does the calculation of the spacings and the p-value for the permutation test. I’ll add a link to the use of this for Trout’s data.
Thanks. Love your work!