In honor of the Cleveland Indians’ recent 22-game winning streak, it seemed appropriate to write something about the streakiness of baseball teams. Although the Indians had a remarkably long run of wins, I don’t think of them as a particularly streaky team. I’ll explain here …
- what I mean by a streaky team and a consistent team
- how one can measure team streakiness
- how one can use a permutation test to see how a team’s streakiness differs from “random” behavior
Streaky and Consistent Teams
Here are the outcomes of the Indians’ first 150 games in the 2017 season:
W W W L L L W L L L W L W W W W
W L L W W L W W L L W L W W L W
L L L W W L L W W W L W L L L W
W W L W L L W L L W L W L L W W
W W W W L W W L L L W L W W L W
W L L W W W L L L L W L L W W W
W W W W W W L L L W W L L W L L
W W W W W W L W W L W L L W W W
W W W W W W W W W W W W W W W W
W W W L W W
Were the Indians streaky this season? Although they were very successful, I don’t think they were particularly streaky. To me, “streaky” means that a team will go through periods of hot spells AND also periods of cold spells during a season. If a streaky team has won a game, then it is more likely to continue winning in the next game; conversely, if a streaky team has lost, then it is more likely to lose again.
The opposite pattern is “consistent,” meaning a team that tends to avoid streaky patterns, both hot and cold. A consistent team will have few long winning streaks and few long losing streaks. If a consistent team wins, it is more likely to lose its next game; similarly, after a loss, a consistent team is more likely to win.
Random coin flipping is neither streaky nor consistent. In coin flipping, the chance of heads stays the same throughout the sequence, and the outcome of one flip does not depend on the outcome of the previous flip. We can assess whether a team’s pattern of winning and losing is streaky or consistent by comparing its W/L pattern with the patterns of sequences of coin flips.
Basically, a measure of streakiness is a measure of the degree of “clumpiness” in a sequence of 0’s (losses) and 1’s (wins). There are different ways of measuring clumpiness, but one reasonable measure is the sum of squares of the gaps (numbers of losses) between consecutive wins. For example, here are the numbers of losses between consecutive wins for the first 150 games of the Tribe. (We see a lot of zeros in this sequence since the Tribe had many runs of consecutive W’s this season.)
0 0 0 3 3 1 0 0 0 0 2 0 1 0 2 1 0 1 3 0 2 0 0 1 3 0 0 1 2 2 1 2 0 0 0 0 0 1 0 3 1 0 1 0 2 0 0 4 2 0 0 0 0 0 0 0 0 3 0 2 2 0 0 0 0 0 1 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Our measure of clumpiness in the W/L sequence is the sum of squares of these gaps, which here is CLUMP = 127.
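Part 2 will implement this with R functions; as a quick illustration, here is a minimal Python sketch of the CLUMP computation. The function name `clump` and the small example sequence are my own, not from the Retrosheet data:

```python
def clump(outcomes):
    """Sum of squared gaps (runs of losses) between consecutive wins.

    `outcomes` is a sequence of 'W'/'L' characters.
    """
    # positions of the wins in the sequence
    wins = [i for i, g in enumerate(outcomes) if g == "W"]
    # gap = number of losses between each pair of consecutive wins
    gaps = [b - a - 1 for a, b in zip(wins, wins[1:])]
    return sum(g * g for g in gaps)

# toy example: gaps between wins are 0, 2, 1, so CLUMP = 0 + 4 + 1 = 5
print(clump("WWLLWLW"))  # -> 5
```

Note that losses before the first win or after the last win contribute nothing, since they fall outside every win-to-win gap.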
A Permutation Test
To assess the streakiness or consistency of a team’s sequence of wins and losses, we use a permutation test. Suppose that the order of the Indians’ observed wins and losses is indeed random in the sense that any possible arrangement of the Indians’ 93 wins and 57 losses over the 150 games is equally likely. This assumption motivates the following simulation experiment:
- Randomly mix up the order of the 93 Wins and 57 Losses.
- Compute the gaps (number of losses) between successive wins and compute the clumpiness statistic CLUMP.
- Repeat steps 1 and 2 many (say, 1000) times — we get the distribution of CLUMP if wins and losses were distributed randomly throughout the season.
- We see how the team’s value of CLUMP compares with this distribution. If the team’s CLUMP is in the left-tail of this “random” distribution, this indicates the team is unusually consistent. Conversely, if the team’s CLUMP value is in the right-tail of this distribution, the team has a streaky performance.
- We can measure how extreme the value of CLUMP is by computing a p-value, the chance that a random CLUMP is at least as large as the team’s value. Large p-values (close to 1) indicate a team is consistent, and small p-values (close to 0) indicate a team is streaky.
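The steps above can be sketched in a few lines of Python (Part 2 will use R instead; the names `permutation_pvalue` and `clump` are my own):

```python
import random

def clump(outcomes):
    # sum of squared gaps (runs of losses) between consecutive wins
    wins = [i for i, g in enumerate(outcomes) if g == "W"]
    return sum((b - a - 1) ** 2 for a, b in zip(wins, wins[1:]))

def permutation_pvalue(outcomes, n_sim=1000, seed=None):
    """Chance that a random shuffle of the season's Ws and Ls
    has a CLUMP value at least as large as the observed one."""
    rng = random.Random(seed)
    observed = clump(outcomes)
    season = list(outcomes)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(season)            # step 1: mix up the Ws and Ls
        if clump(season) >= observed:  # step 2: recompute CLUMP
            hits += 1
    return hits / n_sim                # right-tail p-value
```

Shuffling the season keeps the win-loss composition fixed, so the test asks only whether the *arrangement* of wins is more clumped than chance would produce.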
To illustrate how this works, I randomly mixed up the Tribe’s 93 wins and 57 losses many times, each time computing the value of the CLUMP statistic. Here is a histogram of the CLUMP values over the 1000 simulations. We see that these values range from 80 to 220, with 120 being a typical value.
Were the 2017 Indians streaky or consistent? Remember the value of CLUMP for the 2017 Indians data was 127 and I indicate this value with a red vertical line.
The p-value, the chance that a random CLUMP value is at least as large as 127, is .395. Since this value is neither very small (near 0) nor very large (near 1), there is really little evidence that the 2017 Tribe’s sequence of wins and losses is unusually consistent or streaky. The Tribe had a lot of wins, but the pattern of those wins resembles the pattern of a random sequence of 93 wins and 57 losses.
The Dodgers have a similar W/L record to the 2017 Indians, but their pattern of wins and losses is very different. Implementing this same test for the Dodgers’ first 149 games gives a p-value of 0.001; they were very streaky. A moving average graph of the Dodgers’ wins (with a window of 20 games) shows that the Dodgers had a great winning run followed by a big losing streak. This pattern is much clumpier than one would see from a random arrangement of wins and losses.
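A moving average of wins is simply the proportion of wins in each rolling window of games. Here is a small Python sketch (the function name is my own, and the toy sequence is for illustration only):

```python
def moving_average(outcomes, window=20):
    """Proportion of wins in each rolling window of `window` games."""
    wins = [1 if g == "W" else 0 for g in outcomes]
    return [sum(wins[i:i + window]) / window
            for i in range(len(wins) - window + 1)]

# five straight wins then five straight losses: the average slides from 1 to 0
print(moving_average("WWWWWLLLLL", window=5))
# -> [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
```

Plotting these values against game number gives the kind of graph described above: a streaky team’s curve swings high and low, while a consistent team’s curve hovers near its overall winning percentage.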
For Part 2 of this Study
The goal of this post is to explain the general approach for studying streakiness or consistency of sequences of wins and losses. In Part 2 of this study (probably posted next week), I’ll use Retrosheet data to explore patterns of streakiness and consistency for teams in the last 50 seasons. In particular, here are some questions to explore:
- Which teams were unusually streaky or unusually consistent in the past 50 seasons?
- If a team is consistent one season, will it be more likely to be consistent the following season? (Likewise, does streakiness persist across seasons?)
- Are winning or losing teams more likely to be streaky or consistent?
Also, in Part 2, I’ll describe using the Retrosheet game log data and R functions to implement this study.
Absolutely loving this, Jim. I have done some work on anomaly detection and outbreak detection of terrorist incidents in the GTD, and this has inspired me to go back and revisit the GTD to look for clumpiness in terrorist incidents by particular groups. Bravo on the post.
Peter, I think these methods have general applicability. One application is understanding the patterns of a patient taking medicine.