After reading the two recent posts (part I and part II) on streaky and consistent teams, John comments:

*“I was reading over this and wouldn’t the CLUMP statistic just be looking at the streaky-ness of the team’s losses? If so then it makes a lot of sense that the Dodgers this year were one of the worst due to their above average record coupled with a very long losing streak. I followed your model’s example, and added on to it. LCLUMP is what you were previously calculating while WCLUMP is looking at the win streaks. I found with 1000 iterations, a team with 93 wins and 57 losses will on average have LCLUMP with mean = 123.8, sd = 17.8 and WCLUMP with mean = 379.9, sd = 57.9 . The 2017 Indians through 150 games have a LCLUMP of 127 and a WCLUMP of 751. This shows that the Indians were in fact abnormally streaky. For a team to have their WCLUMP average close to that number, they would have to be 113-37 through 150 games (WCLUMP about 760). I really enjoyed doing this (all in R), so if you have any thoughts or further input, I look forward to hearing it!”*
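John's simulation can be sketched in a few lines. Below is a minimal Python version; John worked in R, and this is not his code. It shuffles a season with a fixed 93-57 record 1,000 times and records LCLUMP and WCLUMP for each shuffle. The helper names `sum_sq_runs` and `simulate_record`, the seed, and the choice to count the stretches before the first and after the last game of each type are illustrative assumptions.

```python
import random
import statistics


def sum_sq_runs(games: str, sep: str) -> int:
    """Sum of squared lengths of the stretches between consecutive `sep` characters.

    Splitting on "1" measures losing streaks (0's between wins); splitting on
    "0" measures winning streaks (1's between losses).
    """
    return sum(len(piece) ** 2 for piece in games.split(sep))


def simulate_record(wins: int, losses: int, iterations: int = 1000, seed: int = 1):
    """Shuffle a season with a fixed record; return mean/sd of LCLUMP and WCLUMP."""
    rng = random.Random(seed)
    season = list("1" * wins + "0" * losses)
    lclump, wclump = [], []
    for _ in range(iterations):
        rng.shuffle(season)  # every ordering of the wins and losses is equally likely
        games = "".join(season)
        lclump.append(sum_sq_runs(games, "1"))
        wclump.append(sum_sq_runs(games, "0"))
    return (statistics.mean(lclump), statistics.stdev(lclump),
            statistics.mean(wclump), statistics.stdev(wclump))


l_mean, l_sd, w_mean, w_sd = simulate_record(93, 57)
print(f"LCLUMP: mean = {l_mean:.1f}, sd = {l_sd:.1f}")
print(f"WCLUMP: mean = {w_mean:.1f}, sd = {w_sd:.1f}")
```

The printed means should land near John's 123.8 and 379.9 but will not match them exactly, since the random shuffles differ.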

Let’s review what I talked about earlier and then I can add some thoughts to address John’s comments.

### The 2016 Indians — Losing Streaks

Let’s focus on the streaky performance of the 2016 Indians (it is too painful to talk about the 2017 Indians). Over their 161 games, we observe this sequence of wins (1) and losses (0); the bracketed number at the start of each row is the game number:

```
  [1] 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1
 [26] 1 0 1 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 0 1 0 0 0
 [51] 1 1 1 1 1 1 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1
 [76] 1 1 1 1 0 0 1 1 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 0 1
[101] 1 1 0 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 1 1 0 0
[126] 0 1 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1
[151] 1 1 1 0 0 1 0 0 1 1 1
```

We collect the lengths of the losing streaks, that is, the number of 0’s between consecutive 1’s (a 0 means two wins in a row):

```
 [1] 1 0 2 0 1 1 2 0 0 2 3 0 0 0 1 1 1 2 0 0 0 0 2 1 0 1
[27] 3 0 0 0 0 0 2 0 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2
[53] 2 1 1 0 3 1 0 0 3 1 1 1 0 0 0 1 1 0 1 0 3 2 0 0 0 0
[79] 0 2 0 0 1 2 1 0 1 0 0 0 2 2 0 0
```

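Our summary measure of clumpiness is the sum of squares of these lengths. We assess how this compares with “randomness” by means of a permutation test: randomly shuffle the order of the 94 wins and 67 losses many times, compute the sum of squares for each shuffled season, and find the proportion of shuffled seasons with a value at least as large as the observed one. This p-value (computed by simulation) is 0.937. Because it is large (close to 1), **we conclude that the Tribe is consistent**: they tended **not to have long losing streaks**.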
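The permutation test can be sketched as follows, assuming that “randomness” means every ordering of the season's wins and losses is equally likely and that the p-value is the share of shuffled seasons whose sum of squares is at least as large as the observed one. The names `clump`, `lclump`, and `permutation_pvalue`, the 5,000 shuffles, and the seed are hypothetical choices; this is a Python sketch, not the code used for the numbers in this post.

```python
import random
from typing import Callable

# Assumed encoding: the 2016 Indians' 161 games in order (1 = win, 0 = loss),
# transcribed from the listing earlier in this post.
GAMES = (
    "0110011010100111001000111"
    "1010101001111100101101000"
    "1111110011010001111111111"
    "1111001100100101011000101"
    "1100010101011110101101100"
    "0100111111001110100101101"
    "11100100111"
)


def clump(games: str, sep: str) -> int:
    """Sum of squared lengths of the stretches between consecutive `sep` characters."""
    return sum(len(piece) ** 2 for piece in games.split(sep))


def lclump(games: str) -> int:
    """LCLUMP: split on wins, so each piece is a run of losses."""
    return clump(games, "1")


def permutation_pvalue(games: str, statistic: Callable[[str], int],
                       n_sims: int = 5000, seed: int = 2016) -> float:
    """Share of shuffled seasons whose statistic is at least the observed value.

    Large values suggest a consistent team; small values suggest a streaky one.
    """
    rng = random.Random(seed)
    observed = statistic(games)
    season = list(games)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(season)  # same record, random order
        if statistic("".join(season)) >= observed:
            hits += 1
    return hits / n_sims


p_loss = permutation_pvalue(GAMES, lclump)
print(lclump(GAMES), round(p_loss, 3))  # observed LCLUMP is 131
```

The estimated p-value should be in the neighborhood of the reported 0.937, though not identical to it, since the random shuffles differ.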

### The 2016 Indians — Winning Streaks

But as John mentions, streakiness might instead be viewed in terms of the lengths of winning streaks. So we look at the same win-loss data, but now record the number of 1’s between consecutive 0’s:

```
 [1] 0 2 0 2 1 1 0 3 0 1 0 0 4 1 1 1 0
[18] 5 0 1 2 1 0 0 6 0 2 1 0 0 14 0 2 0
[35] 1 0 1 1 2 0 0 1 3 0 0 1 1 1 4 1 2
[52] 2 0 0 1 0 6 0 3 1 0 1 2 4 0 1 0 3
```

Although the Tribe did not have long losing streaks, they had a 14-game winning streak and two 6-game winning streaks. Applying the same permutation test to the sum of squares of the winning-streak lengths gives a p-value of 0.059. **From a winning streak perspective, the 2016 Tribe was streaky.**
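The same approach handles winning streaks by splitting on "0" instead of "1". This Python sketch assumes the season string is transcribed from the game listing, that every ordering of the wins and losses is equally likely under randomness, and that shuffles with a value at least as large as the observed one count toward the p-value; `clump`, `wclump`, and `permutation_pvalue` are hypothetical helper names, and the 5,000 shuffles and seed are arbitrary.

```python
import random
from typing import Callable

# Assumed encoding: the 2016 Indians' 161 games in order (1 = win, 0 = loss).
GAMES = (
    "0110011010100111001000111"
    "1010101001111100101101000"
    "1111110011010001111111111"
    "1111001100100101011000101"
    "1100010101011110101101100"
    "0100111111001110100101101"
    "11100100111"
)


def clump(games: str, sep: str) -> int:
    """Sum of squared lengths of the stretches between consecutive `sep` characters."""
    return sum(len(piece) ** 2 for piece in games.split(sep))


def wclump(games: str) -> int:
    """WCLUMP: split on losses, so each piece is a run of wins."""
    return clump(games, "0")


def permutation_pvalue(games: str, statistic: Callable[[str], int],
                       n_sims: int = 5000, seed: int = 2016) -> float:
    """Share of shuffled seasons whose statistic is at least the observed value."""
    rng = random.Random(seed)
    observed = statistic(games)
    season = list(games)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(season)  # same record, random order
        if statistic("".join(season)) >= observed:
            hits += 1
    return hits / n_sims


winning = [len(piece) for piece in GAMES.split("0")]  # 1's between consecutive 0's
p_win = permutation_pvalue(GAMES, wclump)
print(max(winning), wclump(GAMES), round(p_win, 3))  # longest streak 14, WCLUMP 434
```

The 14 and the two 6's in `winning` are the long streaks mentioned above, and the estimated p-value should be small, in the vicinity of the reported 0.059.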

### Relationship Between Winning and Losing Streaks

So one team was consistent in the sense of having short losing streaks, but streaky in the sense of having long winning streaks. That raises the question: **Are teams that are streaky with respect to winning streaks also streaky with respect to losing streaks? Likewise, are teams that are consistent with respect to winning streaks also consistent with respect to losing streaks?**

Actually, my initial opinion was “yes”: I thought a streaky team would exhibit long runs of wins **and** long runs of losses. Likewise, I thought a team that avoided long runs of losses would also avoid long runs of wins.

To check this, I computed two permutation test p-values for each of the 30 teams in the 2016 season: one using the lengths of losing streaks and one using the lengths of winning streaks. Here’s a scatterplot of the two p-values, with each point labeled by the team abbreviation.

What do we see in this graph?

- Generally there is a positive association in this graph: teams that are streaky with respect to winning streaks tend also to be streaky with respect to losing streaks. Likewise, teams that are consistent with respect to winning streaks tend also to be consistent with respect to losing streaks. This agrees with my prior opinion.
- For example, teams SDN, SLN, MIA, MIL, TOR, and LAN tend to be consistent with respect to both winning and losing streaks. Teams ATL, KCA, DET, PIT, ANA, MIN, and TBA tend to be streaky both ways.
- But there are notable exceptions, such as CLE and WAS, which look streaky or consistent depending on whether you look at winning streaks or losing streaks.

### So What?

Since a team’s streakiness (or consistency) seems to depend on whether you look at winning streaks or losing streaks, it would be desirable to have a measure that does not depend on this choice.

Here is a simple proposal. First, find the lengths of all losing streaks and take the sum of squares; call it LCLUMP (following John’s suggestion). Also find the lengths of all winning streaks and let WCLUMP be the sum of squares of these lengths. Then define the clumpiness measure to be the sum CLUMP = LCLUMP + WCLUMP. We can use a permutation test to determine streakiness or consistency using the CLUMP statistic.
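Here is a minimal Python sketch of the CLUMP test, assuming the same shuffle-based notion of randomness (every ordering of the season's wins and losses equally likely) and an upper-tail p-value. `GAMES` is the 2016 Indians' season transcribed from the listing earlier in the post; `clump`, `total_clump`, and `permutation_pvalue`, along with the 5,000 shuffles and the seed, are hypothetical choices rather than the code behind the graph.

```python
import random
from typing import Callable

# Assumed encoding: the 2016 Indians' 161 games in order (1 = win, 0 = loss).
GAMES = (
    "0110011010100111001000111"
    "1010101001111100101101000"
    "1111110011010001111111111"
    "1111001100100101011000101"
    "1100010101011110101101100"
    "0100111111001110100101101"
    "11100100111"
)


def clump(games: str, sep: str) -> int:
    """Sum of squared lengths of the stretches between consecutive `sep` characters."""
    return sum(len(piece) ** 2 for piece in games.split(sep))


def total_clump(games: str) -> int:
    """CLUMP = LCLUMP (runs of losses) + WCLUMP (runs of wins)."""
    return clump(games, "1") + clump(games, "0")


def permutation_pvalue(games: str, statistic: Callable[[str], int],
                       n_sims: int = 5000, seed: int = 2016) -> float:
    """Share of shuffled seasons whose statistic is at least the observed value."""
    rng = random.Random(seed)
    observed = statistic(games)
    season = list(games)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(season)  # same record, random order
        if statistic("".join(season)) >= observed:
            hits += 1
    return hits / n_sims


p_total = permutation_pvalue(GAMES, total_clump)
print(clump(GAMES, "1"), clump(GAMES, "0"), total_clump(GAMES), round(p_total, 3))
```

For the 2016 Indians the observed value is 131 + 434 = 565; the simulated p-value is read as before, with small values pointing to a streaky season and large values to a consistent one.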

Here I illustrate p-values for the 2016 teams using a permutation test and the CLUMP statistic. Basically, the p-values shown are “averages” of the p-values using the losing streaks and the winning streaks. I use color to identify teams that had unusually consistent or streaky seasons.

### Final Comments

Several takeaways from this work:

- If one says a team is streaky, this could mean long runs of wins or long runs of losses. A statistic such as CLUMP accounts for both types of long runs, so it seems preferable to LCLUMP or WCLUMP alone.
- Although we measure clumpiness by looking at the sum of squares of the spacings, there are alternative ways of measuring clumpiness (see the paper by Zhang, Bradlow, and Small).