Streaky .400 AVG Performances


Over the years I have been interested in streaky performances of baseball hitters. Recently, Bill James posted an observation on Twitter about one remarkable hitting accomplishment:

“In a stretch of 25 games in 2004, July 25 to August 21, Ichiro Suzuki had 57 hits good for a .514 average. 57 hits in 25 games.”

That started a train of thought about remarkable hitting records. A .400 batting average was at one time a standard of excellence in hitting. The last .400 season was Ted Williams’ .406 AVG in 1941 (185 hits in 456 at-bats), and it is very unlikely that we will ever see a .400 season AVG again in MLB. On this subject, Dan Agonistes discusses Stephen Jay Gould’s explanation for the disappearance of the .400 hitter.

But that raises a related question: what are the most notable stretches of .400+ hitting within a particular season? For example, Ichiro Suzuki batted .372 in the 2004 season. Although Suzuki didn’t bat .400 for the season, there was likely a long stretch of consecutive AB during that season when he did bat over .400. Scanning over his AB, what was the largest number of consecutive AB over which he had a .400+ average? And how does that compare with other hitters in the past 20 seasons? So here “most notable” means a long stretch of at-bats with at least a .400 average.

A Simple Example

To understand what I am thinking about, here is a sample of the outcomes of 30 at-bats of Ichiro Suzuki (specifically at-bats numbered 200 through 229) for the 2004 season where 1 denotes a Hit and 0 denotes an Out.

1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 1 1 0 1 0 0

For this 30 AB period, you can check that Suzuki’s AVG was 11 out of 30 = .367. But if one chooses shorter stretches of AB, then you’ll find periods among these 30 at-bats where Suzuki did indeed bat over .400. For example, in the first four AB, Suzuki was 2 for 4 for an average of .500, but that isn’t very impressive since it is based on only 4 AB. What was the longest period where he batted at least .400? One can check that the longest such stretch in this example is 20 AB: from the 9th through the 28th at-bat in the list he was 8 for 20 = .400, and no stretch of 21 or more AB reaches .400. We’d like to do this computation for every hitter season from 2000 through 2019. Specifically, what are the longest stretches of .400 hitting in the past 20 years?
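
The search in this small example can be sketched in a few lines of code. This is a minimal Python illustration only (the analysis itself is done in R); the function name `longest_stretch` is hypothetical, and the comparison is done with integers so that a stretch of exactly .400 is not lost to floating-point rounding.

```python
from itertools import accumulate

# Outcomes of the 30 at-bats shown above: 1 = hit, 0 = out.
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
            1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]


def longest_stretch(outcomes, num=2, den=5):
    """Return (length, start) of the longest run of consecutive at-bats
    whose batting average is at least num/den (default 2/5 = .400).
    start is a 0-based index; returns (0, None) if no window qualifies."""
    # prefix[k] = hits in the first k at-bats, so any window is one subtraction.
    prefix = [0] + list(accumulate(outcomes))
    n = len(outcomes)
    for length in range(n, 0, -1):           # try the longest windows first
        for start in range(n - length + 1):
            hits = prefix[start + length] - prefix[start]
            # Integer test for hits / length >= num / den.
            if hits * den >= num * length:
                return length, start
    return 0, None


length, start = longest_stretch(outcomes)
hits = sum(outcomes[start:start + length])
# Prints the stretch length, its 1-based starting position, and the hits in it.
print(length, start + 1, hits)  # 20 9 8
```

Trying the longest windows first means the function can stop at the first window that qualifies.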

Using R and Retrosheet

Here is an outline of how I work on this problem using R. For a particular season, the relevant data are the Retrosheet play-by-play files where each row corresponds to a single plate appearance.

  • I collect the number of at-bats for all hitters in that season. I keep the ids of only the hitters who have at least 300 at-bats, since they are the hitters who have the opportunity to put together long stretches of .400 hitting.
  • I have to make sure that the rows are ordered by game number and plate appearance, so I am actually working with the AB results as they occur during the season.
  • For a hitter and particular window of at-bats, say 120, I compute all of the rolling (or moving) batting averages and find the maximum AVG among all of these periods.
  • I repeat this process for window lengths of 5 through the hitter’s total number of AB, and I find the largest window length where the maximum AVG is at least .400.
  • I repeat this process for all hitters and record the players with the five longest stretches of .400 hitting.
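
The steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the R function used for the results in this post: the name `best_streaks_sketch` and the input layout, a sequence of `(batter_id, game_number, pa_number, is_hit)` tuples with one tuple per at-bat, are hypothetical and do not correspond to actual Retrosheet column names.

```python
from collections import defaultdict
from itertools import accumulate


def max_rolling_hits(outcomes, window):
    """Most hits in any `window` consecutive at-bats (rolling sum maximum)."""
    prefix = [0] + list(accumulate(outcomes))
    return max(prefix[i + window] - prefix[i]
               for i in range(len(outcomes) - window + 1))


def longest_400_stretch(outcomes, min_window=5):
    """Largest window length whose best rolling AVG is at least .400 (0 if none)."""
    best = 0
    for window in range(min_window, len(outcomes) + 1):
        # hits / window >= 2/5, tested with integers to avoid rounding issues
        if 5 * max_rolling_hits(outcomes, window) >= 2 * window:
            best = window
    return best


def best_streaks_sketch(at_bats, min_ab=300, top=5):
    """at_bats: iterable of (batter_id, game_number, pa_number, is_hit) tuples
    (a hypothetical layout). Returns the `top` (batter_id, stretch_length)
    pairs, longest stretch first."""
    by_batter = defaultdict(list)
    for batter_id, game_number, pa_number, is_hit in at_bats:
        by_batter[batter_id].append((game_number, pa_number, is_hit))
    results = []
    for batter_id, rows in by_batter.items():
        if len(rows) < min_ab:      # keep only hitters with enough at-bats
            continue
        rows.sort()                 # chronological: game, then plate appearance
        outcomes = [is_hit for _, _, is_hit in rows]
        results.append((batter_id, longest_400_stretch(outcomes)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top]


# Toy data: two made-up hitters with 10 at-bats each, fed in reverse order
# to show that the sort restores the chronological sequence.
toy = [("a", g, 1, h) for g, h in enumerate([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])]
toy += [("b", g, 1, h) for g, h in enumerate([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])]
print(best_streaks_sketch(reversed(toy), min_ab=10))  # [('a', 7), ('b', 0)]
```

Checking every window length for every hitter takes time proportional to the square of the number of at-bats, which is fine for a season of at most about 700 AB per hitter.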

Using my function, here are the top five streaky .400 leaders for the 2019 season. Charlie Blackmon and Ketel Marte each had a stretch where they were 56 for 140 = .400. A 140 AB stretch is not especially long, and we’ll see that these are short stretches of hot hitting by the standards of recent baseball history.

Best .400 Stretches in the Past 20 Seasons

I repeated this process for each of the 20 seasons 2000 through 2019. I looked at all batters who had at least 300 AB in a season, computed all of the rolling AVG of different lengths, and found the five hitters who had the longest stretches of .400 hitting for each season. I plot the results below as a scatterplot of the stretch length (in AB) against the season. I add a smoothing curve to show the general pattern and label a few points corresponding to unusually long stretches.

Here are some observations from this graph:

  • Generally, the top hitters are displaying shorter sequences of .400 hitting as one moves from 2000 to 2019. It was more common to have a 200+ AB sequence in the 2000 season. We have not seen a 200+ AB sequence of .400 hitting in the last two seasons (2018 and 2019), and I doubt that we will see one in future 162-game seasons.
  • There are some special hitters who have shown long stretches of .400 hitting. The top two are Ichiro Suzuki, who had a 520 AB stretch of .400 hitting in that remarkable 2004 season, and Todd Helton, who had a 445 AB stretch in 2000. The next longest stretch was Josh Hamilton’s 357 AB stretch in 2010. In recent seasons, Jose Altuve had two long .400 stretches: a 320 AB stretch in 2017 and a 300 AB stretch in 2016. Joey Votto also had a 292 AB stretch of .400 hitting in 2016.
  • By the way, Ted Williams only had 456 AB in the 1941 season to get his .406 AVG, so perhaps Suzuki’s batting accomplishment was more impressive since he hit .400 in a longer stretch of AB (520) in a single season.
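
One rough way to see why the length of the stretch matters is to ask how likely a .400 average is over a fixed number of AB for a hitter of given ability. The Python sketch below is illustrative only and rests on loud assumptions: at-bats are treated as independent with a constant hit probability (a simple binomial model), the value 0.350 for that probability is an arbitrary choice, and the function names are hypothetical. It also looks at one fixed window, not the best window found by scanning a whole season, so the probabilities indicate direction and not the rarity of the stretches reported above.

```python
from math import comb


def min_hits_for_400(ab):
    """Fewest hits that give an average of at least .400 in `ab` at-bats."""
    return (2 * ab + 4) // 5          # integer ceiling of 0.4 * ab


def prob_400(ab, true_avg):
    """P(AVG >= .400 in `ab` at-bats) if each at-bat is an independent
    hit with probability `true_avg` (an assumed binomial model)."""
    need = min_hits_for_400(ab)
    return sum(comb(ab, k) * true_avg**k * (1 - true_avg)**(ab - k)
               for k in range(need, ab + 1))


# 456 AB is Williams' 1941 season; 520 AB is Suzuki's 2004 stretch.
for ab in (200, 456, 520):
    # Minimum hits needed are 80, 183 and 208 respectively.
    print(ab, min_hits_for_400(ab), prob_400(ab, 0.350))
```

Under this model the probability shrinks as the number of AB grows, which is the sense in which .400 over 520 AB is harder to reach than .400 over 456 AB.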

Final Comments

  • In the upcoming shortened 2020 season, we are going to see some large batting averages, but these will be due to the high variability of AVGs in small samples. I like this investigation since it focuses on the number of AB instead of the AVG. A .400 AVG in 400 AB is much more impressive than a .400 AVG in 200 AB.
  • Why do current hitters have short .400 stretches? The rate of strikeouts in baseball is currently at an all-time high, which makes it difficult to hit for a high average. Honestly, it seems that teams are more interested in home run hitters than in batters who hit for AVG. Hitters like Ichiro Suzuki are rare in modern baseball.
  • I wrote a single function best_streaks() that does all of this work. The input is the Retrosheet dataset for a particular season and the output is the five hitters with the longest .400 stretches for that season. This function, with some examples, is posted on my Github Gist site.
  • If you like reading about streaky performances, here are some “streaky” posts on this site where I talk about other types of “hot performances” such as streakiness in home run hitting, in team performances, and in Statcast off-the-bat measurements.
    Streaky Home Run Hitting
    Streaky and Consistent Teams
    Statcast Streakiness