Monthly Archives: November, 2021

Statcast Streakiness on Balls in Play

Introduction

Back in a post from March 2018, I cited a question from Dan Turkenkopf (from his article “Catching Up with Baseball’s Hilbert Problems”) asking if slumps and streaks were correlated with speed-off-bat. In that post, I looked at patterns of batters’ launch speeds over a season and identified hitters who were unusually streaky or unusually consistent in their launch speed during the 2017 season. In a second post from October 2020, I explored streaky patterns in both whiffs and estimated hit probabilities. In my closing remarks of that post, I commented that Statcast measurements are likely more ability-driven than the results of balls put into play, and so I think there is a better chance of identifying streakiness through these launch condition measurements.

In the “Streakiness” chapter from our book Curve Ball (coauthored with Jay Bennett), we used moving average plots to detect streakiness in hit/out data, and we distinguished observed streakiness from true streakiness, where the streakiness is more than one would predict based on some “consistent” probability model. Here I will illustrate this Curve Ball approach by means of a Shiny app. I’ll describe the method, show some snapshots from the app, and provide some examples of hitters who were unusually streaky or unusually consistent in the 2021 season.

Illustrating The Method Using the 2021 Bryce Harper

We begin with all 356 balls that Bryce Harper (the 2021 National League MVP) put into play in the 2021 season. To measure the quality of the ball in play, we use the Statcast estimated_ba_using_speedangle variable which is the estimated hit probability based on the launch speed and launch angle measurements. We are looking for streaky patterns in this sequence of measurements sorted in time.

A basic way to display streaky hitting patterns is to plot moving averages of these estimated BA measurements against the in-play number using some window width. Here’s a graph of the moving averages using a width of 20:

I have shaded the regions above and below the overall in-play BA average of 0.409. Streakiness is indicated by large blue regions in this plot. One possible way to measure observed streakiness is to compute the total area of the blue region that I call BLUE — in this display BLUE = 19.07.
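To make the moving average and the BLUE measure concrete, here is a minimal Python sketch. It is not the code behind the app (which is written in R), and it rests on two assumptions of mine: each moving average uses a full window of `width` consecutive balls in play, and the "area" is approximated by summing the absolute gaps between each moving average and the overall mean. The function names and the toy data are hypothetical.

```python
def moving_averages(values, width):
    """Return the mean of each run of `width` consecutive values."""
    if width < 1 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    window_sum = sum(values[:width])
    averages = [window_sum / width]
    # Slide the window one position at a time, updating the running sum.
    for i in range(width, len(values)):
        window_sum += values[i] - values[i - width]
        averages.append(window_sum / width)
    return averages


def blue_area(values, width):
    """Assumed BLUE: sum of |moving average - overall mean| over all windows."""
    overall = sum(values) / len(values)
    return sum(abs(m - overall) for m in moving_averages(values, width))


# Toy sequence of estimated hit probabilities in time order (not real data).
toy = [0.0, 0.0, 1.0, 1.0]
print(moving_averages(toy, 2))
print(blue_area(toy, 2))
```

With real data, `values` would be the estimated_ba_using_speedangle measurements sorted in time and `width` would be 20. Under this definition, a sequence whose moving averages stay close to the overall mean gives a small BLUE, and long stretches above or below the mean give a large one.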

Is the Streakiness Meaningful?

It is difficult to make sense of this observed streakiness in the in-play data without some reference. Maybe Harper is truly consistent in his batting approach and what we are observing is just some random or chance variation. (After all, the results of flipping a fair coin can look pretty streaky.)

Suppose that Harper is truly consistent in the sense that all possible permutations of his 356 balls in play are equally likely to happen. We will call this the “equal probability model”. If one assumes this model is true, one can learn about plausible patterns of streakiness by a simulation experiment. Suppose we …

  1. Randomly permute the 356 values of Harper’s estimated BA values.
  2. From a moving average plot of this permuted data (with a window of 20), compute the area of the blue region.

We repeat steps 1 and 2 for a total of 500 iterations and collect the values of BLUE. Below is a histogram of the simulated BLUE values assuming the equal probability model:

Under this model, we see that BLUE values between 15 and 25 are possible and Harper’s value of 19.07 is in the middle of this simulated distribution. We can measure the extremeness of Harper’s value by the Tail Probability, the probability the simulated BLUE is at least as large as Harper’s value. Here the tail probability is 0.438 which indicates that Harper’s streakiness is what one might predict from this equal probability model. In other words, there is little evidence that Harper is truly streaky. If we find for another hitter that the tail probability is small, say under 0.10, then we would have evidence that this hitter shows some true streakiness.
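The simulation and the tail probability can be sketched in a few lines of Python. As before, this is an illustration rather than the app's R code: BLUE is assumed to be the sum of absolute gaps between each full-window moving average and the overall mean, and the function names, the `seed` argument, and the toy sequence are my own.

```python
import random


def blue_area(values, width):
    """Assumed BLUE: sum of |moving average - overall mean| over all windows."""
    overall = sum(values) / len(values)
    window_sum = sum(values[:width])
    area = abs(window_sum / width - overall)
    for i in range(width, len(values)):
        window_sum += values[i] - values[i - width]
        area += abs(window_sum / width - overall)
    return area


def tail_probability(values, width, n_sims=500, seed=None):
    """Share of random permutations whose BLUE is at least the observed BLUE."""
    rng = random.Random(seed)
    observed = blue_area(values, width)
    shuffled = list(values)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)                       # step 1: permute the values
        if blue_area(shuffled, width) >= observed:  # step 2: recompute BLUE
            hits += 1
    return observed, hits / n_sims


# Toy sequence with all of the high values bunched together (not real data).
streaky = [0.0] * 20 + [1.0] * 20
observed, tail = tail_probability(streaky, width=5, seed=1)
print(observed, tail)
```

A tail probability near 0 says the observed pattern is more streaky than the permutations typically are, and a value near 1 says it is more consistent. With only 500 permutations the estimate is itself somewhat noisy, so values close to a cutoff such as 0.10 should be read loosely.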

Some Extreme 2021 Hitters

I repeated this exercise for all batters who had at least 300 balls in play during the 2021 season. For these batters, I focused on the tail probabilities — batters with small tail probabilities are more streaky than one would predict from the chance model and batters with large tail probabilities (close to 1) are more consistent than one would predict from my model.

Here are moving average plots for two hitters who stood out (here I am using a width of 30). Jeimer Candelario had an unusually consistent pattern of hitting (small blue area) and Yoan Moncada’s pattern of hitting was unusually streaky (large blue area).

The Shiny App

It is straightforward to write the code for a Shiny app that does this work for any 2021 hitter of interest. Here’s a snapshot of the app. One inputs the batter name, the selected measure (either the estimated BA or the estimated wOBA based on the launch variable measurements), and the width value for the moving averages. The Observed tab shows the moving average graph and the value of BLUE. The Simulated tab displays a histogram of 500 simulated values of BLUE from the equal probability model and the value of the tail probability. One can use the Download Data button to download a csv file of all of the moving average data.

Comments

  • Try out the Shiny app. I’ve included this app in my ShinyBaseball package. Once the package is installed, then one runs this app by typing StreakyInPlay() on the R console. This app.R file contains all of the code and the dataset sc2021_ip3 in the data folder contains the relevant Statcast data.
  • Measuring streakiness. Here I use the total area of the blue region as my streakiness measurement, but the approach can be used for other streaky measures. For example, one could look at the difference between the largest and smallest moving average values as the streaky measure.
  • Making sense of streaky patterns. Fans and sportswriters get excited about streaky observations, but it is difficult to understand the significance of these findings since “random” data also display similar streaky patterns. Here we are using streaky measures from a particular probability model as our reference. If the observed streakiness is extreme relative to the streaky patterns from random data, then our streaky pattern has some meaning.
  • Is true streakiness predictive? Fans and managers believe that particular players have streaky hitting ability. If that is true, then observed streakiness from one season would be predictive of streakiness in future seasons. I haven’t done a complete exploration using Statcast data, but my current belief is that few players have streaky ability. That is, streaky performances are not predictive.
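
As a small illustration of the alternative measure mentioned in the “Measuring streakiness” comment, here is a Python sketch of the difference between the largest and smallest moving average values. The function name and the toy data are hypothetical, and full windows of `width` consecutive values are assumed.

```python
def moving_average_range(values, width):
    """Largest minus smallest moving average over full windows of `width`."""
    averages = [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]
    return max(averages) - min(averages)


# Toy sequences (not real data): bunched values versus alternating values.
print(moving_average_range([0.0, 0.0, 1.0, 1.0], 2))
print(moving_average_range([0.0, 1.0, 0.0, 1.0], 2))
```

This measure could be compared against its values over random permutations in the same way as BLUE.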