Constructing Heat Maps for AVG

Williams and Underwood’s famous book The Science of Hitting (originally published in 1970) contains the following remarkable graph that shows Ted Williams’ batting average in different areas of the strike zone.  (I wonder how this data was collected over 45 years ago?)

[Figure: Ted Williams’ batting average by strike-zone location, from The Science of Hitting]

In this blog post, I’ll give a general overview of how to construct a similar graph from PITCHf/x data using the ggplot2 package, and use that as a springboard to present alternative graphical views of areas of “hot” and “cold” hitting.

The Data

Using the pitchRx package, I downloaded pitch data for five months of the 2016 season and created a data frame with the variables gameday_link, num, Batter, X, Z, and Event.  For my purposes, all I need is the name of the batter, the pitch location (X and Z), and the Event.  The data frame contains only the last pitch of each plate appearance, since I’m interested in the relationship between the location of the final pitch and the PA outcome.
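As a rough illustration of the "last pitch in each plate appearance" step, here is a minimal base R sketch on a made-up pitch table.  The column id (a pitch sequence number that increases in time) and all of the values are assumptions for illustration; this is not the actual pitchRx download.

```r
# Toy pitch-level data: two plate appearances in one game (values are made up).
pitches <- data.frame(
  gameday_link = "gid_2016_04_04_example",   # hypothetical game identifier
  num    = c(1, 1, 1, 2, 2),                 # plate appearance number within the game
  id     = c(10, 11, 12, 20, 21),            # assumed pitch sequence id
  Batter = c("A", "A", "A", "B", "B"),
  X      = c(-0.5, 0.2, 0.1, 0.9, -0.1),     # horizontal location
  Z      = c(2.1, 3.0, 2.5, 1.8, 2.7),       # vertical location
  Event  = c("Single", "Single", "Single", "Strikeout", "Strikeout")
)

# Sort by game, PA, and pitch id, then keep the last row of each (game, PA) pair.
pitches <- pitches[order(pitches$gameday_link, pitches$num, pitches$id), ]
is_last <- !duplicated(pitches[, c("gameday_link", "num")], fromLast = TRUE)
last_pitch <- pitches[is_last,
                      c("gameday_link", "num", "Batter", "X", "Z", "Event")]
```

The result has one row per plate appearance, which is the shape of data frame the rest of the post works with.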

Mike Trout

Here are the steps to produce a “Williams/Underwood” style of graph for Mike Trout’s batting average in the 2016 season:

  1. Restrict the events to official at-bats only (remove walks, etc.) and define a Hit variable which is 1 for a hit and 0 for an out.
  2. Since we have a limited number of at-bats for Trout in the 2016 season, some smoothing of the probabilities is desirable.  I fit a generalized additive model with a logistic link (using the gam function) to the 1/0 data, with the (X, Z) location as covariates.
  3. I define a grid of X, Z values similar to the one used in Williams and Underwood’s display.  For each grid point, I estimate the hit probability and use ggplot2 to display the fitted averages on top of the points.
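The three steps above can be sketched as follows.  This is only a sketch under several assumptions: the data are simulated rather than real Trout at-bats, gam is taken to be the one in the mgcv package, the Event labels in non_ab and hits are guesses at how the outcomes are spelled, and the 7-by-11 grid is one plausible stand-in for the Williams and Underwood layout.

```r
library(mgcv)     # assumed source of gam()
library(ggplot2)

set.seed(1)
# Simulated final-pitch data for one batter (NOT real data).
n <- 600
X <- runif(n, -1.5, 1.5)                       # horizontal location
Z <- runif(n, 1, 4)                            # vertical location
p_hit <- plogis(-0.2 - 1.2 * X^2 - 1.2 * (Z - 2.5)^2)
Event <- ifelse(runif(n) < 0.10, "Walk",
                ifelse(runif(n) < p_hit, "Single", "Groundout"))
pa <- data.frame(Batter = "Simulated Batter", X, Z, Event)

# Step 1: keep official at-bats and define the 1/0 Hit variable.
non_ab <- c("Walk", "Intent Walk", "Hit By Pitch", "Sac Fly", "Sac Bunt")  # assumed labels
hits   <- c("Single", "Double", "Triple", "Home Run")                      # assumed labels
ab <- pa[!(pa$Event %in% non_ab), ]
ab$Hit <- as.numeric(ab$Event %in% hits)

# Step 2: smooth the hit probability over location with a logistic GAM.
fit <- gam(Hit ~ s(X, Z), family = binomial, data = ab)

# Step 3: predict on a coarse grid and print the fitted averages on the points.
grid <- expand.grid(X = seq(-1, 1, length.out = 7),
                    Z = seq(1.5, 3.5, length.out = 11))
grid$AVG <- as.numeric(predict(fit, newdata = grid, type = "response"))

p <- ggplot(grid, aes(X, Z)) +
  geom_point(size = 9, shape = 21, fill = "white") +
  geom_text(aes(label = sub("^0", "", sprintf("%.3f", AVG))), size = 3) +
  coord_fixed()
```

Calling print(p) draws the display.  The sub() call only strips the leading zero so that the labels read like batting averages (.312 rather than 0.312).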

[Figure: Mike Trout’s fitted 2016 batting averages displayed over a grid of pitch locations]

A Contour Plot

Although the above graph is nice, it is hard to pick out the sweet spot.  (Williams and Underwood used different colors to help the reader find the hot zone.)  With a finer grid of points, say 50 by 50, we can create alternative displays that better communicate hot and cold zones.

One possibility is a contour plot.  Here I construct a contour plot (using the geom_contour function) specifying contour lines at .2, .3, and .4.  For a pretty large area in the middle of the strike zone, Trout is a .400+ hitter.
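A minimal sketch of such a contour plot, again on simulated at-bats rather than Trout’s data.  The strike-zone rectangle uses rough, assumed boundaries and is there only for orientation.

```r
library(mgcv)
library(ggplot2)

set.seed(2)
# Simulated at-bats with a hit probability that peaks mid-zone (NOT real data).
n <- 600
ab <- data.frame(X = runif(n, -1.5, 1.5), Z = runif(n, 1, 4))
ab$Hit <- rbinom(n, 1, plogis(-0.2 - 1.2 * ab$X^2 - 1.2 * (ab$Z - 2.5)^2))

fit <- gam(Hit ~ s(X, Z), family = binomial, data = ab)

# A finer 50-by-50 grid of locations with fitted hit probabilities.
grid <- expand.grid(X = seq(-1.5, 1.5, length.out = 50),
                    Z = seq(1, 4, length.out = 50))
grid$AVG <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Contour lines at .2, .3, and .4; the rectangle is an approximate strike zone.
p <- ggplot(grid, aes(X, Z, z = AVG)) +
  geom_contour(breaks = c(0.2, 0.3, 0.4), color = "blue") +
  annotate("rect", xmin = -0.85, xmax = 0.85, ymin = 1.6, ymax = 3.5,
           fill = NA, color = "black") +
  coord_fixed()
```

geom_contour needs the z aesthetic on a regular grid, which is why the predictions are made with expand.grid rather than at the observed pitch locations.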

[Figure: contour plot of Trout’s fitted batting average with contour lines at .2, .3, and .4]

Categorizing AVG

If we are really interested in ranges of batting averages, then a reasonable thing to do is to create a new variable that categorizes the AVG into the intervals (0, 100), (100, 200), etc. (in points of batting average), and then plot the categorized values using different colors.
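One way to do the categorizing is with cut().  In this sketch the AVG surface is a made-up smooth function standing in for the fitted values, and the helper avg_bin and its labels are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Hypothetical helper: bin a batting average into 100-point intervals.
avg_bin <- function(avg) {
  cut(avg,
      breaks = seq(0, 0.5, by = 0.1),
      labels = c("0-100", "100-200", "200-300", "300-400", "400-500"),
      include.lowest = TRUE)
}

# A made-up smooth AVG surface on a 50-by-50 grid (stand-in for fitted values).
grid <- expand.grid(X = seq(-1.5, 1.5, length.out = 50),
                    Z = seq(1, 4, length.out = 50))
grid$AVG <- 0.45 * exp(-(grid$X^2 + (grid$Z - 2.5)^2))
grid$AVG_cat <- avg_bin(grid$AVG)

# One color per AVG interval.
p <- ggplot(grid, aes(X, Z, fill = AVG_cat)) +
  geom_tile() +
  scale_fill_brewer(palette = "YlOrRd", name = "AVG") +
  coord_fixed()
```

Because AVG_cat is a factor, ggplot2 uses a discrete color scale, so each interval gets its own clearly separated color.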

[Figure: categorized batting average over the strike zone, with each AVG interval in a different color]

Celebrating the Cubs with Heat Maps

One of my colleagues mentioned heat maps and R in the same sentence (he prefers MATLAB, so it was notable that he mentioned R).  It is straightforward to create heat maps of AVG over the strike zone using the geom_tile function.  In honor of the Cubs clinching a playoff spot, I show heat maps for two right-handed hitters, Kris Bryant and Addison Russell, and two left-handed hitters, Anthony Rizzo and Chris Coghlan.  Remember that these graphs are from the catcher’s perspective.  It is interesting that all of the hitters seem to like balls in the middle and outside part of the strike zone (for the righties, outside is on the right, and for lefties, outside is on the left).
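Here is a minimal sketch of faceted heat maps with geom_tile.  The two batters are simulated (one with a sweet spot shifted right, one shifted left) and are not the Cubs hitters; sim_batter is a hypothetical helper used only to make the example self-contained.

```r
library(mgcv)
library(ggplot2)

set.seed(3)
# Hypothetical helper: simulate at-bats for a batter with a given sweet spot.
sim_batter <- function(name, sweet_x, n = 600) {
  X <- runif(n, -1.5, 1.5)
  Z <- runif(n, 1, 4)
  p <- plogis(-0.3 - 1.2 * (X - sweet_x)^2 - 1.2 * (Z - 2.5)^2)
  data.frame(Batter = name, X = X, Z = Z, Hit = rbinom(n, 1, p))
}
ab <- rbind(sim_batter("Righty (simulated)", 0.3),
            sim_batter("Lefty (simulated)", -0.3))

grid <- expand.grid(X = seq(-1.5, 1.5, length.out = 50),
                    Z = seq(1, 4, length.out = 50))

# Fit one GAM per batter and predict each on the same grid.
fitted_grid <- do.call(rbind, lapply(split(ab, ab$Batter), function(d) {
  fit <- gam(Hit ~ s(X, Z), family = binomial, data = d)
  data.frame(Batter = d$Batter[1], grid,
             AVG = as.numeric(predict(fit, newdata = grid, type = "response")))
}))

# One heat map panel per batter, drawn from the catcher's perspective.
p <- ggplot(fitted_grid, aes(X, Z, fill = AVG)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  facet_wrap(~ Batter) +
  coord_fixed()
```

Fitting a separate model per batter keeps each panel’s surface independent, and the shared fill scale makes the panels directly comparable.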

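[Figure: heat maps of AVG over the strike zone for Kris Bryant, Addison Russell, Anthony Rizzo, and Chris Coghlan]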

Things to Try

Hopefully this post has gotten folks interested in trying these displays.  Here are some directions for future work.

  1. Instead of hit/out, one could define a success as hitting a home run, and see the sweet spot for hitting a home run for different batters.
  2. Or one could measure the value of a PA by its run value and look at contours of run value by the location of the last pitch.
  3. Since Statcast provides the exit velocity for each ball, it would be interesting to look at contours of exit velocity of balls-in-play by pitch location.
  4. How does a batter’s sweet spot vary by the arm of the pitcher?
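As a starting point for the first idea in the list, only the response variable has to change.  This sketch uses simulated events, and the label "Home Run" is an assumption about how the Event column spells that outcome.

```r
library(mgcv)

set.seed(4)
# Simulated at-bats where home runs cluster in the middle of the zone (NOT real data).
n <- 800
X <- runif(n, -1.5, 1.5)
Z <- runif(n, 1, 4)
p_hr <- 0.12 * exp(-2 * (X^2 + (Z - 2.5)^2))
Event <- ifelse(runif(n) < p_hr, "Home Run",
                sample(c("Single", "Groundout", "Strikeout"), n, replace = TRUE))
ab <- data.frame(X, Z, Event)

# Success is now a home run rather than any hit.
ab$HR <- as.numeric(ab$Event == "Home Run")

fit_hr <- gam(HR ~ s(X, Z), family = binomial, data = ab)

grid <- expand.grid(X = seq(-1.5, 1.5, length.out = 50),
                    Z = seq(1, 4, length.out = 50))
grid$P_HR <- as.numeric(predict(fit_hr, newdata = grid, type = "response"))
```

Home runs are much rarer than hits, so the fitted surface will be noisier for a single season of data, and any contour levels or color breaks would need to be chosen on a smaller scale than those used for AVG.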

R Code

A function heat_plot that will produce this type of heat graph can be found at my GitHub gist site.  The inputs are the name of the batter, the data frame containing the pitchRx data, and an indicator of whether you want to see the probability of a hit or the probability of a home run.
