Spray Charts Using the sportyR package


Over the years I have illustrated the use of several R packages to facilitate the collection and exploration of baseball data. Notable examples are the Lahman package, for obtaining season-by-season data covering all of MLB’s history, and the baseballr package, for scraping other data sources such as the Statcast data from Baseball Savant. Thanks to a tweet from Tom Tango, I recently became aware of a new package, sportyR, written by Ross Drucker, that uses ggplot2 syntax to draw playing surfaces for a number of sports including football, basketball, soccer and baseball. I have discussed constructing spray charts in several older posts; here I will illustrate using the sportyR package to enhance these baseball spray charts.

Getting Started

The sportyR package is very easy to use. Once you have installed the package from CRAN, the following script loads it and draws a regulation MLB playing surface.

library(sportyR)  # after install.packages("sportyR")
geom_baseball(league = "MLB")

Coordinate System

Before one can construct a spray chart, one needs to understand the coordinate system used by this package. The unit is feet, and home plate is the origin (0, 0). I’ve labeled the coordinates for the four bases below. For example, the coordinates for 2nd base are (0, 126), which indicates that 2nd base lies straight out from home plate, 126 feet away.
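Since the units are feet with home plate at the origin, the distance of any point from home plate and its spray angle follow directly from (x, y). Here is a minimal sketch; the helper names are my own (not part of sportyR), and I am assuming positive x points toward the first-base side, as in the usual view from behind home plate.

```r
# Hypothetical helpers (not part of sportyR). Coordinates are in feet with
# home plate at (0, 0); positive x is assumed to be the first-base side.
distance_from_home <- function(x, y) sqrt(x^2 + y^2)

# Spray angle in degrees: 0 is straight up the middle, positive angles
# point toward the first-base side, negative toward the third-base side.
spray_angle <- function(x, y) atan2(x, y) * 180 / pi

distance_from_home(0, 126)  # 2nd base: 126 feet from home plate
spray_angle(0, 126)         # 2nd base: angle 0, straight up the middle
```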

Plotting Statcast Batted Ball Data

I’d like to use this playing field as a background for a spray chart of a sample of batted balls. I’ve talked about constructing spray charts using R in several posts: this post provides an introductory discussion, and this post describes the construction of an improved spray chart that shows the “pull” side.

For balls put into play, Statcast provides the location variables hc_x and hc_y, but some re-expression is needed before they can be plotted on this field:

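library(dplyr)  # sc: Statcast batted balls (name assumed)
sc <- mutate(sc, location_x = hc_x - 125.42,   # home plate at x = 0
                 location_y = 198.27 - hc_y)   # flip y; outfield is "up"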

This re-expression flips the points vertically and makes home plate the origin. The resulting (location_x, location_y) values are not in feet, so an additional scaling factor is needed to convert them. After some trial and error, a scaling value of 2.5 seems to work, producing reasonable-looking spray charts.

# sc: Statcast batted balls (name assumed); mutate() is from dplyr
sc <- mutate(sc, location_x = 2.5 * (hc_x - 125.42),
                 location_y = 2.5 * (198.27 - hc_y))
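Wrapping the shift, flip and 2.5 scale into one function makes it easy to check a couple of reference points. This is only a sketch: the function name statcast_to_feet() is my own, and the 2.5 factor is the trial-and-error value above, so the results are approximate feet rather than exact distances.

```r
# Hypothetical helper wrapping the re-expression above. 125.42 and 198.27
# move the origin to home plate (and flip y so the outfield is "up");
# scale = 2.5 is the trial-and-error factor that converts to roughly feet.
statcast_to_feet <- function(hc_x, hc_y, scale = 2.5) {
  data.frame(location_x = scale * (hc_x - 125.42),
             location_y = scale * (198.27 - hc_y))
}

statcast_to_feet(125.42, 198.27)  # home plate maps to (0, 0)
statcast_to_feet(125.42, 148.27)  # about 125 feet straight out, near 2nd base
```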

An Example

Once one has figured out the coordinate system, it is straightforward to plot points on this playing field background. The geom_baseball() function creates a ggplot2 object, and one can add other plot or text layers with the ggplot2 + operator. As an example, I collected a data frame ff containing information on the ground balls hit by Freddie Freeman in the 2019 season. The input variables are the batted ball location variables location_x and location_y, and a character variable H indicating whether the BIP resulted in a hit or an out. Since we are plotting points on a dark background, one needs to choose plotting colors that are easy to see. Using the scale_colour_manual() function, I let “yellow” correspond to an out and “red” correspond to a hit. I use ggtitle() to add a descriptive title.
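For readers building a data frame like ff from raw Statcast rows, here is one possible sketch. It assumes Statcast-style columns bb_type, events, hc_x and hc_y, and the helper name make_spray_data() and the list of hit events are my own choices. H is kept as a character "1"/"0" indicator: if H is numeric, ggplot2 treats color as a continuous scale and scale_colour_manual() will not accept it.

```r
# Hypothetical helper: build a data frame like ff from raw Statcast rows.
# Assumes columns bb_type, events, hc_x and hc_y.
make_spray_data <- function(sc, type = "ground_ball") {
  bip <- sc[which(sc$bb_type == type), ]  # which() drops rows with NA bb_type
  hit_events <- c("single", "double", "triple", "home_run")
  # H as a character indicator: "1" = hit, "0" = out
  bip$H <- ifelse(bip$events %in% hit_events, "1", "0")
  # Same re-expression and 2.5 scale as described above
  bip$location_x <- 2.5 * (bip$hc_x - 125.42)
  bip$location_y <- 2.5 * (198.27 - bip$hc_y)
  bip
}

# Tiny made-up example (not real Freeman data)
toy <- data.frame(bb_type = c("ground_ball", "fly_ball", "ground_ball", NA),
                  events = c("single", "field_out", "field_out", "walk"),
                  hc_x = c(150, 100, 125.42, NA),
                  hc_y = c(170, 80, 198.27, NA))
gb <- make_spray_data(toy)
```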

The complete code for constructing this spray chart is shown below. Freddie Freeman is a left-handed hitter, and we see from the graph that most of his ground balls are hit to the pull (right) side. I would expect most teams to employ a defensive infield shift when Freeman is at bat. (Checking with Statcast, I found that 68% of these Freeman ground balls during the 2019 season were hit against an infield shift, 15% were hit against a “strategic” shift, and only 17% were hit against a standard infield alignment.)

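library(sportyR)
library(ggplot2)
# ff: Freeman's 2019 ground balls with location_x, location_y (feet)
# and H ("1" = hit, "0" = out)
geom_baseball(league = "MLB") +
  geom_point(data = ff,
             aes(location_x, location_y,
                 color = H)) +
  scale_colour_manual(values =
                 c("0" = "yellow", "1" = "red")) +  # out = yellow, hit = red
  ggtitle("Freddie Freeman Ground Balls - 2019")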
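The shift percentages quoted above came from Statcast. A tally like that can be sketched as follows, assuming a Statcast-style column if_fielding_alignment with values such as "Standard", "Infield shift" and "Strategic"; the helper name is hypothetical and the rows are made up.

```r
# Hypothetical helper: percentage of balls in play by infield alignment.
# Assumes a Statcast-style column if_fielding_alignment with values such
# as "Standard", "Infield shift" and "Strategic".
alignment_shares <- function(df) {
  round(100 * prop.table(table(df$if_fielding_alignment)))
}

# Made-up rows for illustration only
toy <- data.frame(if_fielding_alignment =
                    c("Infield shift", "Infield shift", "Standard", "Strategic"))
alignment_shares(toy)
```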

Final Comments

  • The sportyR package provides visuals of the playing surfaces for many sports such as basketball, football, soccer and hockey. Location data is either available or will shortly be available for these other sports and so these backgrounds will be useful for constructing location graphs.
  • My ShinyBaseball package includes several Shiny apps, via the functions SprayChart() and SprayCompare(), for constructing spray charts. I have already revised these apps to use the playing field visual from the sportyR package. For example, here’s a snapshot of the SprayCompare() function comparing the fly ball locations of Mike Trout and Rhys Hoskins for the 2019 season. It appears that Trout is more likely than Hoskins to hit fly balls to the opposite field.
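A comparison like the Trout/Hoskins one can also be put into a single number: the share of batted balls hit to the opposite field. This is a crude sketch under stated assumptions: location_x > 0 is taken to be the right-field side, stand is the batter’s side ("R" or "L"), and the split uses only the sign of x rather than a spray-angle cutoff. The helper name and the sample values are hypothetical.

```r
# Hypothetical helper: share of batted balls hit to the opposite field.
# Assumes location_x > 0 is the right-field (first-base) side and stand is
# the batter's side, "R" or "L". Using the sign of x is a crude split; a
# spray-angle cutoff would be a refinement.
opposite_field_share <- function(location_x, stand) {
  oppo <- (stand == "R" & location_x > 0) | (stand == "L" & location_x < 0)
  mean(oppo)
}

# Made-up fly ball x-coordinates (feet), not real Trout or Hoskins data
opposite_field_share(c(50, -30, 10, 20), "R")
```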

3 responses

  1. Hello, can you share how you defined ‘H’?

    1. Robert, H is an indicator for a hit — 1 is a hit and 0 is an out. Jim

      1. When mutating H into my data frame I originally did so as numeric, not character. I’m all good now.
