Plotting pitches – ggplot2 tips and tricks

In Chapter 6 of our book, we presented a step-by-step guide for producing plots with the powerful `ggplot2` package.

While you can get a close-to-finished `ggplot2` plot without much hassle, the finishing touches sometimes take more time than one would expect (and can be frustrating when you don’t understand why an error occurs or why something isn’t shown as expected).

Here I will show a few tips and tricks for finishing up your `ggplot2` figures.

Let’s first get some data to plot. My examples use the pitches thrown by Clayton Kershaw on Opening Day 2013, when he threw a 94-pitch shutout.
Just like Jim did in the previous post, I’ll use the `pitchRx` package to get the data. Note that the data are scraped as character strings, so some columns must be converted to numeric before plotting.

```r
library(pitchRx)

# get data from Apr. 1, 2013
dat = scrapeFX(start="2013-04-01", end="2013-04-01")

# combine pitch-level and at-bat-level information
pitches = plyr::join(dat$pitch, dat$atbat,
                     by = c("num", "url"), type = "inner")

# Clayton Kershaw's pitches
kershaw = subset(pitches, pitcher_name == "Clayton Kershaw")

# convert characters to numbers
kershaw$px = as.numeric(kershaw$px)
kershaw$pz = as.numeric(kershaw$pz)
kershaw$start_speed = as.numeric(kershaw$start_speed)
```
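When several columns need converting, they can also be converted in one step with `lapply()`. The sketch below runs on a tiny made-up data frame (`toy`, hypothetical values) standing in for `kershaw`, so nothing has to be scraped; calling `as.character()` first is a precaution in case a column arrives as a factor, where `as.numeric()` alone would return the level codes.

```r
# hypothetical stand-in for the scraped data: columns arrive as character
toy <- data.frame(
  px = c("-0.52", "0.13", "1.02"),
  pz = c("2.41", "3.05", "1.87"),
  start_speed = c("92.4", "74.1", "87.9"),
  pitch_type = c("FF", "CU", "SL"),
  stringsAsFactors = FALSE
)

# columns that must be numeric before plotting
num_cols <- c("px", "pz", "start_speed")

# convert them all at once; as.character() guards against factor columns
toy[num_cols] <- lapply(toy[num_cols], function(x) as.numeric(as.character(x)))

str(toy)
```

Columns not listed in `num_cols` (here `pitch_type`) are left untouched.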

As I said, you can get your first plot very easily.

```r
library(ggplot2)

# basic example
ggplot(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  geom_point()
```

With the code above, we indicate that `kershaw` is the data frame to be used, that the variables `px` and `pz` are mapped to the x-axis and y-axis respectively, and that `type` (the pitch outcome) is mapped to shape and `pitch_type` to color. Here’s the result.


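By adding just a couple of lines of code, we get side-by-side plots with the data split by batter handedness (`stand`), and equal coordinates (`coord_equal()`, so one unit on the x-axis is as long as one unit on the y-axis), which is appropriate since `px` and `pz` are both measured in feet.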

```r
# add faceting, equal coordinates
ggplot(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  geom_point() +
  facet_grid(. ~ stand) +
  coord_equal()
```


Now, as in the example in the book, we’d like to add a strike zone box. As a first step, we prepare a data frame containing the coordinates needed to draw the path.

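```r
# strike zone limits (ft.) used in this example
topKzone = 3.5
botKzone = 1.6
inKzone = -0.95
outKzone = 0.95

# corners of the box; the first corner is repeated at the end to close the path
kZone = data.frame(
  x = c(inKzone, inKzone, outKzone, outKzone, inKzone),
  y = c(botKzone, topKzone, topKzone, botKzone, botKzone)
)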
```
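Since the box is just a closed path through four corners, the same data frame could also come from a small helper function, which is handy if you want to draw zones with different limits. `zone_path()` below is a hypothetical name used only for illustration; called with the limits above, it returns the same five rows as the code above.

```r
# hypothetical helper: corner coordinates of a rectangle as a closed path
# (the first corner is repeated at the end so geom_path() draws all four sides)
zone_path <- function(left, right, bottom, top) {
  data.frame(
    x = c(left, left, right, right, left),
    y = c(bottom, top, top, bottom, bottom)
  )
}

kZone <- zone_path(left = -0.95, right = 0.95, bottom = 1.6, top = 3.5)
kZone
```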

Unfortunately, when we add the `geom_path` call, we get an error and no strike zone is plotted.

```r
ggplot(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  geom_point() +
  facet_grid(. ~ stand) +
  coord_equal() +
  geom_path(aes(x, y), data = kZone)
```

`Error in eval(expr, envir, enclos) : object 'type' not found`

The reason for this error is that the aesthetics set in the main `ggplot` call (first line), including shape and color, are inherited by every subsequent layer, including the `geom_path` we just added. Since `kZone` has no `type` or `pitch_type` column, those mappings cannot be evaluated for that layer.
To avoid this problem, we pass no arguments to the main `ggplot` call and move everything into the `geom_point` call on the second line.
The following code returns the expected result.

```r
ggplot() +
  geom_point(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  facet_grid(. ~ stand) +
  coord_equal() +
  geom_path(aes(x, y), data = kZone)
```
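Another way around the error, if you prefer to keep the aesthetics in the main call, is to tell the extra layer not to inherit them with `geom_path()`’s `inherit.aes = FALSE` argument. A minimal sketch, assuming a small made-up data frame (`toy`, hypothetical) in place of `kershaw`:

```r
library(ggplot2)

# hypothetical toy pitches standing in for the kershaw data frame
toy <- data.frame(
  px = c(-0.5, 0.2, 0.9, -1.1),
  pz = c(2.4, 3.0, 1.9, 2.6),
  type = c("B", "S", "X", "S"),
  pitch_type = c("FF", "CU", "SL", "FF"),
  stand = c("L", "R", "R", "L")
)

kZone <- data.frame(
  x = c(-0.95, -0.95, 0.95, 0.95, -0.95),
  y = c(1.6, 3.5, 3.5, 1.6, 1.6)
)

# aesthetics stay in the main call ...
p_alt <- ggplot(data = toy, aes(x = px, y = pz, shape = type, col = pitch_type)) +
  geom_point() +
  facet_grid(. ~ stand) +
  coord_equal() +
  # ... but this layer ignores them, so 'type' and 'pitch_type' are never looked up in kZone
  geom_path(aes(x = x, y = y), data = kZone, inherit.aes = FALSE)

p_alt
```

Because `kZone` has no `stand` column, the path is repeated in every facet panel.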


In the next steps, I’ll show how to make small changes to finalize the plot.
First, we store the plot created in the last code chunk in an R object named `p0`.

```r
p0 = ggplot() +
  geom_point(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  facet_grid(. ~ stand) +
  coord_equal() +
  geom_path(aes(x, y), data = kZone)
```

We begin by modifying the axis labels, which is pretty straightforward.

```r
p1 = p0 +
  xlab("horizontal location\n(ft. from center of the plate)") +
  ylab("vertical location\n(ft. from ground)")
p1
```
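Both axis titles can also be set in a single `labs()` call, which additionally accepts aesthetic names such as `colour` or `shape` to set legend titles. A minimal sketch on two made-up points (the toy data is hypothetical):

```r
library(ggplot2)

# hypothetical toy points
toy <- data.frame(px = c(-0.5, 0.4), pz = c(2.2, 3.1))

# labs() sets both axis titles at once, like xlab() + ylab()
p_labs <- ggplot() +
  geom_point(data = toy, aes(x = px, y = pz)) +
  labs(x = "horizontal location\n(ft. from center of the plate)",
       y = "vertical location\n(ft. from ground)")

p_labs
```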


As a second step, we modify the color legend: the legend title is changed to “pitch type” (`name` argument), the pitch type abbreviations are replaced with full names (`labels` argument), and different colors are chosen (`values` argument).
Note that, since we are acting on the color legend, we call the `scale_color_manual` function.

```r
p2 = p1 +
  scale_color_manual(name="pitch type",
                     values=c("CU"="blue", "FF"="red", "SL"="black"),
                     labels=c("CU"="Curveball", "FF"="4-seam Fastball", "SL"="Slider"))
p2
```

Similarly, the shape part of the legend can be modified as follows.

```r
p3 = p2 +
  scale_shape_manual(name="outcome",
                     # 0 = open square, 1 = open circle, 4 = cross
                     values=c("B"=0, "S"=1, "X"=4),
                     labels=c("B"="ball", "S"="strike", "X"="ball in play"))
p3
```

The plot below shows the result of the changes made to both the color and shape legends.


Let’s say we want to add an entry to the legend to tell readers that the box indicates the rulebook strike zone. For something to be shown in a `ggplot2` legend, it has to be mapped to an aesthetic: that’s what happens in the plot above with the pitch type (mapped to the color aesthetic) and the outcome (mapped to the shape aesthetic).

To achieve this, we slightly change the definition of the `p0` object.

```r
p0 = ggplot() +
  geom_point(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  facet_grid(. ~ stand) +
  coord_equal() +
  geom_path(aes(x, y, linetype="kzone"), data = kZone)
```

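Note that, in this case, we passed a constant string to the `linetype` aesthetic. Because it is mapped inside `aes()`, the string “kzone” becomes the single level of a linetype scale, and therefore gets its own legend entry.
Let’s now re-run the code to obtain `p3`.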

```r
p1 = p0 +
  xlab("horizontal location\n(ft. from center of the plate)") +
  ylab("vertical location\n(ft. from ground)")

p2 = p1 +
  scale_color_manual(name="pitch type",
                     values=c("CU"="blue", "FF"="red", "SL"="black"),
                     labels=c("CU"="Curveball", "FF"="4-seam Fastball", "SL"="Slider"))

p3 = p2 +
  scale_shape_manual(name="outcome",
                     values=c("B"=0, "S"=1, "X"=4),
                     labels=c("B"="ball", "S"="strike", "X"="ball in play"))
```

Then we finish up the line type entry in the legend using a command analogous to the ones used for shape and color.

```r
p4 = p3 +
  scale_linetype_manual(name="",
                        values=c("kzone"=2),  # linetype 2 = dashed
                        labels=c("kzone"="rulebook K-zone"))
p4
```
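Stripped down to the essentials, the legend-entry trick fits in a few lines. The sketch below uses a few made-up points (`toy`, hypothetical) so it can run on its own; only the `linetype` mapping and the matching manual scale matter here.

```r
library(ggplot2)

# hypothetical toy pitches and the strike-zone path
toy <- data.frame(px = c(-0.5, 0.2, 0.9), pz = c(2.4, 3.0, 1.9),
                  pitch_type = c("FF", "CU", "SL"))
kZone <- data.frame(x = c(-0.95, -0.95, 0.95, 0.95, -0.95),
                    y = c(1.6, 3.5, 3.5, 1.6, 1.6))

p_lt <- ggplot() +
  geom_point(data = toy, aes(x = px, y = pz, col = pitch_type)) +
  # the constant "kzone" becomes the single level of a linetype scale ...
  geom_path(data = kZone, aes(x, y, linetype = "kzone")) +
  # ... which the manual scale draws dashed (linetype 2) and labels in the legend
  scale_linetype_manual(name = "", values = c("kzone" = 2),
                        labels = c("kzone" = "rulebook K-zone")) +
  coord_equal()

p_lt
```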


As a final touch to the plot, we want to modify the labels on the strips, replacing “L” and “R” with “vs. LHB” and “vs. RHB”, respectively.
The most straightforward way to achieve that is to simply modify your data frame, so that you have the proper labels in the `stand` column (or in a new one).
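For instance, a new column holding the strip labels can be built with `factor()`; the column name `opp_hand` is our own choice, and the toy data frame below is a hypothetical stand-in for `kershaw`. Setting `levels` explicitly keeps “vs. LHB” in the left panel.

```r
# hypothetical stand-in for the 'stand' column of kershaw
toy <- data.frame(stand = c("R", "L", "R", "L"), stringsAsFactors = FALSE)

# new column with strip-ready labels, in L-then-R panel order
toy$opp_hand <- factor(toy$stand, levels = c("L", "R"),
                       labels = c("vs. LHB", "vs. RHB"))

toy
```

Faceting on the new column (`facet_grid(. ~ opp_hand)`) would then show these labels directly on the strips.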
However, if you’d rather leave your data frame untouched, here’s a way (albeit a bit awkward) to obtain the desired result.

This takes three steps.
First, build a list that maps the values in the data frame to the desired labels.

```r
opp_hand <- list("L" = "vs. LHB", "R" = "vs. RHB")
```

Then, create the following “labeller” function:

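```r
# old-style labeller: receives the faceting variable name and its values,
# and returns the matching labels from opp_hand
# (ggplot2 2.0.0 and later deprecate this (variable, value) form)
opp_hand_labeller <- function(variable, value) {
  return(opp_hand[value])
}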
```

Finally, the function above is passed to the `labeller` argument in the `facet_grid` call.

```r
p0 <- ggplot() +
  geom_point(data=kershaw, aes(x=px, y=pz, shape=type, col=pitch_type)) +
  facet_grid(. ~ stand, labeller=opp_hand_labeller) +
  coord_equal() +
  geom_path(aes(x, y, linetype="kzone"), data = kZone)
```
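With ggplot2 2.0.0 and later, the same relabeling can be written without a custom function: `as_labeller()` turns a named character vector into a labeller. A sketch with made-up points (`toy` and `opp_hand_vec` are hypothetical names):

```r
library(ggplot2)

# hypothetical toy pitches standing in for kershaw
toy <- data.frame(px = c(-0.5, 0.3, 0.8, -0.9), pz = c(2.5, 3.1, 2.0, 2.8),
                  stand = c("L", "R", "R", "L"))

# named character vector: data values -> strip labels
opp_hand_vec <- c(L = "vs. LHB", R = "vs. RHB")

p_strip <- ggplot(toy, aes(x = px, y = pz)) +
  geom_point() +
  facet_grid(. ~ stand, labeller = as_labeller(opp_hand_vec)) +
  coord_equal()

p_strip
```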

Then it’s just a matter of re-running the code shown previously to obtain the new version of the `p4` object.

```r
p1 = p0 +
  xlab("horizontal location\n(ft. from center of the plate)") +
  ylab("vertical location\n(ft. from ground)")

p2 = p1 +
  scale_color_manual(name="pitch type",
                     values=c("CU"="blue", "FF"="red", "SL"="black"),
                     labels=c("CU"="Curveball", "FF"="4-seam Fastball", "SL"="Slider"))

p3 = p2 +
  scale_shape_manual(name="outcome",
                     values=c("B"=0, "S"=1, "X"=4),
                     labels=c("B"="ball", "S"="strike", "X"="ball in play"))

p4 = p3 +
  scale_linetype_manual(name="",
                        values=c("kzone"=2),
                        labels=c("kzone"="rulebook K-zone"))
p4
```


8 responses

1. Just found out about your book and ordered it through the publisher. I’m huge into sabermetrics and am tech savvy, yet R has given me so much aggravation. I’m hopeful the text allows me to really understand the basics of R and the installation of packages and datasets that I can use on a regular basis. I use analytics regularly in my baseball writing, so this is great news. I’m excited.

Also, I believe Albert was a Temple grad … fellow Owl here, too.

2. We hope you find the book helpful. I understand that R has a steep learning curve (I work with a lot of students), but I think once you get a certain comfort level, it goes pretty well. By the way, although I grew up in Philly, Jay Bennett (my Curve Ball coauthor) is the Temple grad.

3. That’s right, it was Jay. Curve Ball is one of those books in my library I refer to almost weekly. Where in Philly did you grow up, Jay? I was down in Juniata and went to North Catholic …

1. I grew up in Lafayette Hill (Whitemarsh). I still make regular visits to Philly to see my mom. I went to graduate school in Indiana and didn’t make it back to PA. Thanks for the comments about Curve Ball.

4. Hi Jim and Max,

I’m loving these posts! I think this post in particular is a great way to learn some of the quirks of ggplot2. I just wanted to note that pitchRx::strikeFX() has support for automatically drawing strike-zones as described by Mike Fast:

http://www.baseballprospectus.com/article.php?articleid=14572

It’s also worth noting you can always use “ggplot arithmetic” with strikeFX if you want to add layers or customize the output. For example,

```r
library(pitchRx)
data(pitches)
strikeFX(pitches) + facet_grid(. ~ stand) + theme_bw()
```

That said, I’m most excited about the model option of strikeFX – which is still under development. The basic idea is to ease the process behind visualizing a Generalized Additive Model (fit using the mgcv package). The newest version of pitchRx has some examples on the strikeFX help page. If you have any questions or feedback, please let me know!

5. This might be a dumb question, but are these charts from the catcher’s perspective? In general, are the px and pz values from the catcher’s or pitcher’s perspective? As in negative px values would be inside to RHB?

Thanks!

Isaac

6. It’s not a dumb question, Isaac: one should make charts self-explanatory, and I didn’t!
Yes, the charts are from the catcher’s perspective, and usually (but not necessarily) that’s how they are shown. And per PITCHf/x, negative px is inside to RHB.

7. […] Now it’s time to use R’s graphic engine, with the help of ggplot2 package (see some ggplot2 tricks here). […]