Monthly Archives: May, 2015

Pitch Location Density Estimation with smoothScatter

Last week, I used some data on Joe West to evaluate strike call probabilities when he is behind the plate. This week, I’d like to address a different issue with the same data: density estimation. Carson dealt with density estimation a couple of weeks ago, estimating the kernel density of a single variable (break angle) for two different pitch types. Today, I’ll do this with horizontal and vertical location in the strike zone. One way this can come in handy is in understanding whether pitchers throw to different locations depending on an umpire’s tendencies. We can get to that in a later post; for now we’ll just look at pitch location for Joe West. Let’s begin by loading up the same data I used last time. Here, we will keep ALL pitches thrown when West is behind the plate, rather than just the pitches he called.

setwd("c:/...")
pitches <- read.csv(file="joeWest.csv", header=TRUE)
head(pitches)

###load color library
library(RColorBrewer)

Overall, kernel density estimation is used to get a smooth representation of the distribution of the data. But it can also be useful in visualization when many points overlap with one another as you can see below in the scatter plot of the pitch locations when Joe West is behind the plate. Note that in these plots, I indicate the ground using the horizontal line at zero. Any pitch below this point hit the ground prior to passing the front of home plate.

###define rulebook average strike zone
zBot <- 1.52
zTop <- 3.42
zW <- 0.83

###make scatter plot of pitch locations
png(file="WestScatter.png", height=450, width=828)
par(mfrow=c(1,2))
plot(pz ~ px, data=subset(pitches, pitches$stand=="R"), main="Joe West Pitch Location (RHB)", xlab="Horizontal Location (ft., Umpire's View)",
    ylab="Vertical Location (ft.)", xlim=c(-4,4), ylim=c(-2,8))
rect(-zW, zBot, zW, zTop, border="green", lty="dotted", lwd=3)
abline(h=0, col="green", lwd=3)

plot(pz ~ px, data=subset(pitches, pitches$stand=="L"), main="Joe West Pitch Location (LHB)", xlab="Horizontal Location (ft., Umpire's View)",
    ylab="Vertical Location (ft.)", xlim=c(-4,4), ylim=c(-2,8))
rect(-zW, zBot, zW, zTop, border="green", lty="dotted", lwd=3)
abline(h=0, col="green", lwd=3)
dev.off()

[Figure: WestScatter.png, scatter plots of pitch location for RHB and LHB]

While we can see that left-handed batters tend to see more pitches on the outside half and beyond, everything sort of just looks like a big black blob here. Or at least it looks that way. I’ll return to this later. We can remedy our overlapped blobs by using color to represent the frequency with which pitches are thrown to a given location.

Let’s now take a look at vertical and horizontal location densities separately. For this, we’ll use the function density and plot right- and left-handed batters on the same plot for each of the two measures. I am sticking to the defaults here for simplicity, but depending on your data, you may want to learn a bit more about bandwidth selection, since the choice of bandwidth controls how smooth the estimated density is. As in Carson’s last post (linked above), you could also do this using ggplot2. You can see below that right-handed batters have their pitch distribution shifted outside (to the right of the plot), as do left-handed batters (to the left of the plot). The vertical location distribution, however, is much more similar across RHB and LHB.

png(file="WestDensity.png", height=500, width=950)
par(mfrow=c(1,2))
plot(density(pitches$px[pitches$stand=="R"], na.rm=T), lwd=3, col="darkred", xlim=c(-4,4),
    main="Horizontal Location Density", xlab="Horizontal Location (ft., Umpire's View)", ylab="Density")
lines(density(pitches$px[pitches$stand=="L"], na.rm=T), lwd=3, col="darkblue")
legend(-4, 0.4, c("RHB", "LHB"), col=c("darkred", "darkblue"), lty=c("solid", "solid"), lwd=c(3, 3), bty="n", cex=1)

plot(density(pitches$pz[pitches$stand=="R"], na.rm=T), lwd=3, col="darkred", xlim=c(-2,8),
    main="Vertical Location Density", xlab="Vertical Location (ft.)", ylab="Density")
lines(density(pitches$pz[pitches$stand=="L"], na.rm=T), lwd=3, col="darkblue")
legend(-2, 0.4, c("RHB", "LHB"), col=c("darkred", "darkblue"), lty=c("solid", "solid"), lwd=c(3, 3), bty="n", cex=1)
dev.off()

[Figure: WestDensity.png, horizontal and vertical location densities for RHB and LHB]
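To see why bandwidth matters, here is a minimal sketch on simulated data (not the Joe West file, and the mixture parameters are made up). It compares the default rule of thumb with the Sheather-Jones selector and with the adjust argument, all of which are standard options of density.

```r
# Simulated horizontal locations (feet); NOT the Joe West data.
set.seed(1)
px_sim <- c(rnorm(600, mean = 0.4, sd = 0.8), rnorm(400, mean = -0.6, sd = 0.5))

# density() defaults to bw = "nrd0", a rule-of-thumb bandwidth.
d_default <- density(px_sim)
# Sheather-Jones plug-in bandwidth, a common alternative selector.
d_sj <- density(px_sim, bw = "SJ")
# adjust rescales the chosen bandwidth: below 1 is wigglier, above 1 is smoother.
d_rough <- density(px_sim, adjust = 0.25)
d_smooth <- density(px_sim, adjust = 3)

# Bandwidths actually used by each fit
round(c(default = d_default$bw, SJ = d_sj$bw,
        rough = d_rough$bw, smooth = d_smooth$bw), 3)

# Overlay the four estimates to compare their smoothness
plot(d_default, lwd = 3, main = "Effect of bandwidth on a density estimate",
     xlab = "Simulated horizontal location (ft.)")
lines(d_sj, lwd = 2, col = "darkred")
lines(d_rough, lwd = 1, col = "darkblue")
lines(d_smooth, lwd = 1, col = "darkgreen")
legend("topleft", c("default (nrd0)", "SJ", "adjust = 0.25", "adjust = 3"),
       col = c("black", "darkred", "darkblue", "darkgreen"),
       lwd = c(3, 2, 1, 1), bty = "n")
```

If the curves tell noticeably different stories on your own data, that is a sign the default bandwidth deserves a second look before you interpret small bumps or shifts.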

Now let’s move on to looking at density in a two-dimensional space. There are a few ways to do this, but today I’ll focus specifically on the function in R called smoothScatter. This function is in the base graphics package for R, and its advantage is its simplicity: the whole plot takes a single function call. There are drawbacks, however, as there are some wonky things that go on with the coloring (I cover that here on my old blog, and most of the solutions are somewhat unsatisfactory). I’ll forge on, nonetheless.
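To make it a bit more concrete what such a plot involves, here is a rough hand-rolled sketch of the same idea: a binned two-dimensional kernel density estimate from KernSmooth::bkde2D, a power transformation of the densities, and image to draw the result. The data are simulated, and the bandwidth, grid size, and color ramp are arbitrary choices for illustration, not smoothScatter's own defaults.

```r
library(KernSmooth)  # recommended package; supplies bkde2D()

# Simulated pitch locations (feet); NOT the Joe West data.
set.seed(1)
n <- 2000
px_sim <- rnorm(n, mean = 0.3, sd = 0.9)
pz_sim <- rnorm(n, mean = 2.3, sd = 1.0)

# Binned 2-D kernel density estimate on a 100 x 100 grid.
# The bandwidth (feet, one value per axis) is an arbitrary illustrative choice.
est <- bkde2D(cbind(px_sim, pz_sim), bandwidth = c(0.25, 0.25),
              gridsize = c(100, 100), range.x = list(c(-4, 4), c(-2, 8)))

# Compress high densities so sparse regions do not all land in the lowest
# color; pmax() guards against tiny negative values from floating point.
dens <- pmax(est$fhat, 0)^0.6

# A blue-to-red ramp built with base R only
heat <- colorRampPalette(c("navy", "lightyellow", "darkred"))(256)
image(est$x1, est$x2, dens, col = heat,
      xlab = "Horizontal Location (ft.)", ylab = "Vertical Location (ft.)",
      main = "Hand-rolled density heat map (simulated data)")
```

Changing the exponent (say, 0.6 versus 1) is a quick way to see how much of the "wonky" coloring comes from the transformation rather than from the data.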

Here, we want to first identify a nice color palette for use in what will ultimately be a heat map. The one I specify below probably isn’t the best for those with color blindness. That’s always something to think about when designing visualizations with color. I covered that on my old blog as well here. R Color Brewer (RColorBrewer) is a nice package, and I encourage you to explore it on your own.
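If color blindness is a concern, one possible starting point is the brewer.pal.info table that ships with RColorBrewer, which flags the palettes the package considers colorblind-friendly. The sketch below lists the flagged sequential palettes and builds a ramp function from the first one. That pick is arbitrary, and the names cb_seq, cb_colors, and cb_ramp are just illustrative.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame shipped with RColorBrewer; its logical
# colorblind column flags palettes the package considers colorblind-friendly.
cb_seq <- rownames(subset(brewer.pal.info, colorblind & category == "seq"))
cb_seq

# Preview the flagged sequential palettes side by side
display.brewer.all(type = "seq", colorblindFriendly = TRUE)

# Take the first flagged palette (an arbitrary pick for illustration) and
# turn it into a ramp function that can return any number of colors.
cb_colors <- brewer.pal(9, cb_seq[1])
cb_ramp <- colorRampPalette(cb_colors)
length(cb_ramp(256))
```

A ramp function like cb_ramp can be passed to smoothScatter through its colramp argument, which expects a function rather than a vector of colors.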

###load color library
library(RColorBrewer)

###make heat map colors
brewer.pal(11, "RdYlBu")
buylrd <- rev(brewer.pal(11,"RdYlBu"))

png(file="WestSmoothRHB.png", height=550, width=450)
smoothScatter(pitches$pz[pitches$stand=="R"] ~ pitches$px[pitches$stand=="R"], nbin=1000, colramp = colorRampPalette(c(buylrd)),
    nrpoints=Inf, pch="", cex=.7, transformation = function(x) x^.6, col="black",
    main="Joe West RHB Pitch Location", xlab="Horizontal Location", ylab="Vertical Location", xlim=c(-4,4), ylim=c(-2,8))
rect(-zW, zBot, zW, zTop, border="black", lty="dotted", lwd=3)
abline(h=0, lwd=3)
dev.off()

png(file="WestSmoothLHB.png", height=550, width=450)
smoothScatter(pitches$pz[pitches$stand=="L"] ~ pitches$px[pitches$stand=="L"], nbin=1000, colramp = colorRampPalette(c(buylrd)),
    nrpoints=Inf, pch="", cex=.7, transformation = function(x) x^.6, col="black",
    main="Joe West LHB Pitch Location", xlab="Horizontal Location", ylab="Vertical Location", xlim=c(-4,4), ylim=c(-2,8))
rect(-zW, zBot, zW, zTop, border="black", lty="dotted", lwd=3)
abline(h=0, lwd=3)
dev.off()

[Figures: WestSmoothLHB.png and WestSmoothRHB.png, smoothScatter heat maps of pitch location for LHB and RHB]

In this representation, the darker the red, the more often pitches are thrown to that location. And the darker the blue, the less often they’re thrown to that location. We can see pretty clearly here that for both lefties and righties, pitchers tend to throw on the outside half of the plate. This confirms what we saw in our density plots earlier, where the peak of the distribution was shifted one way or the other for RHB and LHB, respectively. Because of the overlap in the scatterplot, however, we may not have realized there were many more pitches on the outside half of the plate for right-handed batters. This is the virtue of density estimation: it enhances our perception of how the data behave.
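A quick numeric check can back up what the colors suggest. The sketch below computes the share of pitches on the outside half by batter handedness, using the same convention as the plots (outside is px > 0 for RHB and px < 0 for LHB). The toy data frame is invented purely to show the calculation, and it borrows only the px and stand column names from the real data.

```r
# Tiny made-up data frame with the same column names as the post's data;
# the values are invented purely to show the calculation.
toy <- data.frame(
  px    = c(0.5, -0.2, 0.9, 0.1, -0.7, -0.4, 0.3, -1.1),
  stand = c("R", "R", "R", "R", "L", "L", "L", "L")
)

# Outside half: px > 0 for right-handed batters, px < 0 for left-handed.
outside <- ifelse(toy$stand == "R", toy$px > 0, toy$px < 0)

# Share of pitches on the outside half, by batter handedness
outside_share <- tapply(outside, toy$stand, mean)
outside_share
```

On the real data you would swap toy for pitches and pass na.rm=TRUE to mean, since some pitches may have missing locations.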

So there you have it. A handy little function that gives us a clean color representation of pitch location density for Joe West.