In Chapter 7 of *Analyzing Baseball With R*, we explore the effects of balls and strikes. Here I preview that material by focusing on the effect of the first pitch. The first pitch has essentially three relevant outcomes: the count goes to 0-1, the count goes to 1-0, or the plate appearance (PA) ends with a hit-by-pitch or a ball put into play. From a runs-value perspective, how do these three outcomes differ?

In a previous post, I described how to download the Retrosheet play-by-play data for a single season, and I presented an R function that computes the runs value for all plays. We store this data in the data frame `d2013`.

First, we use the `subset` function to reduce the data frame to the plays where there was a batting event.

d2013 <- subset(d2013, BAT_EVENT_FL == TRUE)

The variable `PITCH_SEQ_TX` gives the pitch sequence for each PA, including pickoff throws to bases. The `gsub` function is used to remove these non-pitch events and create a new variable `pseq`:

d2013$pseq <- gsub("[.>123N+*]", "", d2013$PITCH_SEQ_TX)

We extract the first character of the string, which is the outcome of the first pitch.

d2013$First.Pitch <- substr(d2013$pseq, 1, 1)
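To make the two string steps concrete, here is a small sketch on made-up pitch sequences. The vector `toy_seq` and its values are hypothetical and are not taken from `d2013`; the pattern and the `substr` call are the same ones used above.

```r
# Hypothetical pitch sequences, invented for illustration only.
# Digits and symbols such as ".", ">", "+", and "*" are treated here as
# non-pitch annotations, as in the gsub pattern used in the post.
toy_seq <- c("1BX", "CBFX", "*B.BX", "+1CS>S")

# Remove every character that is not a pitch to the batter.
toy_pseq <- gsub("[.>123N+*]", "", toy_seq)
print(toy_pseq)

# The first remaining character is the result of the first pitch.
toy_first <- substr(toy_pseq, 1, 1)
print(toy_first)
```

In the first toy sequence, the leading `1` is dropped, so the first pitch is read as `B` rather than as the pickoff throw.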

Based on the value of `First.Pitch`, we define the new variable `Count` as 0-1, 1-0, or End.PA (the plate appearance is over). (The outcome 0-1 means that the PA goes through a 0-1 count.)

    d2013$Count <- ifelse(
      d2013$First.Pitch %in% c("C", "F", "L", "M", "O", "Q", "S", "T"), "0-1",
      ifelse(d2013$First.Pitch %in% c("B", "I", "P"), "1-0", "End.PA"))
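The same classification can be wrapped in a small helper, which makes it easy to check on a few values. This is only a sketch: the function name `classify_first_pitch` and the toy vector are my own, and the toy codes `X` and `H` are assumed to stand for a ball put into play and a hit-by-pitch. The code sets are the ones used above.

```r
# Hypothetical helper; the strike and ball code sets are copied from the post.
classify_first_pitch <- function(first_pitch) {
  strike_codes <- c("C", "F", "L", "M", "O", "Q", "S", "T")
  ball_codes   <- c("B", "I", "P")
  ifelse(first_pitch %in% strike_codes, "0-1",
         ifelse(first_pitch %in% ball_codes, "1-0", "End.PA"))
}

# Toy first-pitch codes, invented for illustration.
toy_first <- c("C", "B", "X", "H", "F")
toy_count <- classify_first_pitch(toy_first)
print(table(toy_count))
```

Anything that is neither a strike code nor a ball code falls into `End.PA`, so a row with no recorded first pitch would land there too.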

We first find the count, mean, and standard deviation of the runs values for all plate appearances.

    with(d2013, c(N = length(RUNS.VALUE),
                  Mean = mean(RUNS.VALUE),
                  SD = sd(RUNS.VALUE)))
                N          Mean            SD
     1.848710e+05 -6.455815e-04  4.672549e-01

At the beginning of the PA, the mean runs value is essentially zero. Next, we find the same summaries for each of the three possible outcomes of the first pitch.

    library(dplyr)
    S <- summarize(group_by(d2013, Count),
                   N = length(RUNS.VALUE),
                   Mean = mean(RUNS.VALUE),
                   SD = sd(RUNS.VALUE))
    S
       Count     N        Mean        SD
    1    0-1 90969 -0.03875386 0.4407245
    2    1-0 73402  0.03405727 0.4746078
    3 End.PA 20500  0.04420387 0.5363598

From the viewpoint of runs value, going 1-0 instead of 0-1 is worth approximately 0.034 - (-0.039) = 0.073 runs to the hitter.
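As a quick arithmetic check, the group summaries can be typed in by hand and recombined. The sketch below copies the numbers from the output above into a data frame I have named `first_pitch` (a hypothetical name), recomputes the 1-0 versus 0-1 gap, and confirms that the weighted average of the three group means agrees with the overall mean reported earlier.

```r
# Summary values copied by hand from the output in the post.
first_pitch <- data.frame(
  Count = c("0-1", "1-0", "End.PA"),
  N     = c(90969, 73402, 20500),
  Mean  = c(-0.03875386, 0.03405727, 0.04420387)
)

# Benefit to the hitter of going 1-0 instead of 0-1.
benefit <- first_pitch$Mean[2] - first_pitch$Mean[1]
print(round(benefit, 3))

# The group sizes should add up to the total number of plate appearances.
total_n <- sum(first_pitch$N)
print(total_n)

# Weighted average of the group means; this should be close to the
# overall mean of about -0.00065 found before grouping.
overall_mean <- sum(first_pitch$N * first_pitch$Mean) / total_n
print(overall_mean)
```

The group counts sum to 184871, which matches the overall `N`, so no plate appearance was lost in the classification.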

As you might anticipate, I like to show the first-pitch effect graphically. I use the `ggplot2` package to draw “point-range” graphs of the runs values for the three outcomes, plotting the mean plus and minus one standard deviation.

    library(ggplot2)
    limits <- aes(ymin = Mean - SD, ymax = Mean + SD)
    ggplot(S, aes(Count, Mean)) +
      geom_point(size = 6, color = "red") +
      geom_pointrange(limits, color = "red", size = 1.5) +
      geom_hline(yintercept = 0, color = "blue") +  # horizontal reference line at zero
      labs(title = "The First Pitch Effect") +
      ylab("Run Value") + xlab("Outcome of First Pitch") +
      theme(axis.text = element_text(size = rel(2))) +
      theme(axis.title = element_text(size = rel(2))) +
      theme(plot.title = element_text(size = rel(2)))

The long bars show that there is a great deal of variability in the outcome of an individual plate appearance. It is interesting that the mean runs value for “end of PA” is similar to the mean runs value for “1-0”, with more variability for the “end of PA” outcome.