# Team Clutch Hitting, Part II

Let’s return to my previous post where we were talking about clutch hitting — that is, the ability of a team to score runners who are in scoring position.

The question we want to focus on is this: do teams really differ in their ability to advance runners from scoring position to home?

Here’s the R work, reviewing what I did in the previous post.

Load in the Retrosheet play-by-play data for the 2013 season. Since we want to consider plays at the beginning of the inning – we use the  subset function with the condition that  LEADOFF_FL = TRUE .

load("pbp2013.Rdata")


As before, we create a data set with two variables: RSP = number of runs in scoring position and RUNS = the number of runners in scoring position who actually score. Since we are focusing on the process of advancing runners to home, we consider innings where there is at least
one runner in scoring position.

library(dplyr)
S <- summarize(group_by(d2013, BAT_TEAM, HALF.INNING),
RSP=length(unique(c(as.character(BASE2_RUN_ID),
as.character(BASE3_RUN_ID)))) - 1,
RUNS=sum(RUN2_DEST_ID >= 4) + sum(RUN3_DEST_ID >= 4))

S$cruns <- cut(S$RUNS,
breaks=c(-.5, .5, 1.5, 2.5, 3.5, 1000),
labels=c("0 Runs", "1 Run", "2 Runs",
"3 Runs", "4+ Runs"))
S.RSP <- subset(S, RSP >= 1)


As before, we tabulate runners in scoring position and runs scored.

with(S.RSP, table(RSP, cruns))

##    cruns
## RSP 0 Runs 1 Run 2 Runs 3 Runs 4+ Runs
##   1   8817  1908      0      0       0
##   2   1662  2527    566      0       0
##   3     80   723    928    199       0
##   4      4    47    344    378      64
##   5      0     1     31    123     144
##   6      0     0      2      5     115
##   7      0     0      0      0      54
##   8      0     0      0      0      13
##   9      0     0      0      0       2


A simple way to summarize the relationship in the table is by fitting an ordinal logistic regression model. One can write this model as
$logit(Prob(R \ge c)) = \log\left(\frac{Prob(R \ge c)}{1 - Prob(R \ge c)}\right) = - \gamma_c + \beta x$
where $R$ is the number of runs scored, $x$ is the number of runners in scoring position, $\beta$ tells us how much the logit of the probability (the left hand side of the equation) changes as you have one more runner in scoring position.

This model is easy to fit using the clm function in the ordinal package. I show the estimate of $beta$ and the associated standard error.

library(ordinal)
fit <- clm(cruns ~ RSP, data=S.RSP)
c(beta=fit$coef[5], se=sqrt(vcov(fit)[5, 5]))  ## beta.RSP se ## 2.39485 0.02485  Since it is easy to think about this fitted model on the probability scale, I display the fitted probabilities of scoring 1+ runs, 2+ runs, 3+ runs, and 4+ runs as a function of the number of runners in scoring position. Notice I use the inverse logit function  invlogit (in the arm package) to convert $- \gamma_c + \beta x$ to a probability scale. library(arm) beta <- fit$coef[5]
gam <- fit$coef[1:4] curve(invlogit(-gam[1] + beta * x), 1, 4, ylim=c(0, 1), xlab="RUNNERS IN SCORING POSITION", ylab="PROBABILITY") for(j in 2:4) curve(invlogit(-gam[j] + beta * x), add = TRUE) text(c(2, 2.5, 3, 3.5), c(.7, .25, .15, .05), c("1+ Runs", "2+ Runs", "3+ Runs", "4+ Runs"))  This model fit gives us a general idea about clutch hitting — how the number of runners in scoring position translates to runs scored. The interesting question is how do teams differ in clutch hitting? We address this question by fitting this ordinal regression model separately to each team. We write a short function one.fit which does this for one team, and then the useful function sapply applies this function for all teams. one.fit <- function(team, d, covariate){ fit <- clm(as.formula(paste("cruns ~", covariate)), data=subset(d, BAT_TEAM==team)) b <- coef(fit)[5] se <- sqrt(vcov(fit)[5, 5]) return(c(b, se))} TEAMS <- unique(as.character(S.RSP$BAT_TEAM))
estimates <- sapply(TEAMS, one.fit, S.RSP, "RSP")
round(estimates, 2)

##
ANA  ARI  ATL  BAL  BOS  CHA  CHN  CIN  CLE  COL  DET  HOU  KCA  LAN  MIA
RSP 2.19 2.56 2.51 2.46 2.28 2.50 2.33 2.40 2.34 2.50 2.29 2.39 2.39 2.33 2.55
0.12 0.14 0.15 0.14 0.12 0.15 0.14 0.13 0.13 0.14 0.12 0.14 0.14 0.13 0.15
MIL  MIN  NYA  NYN  OAK  PHI  PIT  SDN  SEA  SFN  SLN  TBA  TEX  TOR  WAS
RSP 2.51 2.38 2.39 2.60 2.33 2.54 2.34 2.33 2.29 2.35 2.51 2.38 2.54 2.31 2.48
0.14 0.14 0.14 0.15 0.14 0.15 0.14 0.13 0.14 0.13 0.14 0.13 0.14 0.13 0.14


Looking at the $\beta$ estimates, one might think that teams differ in clutch hitting ability. For example, the Mets have a large $\beta$ estimate of 2.60 which indicates they were strong in advancing runners in 2013 and Anaheim with an estimate of 2.19 appears to be poor in clutch hitting. But you have to look at these estimates relative to the standard errors.

Once we have computed a “clutch” estimate $\hat\beta$, our knowledge about the true clutch ability $\beta$ can be described by a normal curve with mean $\hat\beta$ and standard deviation $se$, where $se$ is the standard error of the estimate $\hat\beta$.

We plot all of these normal curves for the team clutch abilities $\beta_1,... , \beta_{30}$ on the same graph.

curve(dnorm(x, estimates[1, 1], estimates[2, 1]),
1.5, 3.5, ylim=c(0, 4),
xlab="Beta", ylab="Posterior Density",
main="Team Estimates of Clutch Hitting")
for(j in 2:30)
curve(dnorm(x, estimates[1, j], estimates[2, j]), add=TRUE)


What do we see? The bottom line is that there is a lot of overlap in these curves. This means there is little statistical evidence that teams actually have different abilities to advance runners in scoring position. Teams do differ in their abilities to get on-base, and in their abilities to get extra-base hits, but not in their abilities to perform well or badly in clutch situations.