Let’s return to my previous post where we were talking about clutch hitting — that is, the ability of a team to score runners who are in scoring position.

The question we want to focus on is this: do teams really differ in their ability to advance runners from scoring position to home?

Here’s the R work, reviewing what I did in the previous post.

Load in the Retrosheet play-by-play data for the 2013 season. Since we want to consider plays at the beginning of the inning – we use the ` subset`

function with the condition that ` LEADOFF_FL = TRUE `

.

load("pbp2013.Rdata") d.inning <- subset(d2013, LEADOFF_FL==TRUE)

As before, we create a data set with two variables: `RSP`

= number of runs in scoring position and `RUNS`

= the number of runners in scoring position who actually score. Since we are focusing on the process of advancing runners to home, we consider innings where there is at least

one runner in scoring position.

library(dplyr) S <- summarize(group_by(d2013, BAT_TEAM, HALF.INNING), RSP=length(unique(c(as.character(BASE2_RUN_ID), as.character(BASE3_RUN_ID)))) - 1, RUNS=sum(RUN2_DEST_ID >= 4) + sum(RUN3_DEST_ID >= 4)) S$cruns <- cut(S$RUNS, breaks=c(-.5, .5, 1.5, 2.5, 3.5, 1000), labels=c("0 Runs", "1 Run", "2 Runs", "3 Runs", "4+ Runs")) S.RSP <- subset(S, RSP >= 1)

As before, we tabulate runners in scoring position and runs scored.

with(S.RSP, table(RSP, cruns))

## cruns ## RSP 0 Runs 1 Run 2 Runs 3 Runs 4+ Runs ## 1 8817 1908 0 0 0 ## 2 1662 2527 566 0 0 ## 3 80 723 928 199 0 ## 4 4 47 344 378 64 ## 5 0 1 31 123 144 ## 6 0 0 2 5 115 ## 7 0 0 0 0 54 ## 8 0 0 0 0 13 ## 9 0 0 0 0 2

A simple way to summarize the relationship in the table is by fitting an ordinal logistic regression model. One can write this model as

where is the number of runs scored, is the number of runners in scoring position, tells us how much the logit of the probability (the left hand side of the equation) changes as you have one more runner in scoring position.

This model is easy to fit using the `clm`

function in the `ordinal`

package. I show the estimate of and the associated standard error.

library(ordinal) fit <- clm(cruns ~ RSP, data=S.RSP) c(beta=fit$coef[5], se=sqrt(vcov(fit)[5, 5]))

## beta.RSP se ## 2.39485 0.02485

Since it is easy to think about this fitted model on the probability scale, I display the fitted probabilities of scoring 1+ runs, 2+ runs, 3+ runs, and 4+ runs as a function of the number of runners in scoring position. Notice I use the inverse logit function ` invlogit`

(in the `arm`

package) to convert to a probability scale.

library(arm) beta <- fit$coef[5] gam <- fit$coef[1:4] curve(invlogit(-gam[1] + beta * x), 1, 4, ylim=c(0, 1), xlab="RUNNERS IN SCORING POSITION", ylab="PROBABILITY") for(j in 2:4) curve(invlogit(-gam[j] + beta * x), add = TRUE) text(c(2, 2.5, 3, 3.5), c(.7, .25, .15, .05), c("1+ Runs", "2+ Runs", "3+ Runs", "4+ Runs"))

This model fit gives us a general idea about clutch hitting — how the number of runners in scoring position translates to runs scored.

The interesting question is **how do teams differ in clutch hitting**?

We address this question by fitting this ordinal regression model separately to each team. We write a short function `one.fit`

which does this for one team, and then the useful function `sapply`

applies this function for all teams.

one.fit <- function(team, d, covariate){ fit <- clm(as.formula(paste("cruns ~", covariate)), data=subset(d, BAT_TEAM==team)) b <- coef(fit)[5] se <- sqrt(vcov(fit)[5, 5]) return(c(b, se))} TEAMS <- unique(as.character(S.RSP$BAT_TEAM)) estimates <- sapply(TEAMS, one.fit, S.RSP, "RSP") round(estimates, 2)

## ANA ARI ATL BAL BOS CHA CHN CIN CLE COL DET HOU KCA LAN MIA RSP 2.19 2.56 2.51 2.46 2.28 2.50 2.33 2.40 2.34 2.50 2.29 2.39 2.39 2.33 2.55 0.12 0.14 0.15 0.14 0.12 0.15 0.14 0.13 0.13 0.14 0.12 0.14 0.14 0.13 0.15 MIL MIN NYA NYN OAK PHI PIT SDN SEA SFN SLN TBA TEX TOR WAS RSP 2.51 2.38 2.39 2.60 2.33 2.54 2.34 2.33 2.29 2.35 2.51 2.38 2.54 2.31 2.48 0.14 0.14 0.14 0.15 0.14 0.15 0.14 0.13 0.14 0.13 0.14 0.13 0.14 0.13 0.14

Looking at the estimates, one might think that teams differ in clutch hitting ability. For example, the Mets have a large estimate of 2.60 which indicates they were strong in advancing runners in 2013 and Anaheim with an estimate of 2.19 appears to be poor in clutch hitting. But you have to look at these estimates relative to the standard errors.

Once we have computed a “clutch” estimate , our knowledge about the true clutch ability can be described by a normal curve with mean and standard deviation , where is the standard error of the estimate .

We plot all of these normal curves for the team clutch abilities on the same graph.

curve(dnorm(x, estimates[1, 1], estimates[2, 1]), 1.5, 3.5, ylim=c(0, 4), xlab="Beta", ylab="Posterior Density", main="Team Estimates of Clutch Hitting") for(j in 2:30) curve(dnorm(x, estimates[1, j], estimates[2, j]), add=TRUE)

What do we see? The bottom line is that there is a lot of overlap in these curves. This means there is little statistical evidence that teams actually have different abilities to advance runners in scoring position. Teams do differ in their abilities to get on-base, and in their abilities to get extra-base hits, but not in their abilities to perform well or badly in clutch situations.