Clutch performers in 2013

In the last post, I showed how to read the 2013 Retrosheet play-by-play data into R and how to compute the run values of all plays using a function version of the R code from our book. Here we use these data to find the best clutch performers of the 2013 season.

We have a data frame d2013 containing all of the plays. We use the subset function to restrict attention to plays where there was a batting event (excluding events like attempted steals).

d2013 <- subset(d2013, BAT_EVENT_FL == TRUE)

The function from my previous post added a variable STATE that gives the runners on base and the number of outs. We define a new variable Scoring.Position, which is "yes" if there are runners in scoring position (on second or third base) and "no" otherwise.

# The first three characters of STATE flag runners on first, second and third.
# These runner codes all have a runner on second and/or third base.
RISP <- c("010", "011", "110", "101", "001", "111")
d2013$Scoring.Position <- with(d2013,
      ifelse(substr(STATE, 1, 3) %in% RISP, "yes", "no"))
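As a sanity check on the list of scoring-position states, the sketch below enumerates all 24 base-out states in the "runners outs" STATE format (three 0/1 digits for first, second and third base, a space, then the number of outs) and confirms that the explicit list agrees with a shorter rule: a runner is in scoring position exactly when the second or third character is not "0". The grid and the names `states`, `by_list` and `by_substr` are illustrative, not part of the original code.

```r
# All 24 base-out states, built in the assumed "runners outs" STATE format.
grid <- expand.grid(first = 0:1, second = 0:1, third = 0:1, outs = 0:2)
states <- with(grid, paste(paste0(first, second, third), outs))

# Explicit list of runner codes with a runner on second and/or third.
risp_codes <- c("010", "011", "110", "101", "001", "111")
by_list <- ifelse(substr(states, 1, 3) %in% risp_codes, "yes", "no")

# Shorter rule: characters 2-3 (second and third base) are not both empty.
by_substr <- ifelse(substr(states, 2, 3) != "00", "yes", "no")

# The two rules should classify every state identically.
stopifnot(identical(by_list, by_substr))
print(table(by_substr))
```

Only the empty-bases and runner-on-first configurations (6 of the 24 states) come out as "no".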

For each batter, we want to compute the number of plate appearances and the mean runs value for batting plays with runners in scoring position and for all other plays. This is conveniently done using the new dplyr package.

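library(dplyr)

# One row per batter and situation: number of batting plays (PA)
# and the mean runs value of those plays.
RUNS.VALUE <- summarise(group_by(d2013, BAT_ID, Scoring.Position),
                  PA = n(),
                  meanRUNS = mean(RUNS.VALUE))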
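For readers who want to see what the grouped summary produces without loading dplyr, here is a minimal base-R sketch of the same per-batter, per-situation computation using `aggregate()`. The play-level data frame `plays` and its values are hypothetical toy data, not 2013 results; only the column names mirror the post.

```r
# Hypothetical play-level data (toy values, not real 2013 runs values).
plays <- data.frame(
  BAT_ID = c("aaaa001", "aaaa001", "aaaa001", "bbbb001", "bbbb001"),
  Scoring.Position = c("yes", "no", "no", "yes", "yes"),
  RUNS.VALUE = c(0.5, -0.25, 0.25, 1.0, -0.5)
)

# Number of plays in each batter/situation group.
pa <- aggregate(RUNS.VALUE ~ BAT_ID + Scoring.Position, data = plays, FUN = length)
names(pa)[3] <- "PA"

# Mean runs value in each batter/situation group.
mr <- aggregate(RUNS.VALUE ~ BAT_ID + Scoring.Position, data = plays, FUN = mean)
names(mr)[3] <- "meanRUNS"

# Combine into one row per batter and situation, like the dplyr result.
summary_base <- merge(pa, mr, by = c("BAT_ID", "Scoring.Position"))
print(summary_base)
```

The dplyr version does the same thing in one call; the base-R route just makes the grouping keys explicit.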

Next, we use several applications of subset and merge to create a new data frame RUNSsituation. Each row contains the PA and mean runs value for one batter in scoring-position (SP) and non-scoring-position (not-SP) situations. We only consider hitters who have at least 100 PA in each situation.

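# Keep batter/situation groups with at least 100 PA.
RUNS.VALUE1 <- subset(RUNS.VALUE, PA >= 100)
RUNS.SP <- subset(RUNS.VALUE1, Scoring.Position=="yes")
RUNS.NSP <- subset(RUNS.VALUE1, Scoring.Position=="no")
# merge() keeps only batters found in both tables; shared column names get
# the suffix .x (SP situations) and .y (not-SP situations).
RUNSsituation <- merge(RUNS.SP, RUNS.NSP, by="BAT_ID")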
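To make the effect of the 100-PA filter and the merge concrete, the sketch below runs the same subset/merge steps on a hypothetical toy summary table (`toy_summary` and its numbers are made up for illustration). One batter falls below 100 PA with runners in scoring position and another has no SP row at all, so only one batter survives the inner join.

```r
# Hypothetical per-batter summaries (toy values, not real 2013 data),
# shaped like the grouped summary: one row per batter and situation.
toy_summary <- data.frame(
  BAT_ID = c("aaaa001", "aaaa001", "bbbb001", "bbbb001", "cccc001"),
  Scoring.Position = c("yes", "no", "yes", "no", "no"),
  PA = c(150, 450, 90, 500, 400),
  meanRUNS = c(0.04, 0.01, 0.02, 0.00, 0.03)
)

toy1 <- subset(toy_summary, PA >= 100)            # drops bbbb001's SP row
toy_sp <- subset(toy1, Scoring.Position == "yes")
toy_nsp <- subset(toy1, Scoring.Position == "no")

# Inner join on BAT_ID: batters missing from either table are dropped,
# and PA/meanRUNS become PA.x/meanRUNS.x (SP) and PA.y/meanRUNS.y (not-SP).
toy_situation <- merge(toy_sp, toy_nsp, by = "BAT_ID")
print(toy_situation)
```

This is why the merged table only contains hitters with at least 100 PA in both situations.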

We compute Mean, the PA-weighted mean runs value over both situations, and Difference, the mean runs value in scoring-position situations minus the mean runs value in non-scoring-position situations.

RUNSsituation$Mean <- with(RUNSsituation,
           (PA.x * meanRUNS.x + PA.y * meanRUNS.y) / (PA.x + PA.y))
RUNSsituation$Difference <- with(RUNSsituation,
            meanRUNS.x - meanRUNS.y)
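Weighting each situational mean by its PA makes Mean, (PA.x * meanRUNS.x + PA.y * meanRUNS.y) / (PA.x + PA.y), equal to the batter's plain average runs value over all of his plays. The sketch below checks this on a single hypothetical batter whose play-level runs values (`rv_sp`, `rv_nsp`) are made-up toy numbers.

```r
# Hypothetical play-level runs values for one batter (toy data).
rv_sp  <- c(0.5, -0.25, 1.0, -0.25)        # runners in scoring position
rv_nsp <- c(0.25, -0.25, 0.5, 0, -0.5, 0)  # all other batting plays

PA.x <- length(rv_sp);  meanRUNS.x <- mean(rv_sp)
PA.y <- length(rv_nsp); meanRUNS.y <- mean(rv_nsp)

# Same formulas as the Mean and Difference columns.
Mean <- (PA.x * meanRUNS.x + PA.y * meanRUNS.y) / (PA.x + PA.y)
Difference <- meanRUNS.x - meanRUNS.y

# The PA-weighted mean equals the overall mean of all of the batter's plays.
stopifnot(isTRUE(all.equal(Mean, mean(c(rv_sp, rv_nsp)))))
print(c(Mean = Mean, Difference = Difference))
```

Here the batter averages 0.25 runs with runners in scoring position and 0 otherwise, so Mean is 0.1 and Difference is 0.25.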

We use the ggplot2 package to plot the mean (which we call Performance) against the difference (which we call Clutch). Each point is labeled with the first four characters of the batter's Retrosheet ID so we can easily identify hitters.

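library(ggplot2)

# Performance (x) versus clutch (y), labeled by abbreviated batter ID.
ggplot(RUNSsituation,
  aes(Mean, Difference, label=substr(BAT_ID, 1, 4))) +
  geom_text(color="blue") +
  geom_hline(yintercept=0, color="red") +
  geom_vline(xintercept=0, color="red") +
  xlab("PERFORMANCE") + ylab("CLUTCH")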


From the plot we see that Miguel Cabrera and Chris Davis had the highest mean performances, and Freddie Freeman and Allen Craig had the best clutch performances by our definition of clutch. B.J. Upton was one of the weakest performers (from a runs-value perspective) and also the worst clutch performer by this measure. Interestingly, there is a fairly strong positive relationship between performance and clutch: the best clutch performers tend to be the better hitters. So in our search for clutch players, we may need to adjust for a hitter's overall level of performance.
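One possible way to carry out such an adjustment, offered here as an assumption rather than a method used in this post, is to regress Difference on Mean and treat each hitter's residual as an "adjusted clutch" score that is uncorrelated with overall performance. The sketch uses a small hypothetical table `toy` in place of RUNSsituation; the column name `Adjusted.Clutch` is illustrative.

```r
# Hypothetical batter-level summaries (toy values, not real 2013 results).
toy <- data.frame(
  BAT_ID = c("aaaa001", "bbbb001", "cccc001", "dddd001", "eeee001", "ffff001"),
  Mean = c(-0.02, -0.01, 0.00, 0.01, 0.02, 0.03),
  Difference = c(-0.03, 0.00, -0.01, 0.02, 0.01, 0.04)
)

# Strength of the linear association between performance and clutch.
r <- cor(toy$Mean, toy$Difference)

# Assumed adjustment: least-squares fit of clutch on performance; the
# residual is the part of clutch not explained by overall performance.
fit <- lm(Difference ~ Mean, data = toy)
toy$Adjusted.Clutch <- residuals(fit)

print(round(r, 3))
print(toy[order(-toy$Adjusted.Clutch), ])
```

By construction the residuals are uncorrelated with Mean, so ranking hitters by Adjusted.Clutch no longer simply rewards the better hitters.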
