Stealing 2nd Base: a Tribute to the Royals

Last night, I watched the Kansas City Royals finish their sweep of the Los Angeles Angels. One of the interesting aspects of the Royals was their propensity for stealing bases. This inspired me to explore stolen bases, or more accurately stolen base attempts of 2nd base.

We’ll use Retrosheet play-by-play data from the 2013 season to answer the following questions:

  • How did teams differ in SB attempts and their success rates?
  • When do teams attempt stolen bases? What innings and how many outs?
  • We know pitchers differ in their tendency to allow stolen bases. In the 2013 season, which pitchers led in SB attempts, and what were the SB success rates against these pitchers?

I’ve described the process of downloading Retrosheet play-by-play data here. In the code below, I have the Retrosheet data in the file “all2013.csv”, the column headers in the file “fields.csv”, and the rosters in the file “roster2013.csv”.

I read these files for the 2013 season into R.

all2013 <- read.csv("~/Desktop/PGP Folder/download.folder/unzipped/all2013.csv", 
                    header=FALSE)
fields <- read.csv("~/Desktop/PGP Folder/download.folder/unzipped/fields.csv")
names(all2013) <- fields$Header
roster2013 <- read.csv("~/Desktop/PGP Folder/download.folder/unzipped/roster2013.csv")

We will focus on steals of second base. The relevant variables are RUN1_SB_FL and RUN1_CS_FL, which flag a stolen base and a caught stealing by the runner on first. We create a new data frame stealing.first that only considers these events.

library(dplyr)
stealing.first <- filter(all2013, RUN1_SB_FL == TRUE |
                           RUN1_CS_FL == TRUE )
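
As a small illustration of the read-and-filter steps above, here is a sketch on made-up rows (the game ID and the three-column layout are invented for the example; the real file has many more fields). It uses base R only so that it runs without extra packages. One thing it shows is that read.csv turns columns holding only T and F into logical values, which is why comparing the flags with TRUE works.

```r
# Made-up, headerless rows standing in for all2013.csv.
raw <- read.csv(text = "KCA201304080,F,F\nKCA201304080,T,F\nKCA201304080,F,T",
                header = FALSE)

# Made-up stand-in for fields.csv, which supplies the column names.
fields <- data.frame(Header = c("GAME_ID", "RUN1_SB_FL", "RUN1_CS_FL"))
names(raw) <- fields$Header

# The T/F columns were read as logical values.
class(raw$RUN1_SB_FL)

# Keep only plays where the runner on first stole or was caught stealing.
steals <- raw[raw$RUN1_SB_FL | raw$RUN1_CS_FL, ]
nrow(steals)
```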

First, we are interested in seeing how the number of stolen base attempts and success rates vary by team. We create a new variable BAT_TEAM_ID that is the identity of the team who is batting and trying to steal the base.

stealing.first <- mutate(stealing.first,
                       HOME_TEAM_ID=substr(GAME_ID, 1, 3))
stealing.first <- mutate(stealing.first,
                       BAT_TEAM_ID=ifelse(BAT_HOME_ID==1,
                                  as.character(HOME_TEAM_ID), 
                                  as.character(AWAY_TEAM_ID)))
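
To make the team lookup concrete, here is a sketch on a few made-up rows (the game IDs are hypothetical, not taken from the 2013 file). The first three characters of a Retrosheet game ID name the home team, and BAT_HOME_ID is 1 when the home team is batting.

```r
# Toy rows; the game IDs and matchups below are made up for illustration.
toy <- data.frame(
  GAME_ID      = c("KCA201304080", "KCA201304080", "ANA201305130"),
  AWAY_TEAM_ID = c("MIN", "MIN", "KCA"),
  BAT_HOME_ID  = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# The first three characters of GAME_ID identify the home team.
toy$HOME_TEAM_ID <- substr(toy$GAME_ID, 1, 3)

# Pick the home team when it is batting, otherwise the visiting team.
toy$BAT_TEAM_ID <- ifelse(toy$BAT_HOME_ID == 1,
                          toy$HOME_TEAM_ID,
                          toy$AWAY_TEAM_ID)
toy[, c("GAME_ID", "BAT_HOME_ID", "BAT_TEAM_ID")]
```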

We compute the number of stolen bases and caught stealing for all 30 teams and place these in the data frame team.stealing .
From this data frame, we compute the number of attempts and the success rate.

team.stealing <- summarize(group_by(stealing.first, BAT_TEAM_ID),
                           SB=sum(RUN1_SB_FL==TRUE),
                           CS=sum(RUN1_CS_FL==TRUE))
team.stealing <- mutate(team.stealing,
                        Success.Rate = SB / (SB + CS),
                        Attempts = SB + CS)

We construct a scatterplot of the attempts and success rates for all teams. Note that teams that tend to steal more bases also tend to be more successful. The variability both in SB attempts and success rates is remarkable. Clearly, teams place different values on stolen bases, and I suppose that teams have different “speed” players and coaching expertise in how to steal bases.

library(ggplot2)
ggplot(team.stealing, aes(Attempts, Success.Rate, label=BAT_TEAM_ID)) + 
  geom_point() + geom_smooth(method="lm")+ geom_text() +
  labs(title="1B Steal Attempts and Success Rates for All 2013 Teams")

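[Figure: scatterplot of SB attempts and success rates for the 2013 teams, with team labels and a fitted line]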

Next we look at stealing of second base in different out situations. I use the summarize and mutate functions to break down attempts and success rate by the number of outs. Note that runners are most likely to attempt a steal of 2nd base with two outs. Runners are also most successful when there are two outs.

outs.stealing <- summarize(group_by(stealing.first, OUTS_CT),
                           SB=sum(RUN1_SB_FL==TRUE),
                           CS=sum(RUN1_CS_FL==TRUE))
outs.stealing <- mutate(outs.stealing,
                        Success.Rate = SB / (SB + CS),
                        Attempts = SB + CS)
outs.stealing                   
## Source: local data frame [3 x 5]
## 
##   OUTS_CT  SB  CS Success.Rate Attempts
## 1       0 559 244       0.6961      803
## 2       1 785 359       0.6862     1144
## 3       2 976 291       0.7703     1267
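
One might wonder whether the higher two-out success rate is more than noise. The sketch below is not part of the original analysis. It recomputes the rates from the counts printed above and then compares two-out attempts with the zero-out and one-out attempts pooled, using prop.test from base R. It assumes the attempts are independent, which is only a rough approximation.

```r
# Counts copied from the outs.stealing table above (0, 1 and 2 outs).
sb <- c(559, 785, 976)
cs <- c(244, 359, 291)
attempts <- sb + cs

# Reproduces the Success.Rate column.
round(sb / attempts, 4)

# Two-out attempts versus zero-out and one-out attempts pooled.
# Assumption: attempts are treated as independent trials.
check <- prop.test(x = c(sb[3], sb[1] + sb[2]),
                   n = c(attempts[3], attempts[1] + attempts[2]))
check$p.value
```

A small p-value here would suggest that the two-out difference is unlikely to be chance alone, although it says nothing about why runners do better with two outs.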

To explore the patterns by inning, we do a similar breakdown by the INN_CT variable, pooling all extra innings into a single category labeled 10. Runners are least successful in stealing 2nd base in the 2nd and 4th innings, and they are more successful in the late innings.

inning.stealing <- summarize(group_by(stealing.first, INN_CT),
                           SB=sum(RUN1_SB_FL==TRUE),
                           CS=sum(RUN1_CS_FL==TRUE))

inning.stealing <- mutate(inning.stealing, Inning=pmin(INN_CT, 10) )
inning.stealing <- summarize(group_by(inning.stealing, Inning),
                           SB=sum(SB), CS=sum(CS),
                           Success.Rate=SB / (SB + CS))
inning.stealing
## Source: local data frame [10 x 4]
## 
##    Inning  SB  CS Success.Rate
## 1       1 343 129       0.7267
## 2       2 194  88       0.6879
## 3       3 293 115       0.7181
## 4       4 228 109       0.6766
## 5       5 268 108       0.7128
## 6       6 237  95       0.7139
## 7       7 271  90       0.7507
## 8       8 259  86       0.7507
## 9       9 164  60       0.7321
## 10     10  63  14       0.8182
ggplot(inning.stealing, aes(Inning, Success.Rate)) + geom_point(size=4)

[Figure: success rate of steals of 2nd base by inning, with innings 10 and later pooled]
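
The pmin step in the inning code is what collapses extra innings into one group. Here is a sketch of the idea on made-up counts, written in base R with aggregate rather than dplyr so it runs without extra packages.

```r
# Made-up per-inning counts, including three extra innings.
by.inning <- data.frame(INN_CT = c(8, 9, 10, 11, 13),
                        SB     = c(20, 15, 4, 3, 1),
                        CS     = c(5, 5, 1, 0, 1))

# pmin() caps each inning at 10, so innings 10, 11 and 13 share one label.
by.inning$Inning <- pmin(by.inning$INN_CT, 10)

# Add up the counts within each label, then compute the pooled rate.
pooled <- aggregate(cbind(SB, CS) ~ Inning, data = by.inning, FUN = sum)
pooled$Success.Rate <- pooled$SB / (pooled$SB + pooled$CS)
pooled
```

Summing the counts before dividing gives a rate for the pooled group that weights each inning by its number of attempts, which is different from averaging the per-inning rates.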

Although we focus on the players who steal many bases, there are other players involved with SBs, namely the pitcher and the catcher. Here we briefly look at the pitcher effect. We break down stealing by the pitcher id PIT_ID. We merge the resulting data frame with the roster information so we can display first and last names, sort it by the number of attempts, and display the top 10 pitchers with respect to SB attempts.

stealing.pitcher <- summarize(group_by(stealing.first, PIT_ID),
                    SB=sum(RUN1_SB_FL==TRUE), 
                    CS=sum(RUN1_CS_FL==TRUE),
                    Attempts=SB + CS,
                    Success.Rate=SB/Attempts)
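# Attach first and last names from the roster file
stealing.pitcher <- merge(stealing.pitcher, 
                          roster2013,
                          by.x="PIT_ID", by.y="Player.ID")
# Sort in increasing order of attempts, so the leaders are at the bottom
stealing.pitcher <- arrange(stealing.pitcher, Attempts)
# Keep one row per pitcher in case the merge produced duplicates
stealing.pitcher <- 
   stealing.pitcher[!duplicated(stealing.pitcher$PIT_ID), ]
# Display the last 10 rows, the pitchers with the most attempts
N <- dim(stealing.pitcher)[1]
stealing.pitcher[(N - 9) : N, 
                 c("First.Name", "Last.Name", "SB", "CS",
                   "Attempts", "Success.Rate")]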
##     First.Name Last.Name SB CS Attempts Success.Rate
## 591      Ervin   Santana 13  8       21       0.6190
## 592         Yu   Darvish 15  7       22       0.6818
## 593      Felix Hernandez 17  5       22       0.7727
## 594        Tim  Lincecum 21  2       23       0.9130
## 595    Edinson   Volquez 21  2       23       0.9130
## 597     Justin Verlander 20  4       24       0.8333
## 598     Anibal   Sanchez 24  1       25       0.9600
## 599       Cole    Hamels 17  9       26       0.6538
## 600      Scott   Feldman 24  3       27       0.8889
## 602       John    Lackey 32  5       37       0.8649
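
The gaps in the row numbers above (596 and 601 are missing) come from the duplicate-removal step. My guess, and it is only an assumption, is that the roster file has one row per player and team, so a pitcher who changed teams during the season is matched twice in the merge. The sketch below shows that effect on made-up data; all IDs, names, and teams are hypothetical, and the Team column is invented for the example.

```r
# Hypothetical pitcher summary; IDs and counts are made up.
pitch <- data.frame(PIT_ID = c("aaaa001", "bbbb001", "cccc001"),
                    SB = c(5, 12, 8), CS = c(5, 3, 1),
                    stringsAsFactors = FALSE)
pitch$Attempts <- pitch$SB + pitch$CS
pitch$Success.Rate <- pitch$SB / pitch$Attempts

# Hypothetical roster in which one pitcher appears with two teams.
roster <- data.frame(Player.ID = c("aaaa001", "bbbb001", "bbbb001", "cccc001"),
                     Last.Name = c("Able", "Baker", "Baker", "Charlie"),
                     Team      = c("AAA", "BBB", "CCC", "AAA"),
                     stringsAsFactors = FALSE)

merged <- merge(pitch, roster, by.x = "PIT_ID", by.y = "Player.ID")
nrow(merged)   # one more row than pitch: the two-team pitcher is doubled

# Keep the first row for each pitcher.
merged <- merged[!duplicated(merged$PIT_ID), ]

# Sort in decreasing order of attempts and keep the top two.
top <- head(merged[order(-merged$Attempts), ], 2)
top[, c("Last.Name", "SB", "CS", "Attempts", "Success.Rate")]
```

Sorting in decreasing order and using head is an alternative to sorting in increasing order and indexing the last rows; both pick the same pitchers.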

John Lackey was by far the leader in SB attempts of 2nd base with 37, and runners were pretty successful against him with a rate of 86 percent. Scanning over the list, Anibal Sanchez was pretty poor at preventing SB (success rate of 96 percent), and Cole Hamels was pretty good at SB prevention (65 percent).
