Chance of a Hit During Different Pitch Counts

I recently received an email from a high school baseball coach in California, who asks: “I am looking for a data set that would give me players averages on balls put in play on specific counts, such as 0-2, 2-0, 0-0, etc. I am very curious as to see the distribution of this statistic and how consistent it is with the different teams and even individual players.”

These kinds of breakdowns are straightforward to produce using Retrosheet play-by-play files. Here I use data from the 2013 season to construct a table of the different batting events by pitch count. I use this table to demonstrate that it is easier to get a hit during hitter’s counts.

In an earlier post, I describe the process of downloading Retrosheet data and creating a dataset that is easily readable in R. (If you don’t want to go through this downloading operation, play-by-play data from the 2011 season is available at the book GitHub site.) Let's assume that I've already downloaded the files. I define the file location of the play-by-play data for the 2013 season, and also of the “fields.csv” file that provides the names of the variables. I then read both files into R and use the field names as the column names of the play-by-play data frame.

pbp.file.name <- 
"/Users/albert/Desktop/PGP Folder/download.folder/unzipped/all2013.csv"
fields.name <- 
"/Users/albert/Desktop/PGP Folder/download.folder/unzipped/fields.csv"
pbp <- read.csv(pbp.file.name, header=FALSE)
pbp.names <- read.csv(fields.name)
names(pbp) <- pbp.names$Header
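Since the play-by-play file has no header row, the names come entirely from the second file, and it is worth checking that the two line up. Here is a minimal sketch of that check on toy data; the game id and the three-column layout are made up for illustration, and the real files have many more fields.

```r
# Toy stand-ins for the two files (hypothetical values, three fields only).
pbp.toy <- read.csv(text = "ANA201304090,0,1\nANA201304090,1,2",
                    header = FALSE)
fields.toy <- read.csv(text = "Header\nGAME_ID\nBALLS_CT\nSTRIKES_CT")

# There should be exactly one field name per data column.
stopifnot(ncol(pbp.toy) == nrow(fields.toy))

# Assign the names, as in the code above.
names(pbp.toy) <- as.character(fields.toy$Header)
names(pbp.toy)
```

If the check fails on the real files, the two downloads probably do not belong to the same file layout.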

Since we want to restrict attention to batting plays in our breakdown, we use the subset function to keep only the rows where the batting event flag BAT_EVENT_FL is TRUE.

pbp.bat <- subset(pbp, BAT_EVENT_FL==TRUE)
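The comparison with TRUE works because read.csv converts a column that contains only T and F values to a logical vector. A small sketch on made-up rows (the event codes here are only illustrative) shows the conversion and the filtering:

```r
# Toy play-by-play rows: two batting events and two non-batting events.
toy <- read.csv(text = "BAT_EVENT_FL,EVENT_CD\nT,20\nF,4\nT,3\nF,6")

# The T/F column has been read as logical, not character.
class(toy$BAT_EVENT_FL)

# Keep only the batting events.
toy.bat <- subset(toy, BAT_EVENT_FL == TRUE)
toy.bat
```

If the flag column were ever read as character instead (for example, because of a stray value), the comparison would need to be BAT_EVENT_FL == "T".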

It is convenient to create a balls-and-strikes variable The.Count by pasting together the number of balls (BALLS_CT) and the number of strikes (STRIKES_CT), separated by a hyphen.

pbp.bat$The.Count <- with(pbp.bat,
                paste(BALLS_CT, "-", STRIKES_CT, sep=""))
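To see what this pasting produces, here is a small sketch on a made-up data frame with three plate appearances. It also shows that paste0 gives the same result without the sep argument.

```r
# Three hypothetical plate appearances ending on 0-2, 3-1, and 1-1.
toy <- data.frame(BALLS_CT = c(0, 3, 1), STRIKES_CT = c(2, 1, 1))

# Same construction as above.
toy$The.Count <- with(toy, paste(BALLS_CT, "-", STRIKES_CT, sep = ""))

# Equivalent construction with paste0.
count.alt <- with(toy, paste0(BALLS_CT, "-", STRIKES_CT))

toy$The.Count
```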

Using the recode function in the car package, I recode the EVENT_CD variable to meaningful labels.

library(car)
pbp.bat$Event.Code <- recode(pbp.bat$EVENT_CD,
      "2='Out'; 3='SO'; 14='BB'; 15='IBB'; 16='HBP'; 17='Interf';
       18='E'; 19='FC'; 20='1B'; 21='2B'; 22='3B'; 23='HR'")
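If you prefer not to load the car package, the same relabeling can be sketched in base R with a named lookup vector. This is an alternative, not what the code above does, and the names event.labels and codes are mine. One difference to keep in mind: codes missing from the lookup become NA here, while recode leaves unmatched values unchanged.

```r
# Lookup from Retrosheet event code (as a string) to a short label.
event.labels <- c("2" = "Out", "3" = "SO", "14" = "BB", "15" = "IBB",
                  "16" = "HBP", "17" = "Interf", "18" = "E", "19" = "FC",
                  "20" = "1B", "21" = "2B", "22" = "3B", "23" = "HR")

# A few example codes, as they would appear in EVENT_CD.
codes <- c(20, 3, 23, 2, 14)

# Index the lookup by the code converted to character.
labels <- unname(event.labels[as.character(codes)])
labels
```

On the real data, this would be applied as event.labels[as.character(pbp.bat$EVENT_CD)].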

I create a two-way frequency table of the plate appearance event by the pitch count.

T.Event.Count <- with(pbp.bat, table(Event.Code, The.Count))
options(width=90)
T.Event.Count
##           The.Count
## Event.Code   0-0   0-1   0-2   1-0   1-1   1-2   2-0   2-1   2-2   3-0   3-1   3-2
##     1B      4283  3603  1793  2643  3535  3239   896  2089  3158    50   774  2375
##     2B      1270   942   429   848  1037   773   327   683   858    25   279   751
##     3B       133    71    50    82    73    63    29    69    93     3    24    82
##     BB         0     0     0     0     0     0     0     0     1  2495  4133  6993
##     E        223   184   105   136   186   178    55    79   182     3    34   108
##     FC        99    64    37    56    54    42    19    29    40     2    12    25
##     HBP      279   243   175    74   147   263    27    55   195     6    14    58
##     HR       800   450   160   554   521   354   228   440   453    25   252   424
##     IBB        0     0     0     0     0     0     0     0     0   970    46     2
##     Interf     0     2     5     0     2     5     0     1     4     0     0     6
##     Out    13413 11353  5539  8362 10499  9956  3054  6068  9855   194  2565  7059
##     SO         0     0  7974     0     1 12260     0     0 10684     0     0  5791

Using this table, it is straightforward to compute a batting average or any other hitting measure for each of the pitch counts. Here I want to show that it is easier to hit home runs and doubles in “hitter's counts” such as 3-1 and 2-0 than in “pitcher's counts” such as 0-2 and 1-2. The following function plot.event computes the conditional probability of a given play event for each of the possible counts (that is, the proportion of plate appearances ending on that count that result in the event) and displays these probabilities using a dot chart.

library(ggplot2)
plot.event <- function(T.Event.Count, event){
  # Column proportions: P(event | count), so each count's column sums to 1
  P.Event <- prop.table(T.Event.Count, 2)
  # Row of probabilities for the requested event, one value per count
  P <- P.Event[event, ]
  D <- data.frame(Count=names(P), Probability=P)
  # Dot chart of the probability against the count
  ggplot(D, aes(Count, Probability)) + geom_point(size=4) +
    labs(title=paste("Probability of", event,
                     "for Each Possible Count")) +
    theme(plot.title = element_text(size = rel(2))) +
    theme(axis.text = element_text(size = rel(2))) +
    theme(axis.title = element_text(size = rel(2)))
}
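The key step in this function is the prop.table call with margin 2, which divides each entry by its column total. A minimal sketch on a made-up table with two counts and three events shows the effect; the numbers are invented so that each column sums to 100.

```r
# Hypothetical event-by-count frequencies (not the 2013 data).
toy <- matrix(c(10, 30, 60,
                 5, 15, 80),
              nrow = 3,
              dimnames = list(Event.Code = c("HR", "1B", "Out"),
                              The.Count = c("2-0", "0-2")))

# Margin 2 = columns: each column is divided by its own total.
P <- prop.table(toy, 2)

colSums(P)   # every column of proportions sums to 1
P["HR", ]    # probability of a home run given each count
```

With margin 1 instead, the rows would sum to 1, which would answer a different question: given the event, on which count did it happen?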

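I illustrate the use of this function for the events HR, 2B, and 1B. Clearly, the chance of each of these types of hits depends strongly on the pitch count. For example, from the table above, home runs make up about 4.9% of the plate appearances that end on a 2-0 count, but only about 1.0% of those that end on an 0-2 count.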

plot.event(T.Event.Count, "HR")

[Figure: dot chart of the probability of HR for each possible count]

plot.event(T.Event.Count, "2B")

[Figure: dot chart of the probability of 2B for each possible count]

plot.event(T.Event.Count, "1B")

[Figure: dot chart of the probability of 1B for each possible count]
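Returning to the coach's question, the same table can be turned into a rough batting average for each count. The sketch below retypes the relevant rows of the table printed earlier and treats hits, errors, fielder's choices, outs, and strikeouts as at-bats. That is an assumption on my part: the generic Out code can include sacrifices, which are not official at-bats, so these values are approximate. The names T.AB, hits, at.bats, and AVG are mine.

```r
counts <- c("0-0", "0-1", "0-2", "1-0", "1-1", "1-2",
            "2-0", "2-1", "2-2", "3-0", "3-1", "3-2")

# Rows copied from the 2013 event-by-count table; walks, hit by pitches,
# and interference are left out because they are not at-bats.
T.AB <- rbind(
  "1B"  = c(4283, 3603, 1793, 2643, 3535, 3239, 896, 2089, 3158, 50, 774, 2375),
  "2B"  = c(1270, 942, 429, 848, 1037, 773, 327, 683, 858, 25, 279, 751),
  "3B"  = c(133, 71, 50, 82, 73, 63, 29, 69, 93, 3, 24, 82),
  "HR"  = c(800, 450, 160, 554, 521, 354, 228, 440, 453, 25, 252, 424),
  "E"   = c(223, 184, 105, 136, 186, 178, 55, 79, 182, 3, 34, 108),
  "FC"  = c(99, 64, 37, 56, 54, 42, 19, 29, 40, 2, 12, 25),
  "Out" = c(13413, 11353, 5539, 8362, 10499, 9956, 3054, 6068, 9855, 194, 2565, 7059),
  "SO"  = c(0, 0, 7974, 0, 1, 12260, 0, 0, 10684, 0, 0, 5791)
)
colnames(T.AB) <- counts

# Hits and (approximate) at-bats for each count.
hits <- colSums(T.AB[c("1B", "2B", "3B", "HR"), ])
at.bats <- colSums(T.AB)

# Approximate batting average by count.
AVG <- round(hits / at.bats, 3)
AVG
```

With the full table in memory, the retyping is unnecessary: the same rows can be pulled directly from T.Event.Count. Under these assumptions the average is about .321 on 2-0 and about .151 on 0-2, in line with the pattern in the charts. Dropping the SO row from the denominator would give an average on balls put in play, which is closer to what the coach asked for.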
