# Chance of a Hit During Different Pitch Counts

I recently received an email from a high school baseball coach from California who asked: “I am looking for a data set that would give me players averages on balls put in play on specific counts, such as 0-2, 2-0, 0-0, etc. I am very curious as to see the distribution of this statistic and how consistent it is with the different teams and even individual players.”

These kinds of breakdowns are straightforward to produce using Retrosheet play-by-play files. Here I use data from the 2013 season to construct a table of the different batting events by pitch count. I use this table to demonstrate that it is easier to get a hit in hitter's counts.

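```r
# Locations of the 2013 Retrosheet play-by-play file and its field names
# (the paths are not given here; substitute your own)
pbp.file.name <- "..."
fields.name <- "..."
```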
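The post does not show how the data frame `pbp` is built from these two files. Here is a minimal sketch, assuming the play-by-play file is a header-less CSV produced by Chadwick's `cwevent` and that the field-names file is a CSV with a `Header` column listing the column names in file order. `read.pbp` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper (not from the original post). Assumes:
#  - pbp.file.name: header-less CSV of Chadwick cwevent output
#  - fields.name:   CSV whose "Header" column gives the field names in order
read.pbp <- function(pbp.file.name, fields.name) {
  # read.csv converts T/F columns such as BAT_EVENT_FL to logical
  pbp <- read.csv(pbp.file.name, header = FALSE)
  fields <- read.csv(fields.name, stringsAsFactors = FALSE)
  names(pbp) <- as.character(fields$Header)
  pbp
}

# Usage: pbp <- read.pbp(pbp.file.name, fields.name)
```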

Since we want to restrict attention to batting plays in our breakdown, we use the `subset` function to keep only the rows whose batting event flag `BAT_EVENT_FL` is `TRUE`.

```r
pbp.bat <- subset(pbp, BAT_EVENT_FL==TRUE)
```
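To see what this filter does, here is a toy illustration on three made-up rows (the values are invented, not taken from the 2013 file). A stolen base (event code 4) is recorded as a play but is not a batting event, so it is dropped.

```r
# Toy rows standing in for play-by-play records (values made up):
# EVENT_CD 20 is a single, 4 a stolen base, 3 a strikeout.
toy <- data.frame(BAT_EVENT_FL = c(TRUE, FALSE, TRUE),
                  EVENT_CD     = c(20, 4, 3))
toy.bat <- subset(toy, BAT_EVENT_FL == TRUE)
toy.bat$EVENT_CD    # 20 3: the stolen-base row is dropped
# Because the flag is logical, subset(toy, BAT_EVENT_FL) is equivalent.
```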

It is convenient to create a ball-strike count variable `The.Count` by pasting together the number of balls (`BALLS_CT`) and the number of strikes (`STRIKES_CT`), giving labels such as "3-2".

```r
# Label each plate appearance with its count, e.g. "3-2"
pbp.bat$The.Count <- with(pbp.bat,
  paste(BALLS_CT, "-", STRIKES_CT, sep=""))
```
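A small illustration of the labels this produces, using a few made-up ball and strike values. Each side of a count is a single digit, so sorting the labels as text already gives the natural order 0-0, 0-1, ..., 3-2; `table()` sorts its levels this way, which is why the columns of the table further on appear in that order.

```r
balls   <- c(3, 0, 1, 0)
strikes <- c(2, 0, 2, 2)
counts  <- paste(balls, "-", strikes, sep = "")   # same as paste0(balls, "-", strikes)
counts          # "3-2" "0-0" "1-2" "0-2"
sort(counts)    # "0-0" "0-2" "1-2" "3-2"
```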

Using the `recode` function in the `car` package, I recode the `EVENT_CD` variable to meaningful labels.

```r
library(car)
# Map the numeric Retrosheet event codes to short labels
pbp.bat$Event.Code <- recode(pbp.bat$EVENT_CD,
"2='Out'; 3='SO'; 14='BB'; 15='IBB'; 16='HBP'; 17='Interf';
18='E'; 19='FC'; 20='1B'; 21='2B'; 22='3B'; 23='HR'")
```
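If you would rather avoid the `car` dependency (or the clash with `dplyr::recode`, which masks `car`'s version when `dplyr` is loaded afterwards; writing `car::recode` also avoids that), a named lookup vector does the same job in base R. This is a sketch with a hypothetical helper `recode.event`; one difference is that codes not in the lookup become `NA`, whereas `car::recode` leaves unmatched values unchanged.

```r
# Named lookup vector: names are EVENT_CD values, values are labels
event.labels <- c("2" = "Out", "3" = "SO", "14" = "BB", "15" = "IBB",
                  "16" = "HBP", "17" = "Interf", "18" = "E", "19" = "FC",
                  "20" = "1B", "21" = "2B", "22" = "3B", "23" = "HR")

# Hypothetical helper; codes missing from the lookup become NA
recode.event <- function(code) unname(event.labels[as.character(code)])

recode.event(c(20, 3, 23))   # "1B" "SO" "HR"
# Usage: pbp.bat$Event.Code <- recode.event(pbp.bat$EVENT_CD)
```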

I create a two-way table that cross-tabulates the plate appearance outcome (`Event.Code`) against the ball-strike count (`The.Count`).

```r
T.Event.Count <- with(pbp.bat, table(Event.Code, The.Count))
options(width=90)
T.Event.Count
```
```
##           The.Count
## Event.Code   0-0   0-1   0-2   1-0   1-1   1-2   2-0   2-1   2-2   3-0   3-1   3-2
##     1B      4283  3603  1793  2643  3535  3239   896  2089  3158    50   774  2375
##     2B      1270   942   429   848  1037   773   327   683   858    25   279   751
##     3B       133    71    50    82    73    63    29    69    93     3    24    82
##     BB         0     0     0     0     0     0     0     0     1  2495  4133  6993
##     E        223   184   105   136   186   178    55    79   182     3    34   108
##     FC        99    64    37    56    54    42    19    29    40     2    12    25
##     HBP      279   243   175    74   147   263    27    55   195     6    14    58
##     HR       800   450   160   554   521   354   228   440   453    25   252   424
##     IBB        0     0     0     0     0     0     0     0     0   970    46     2
##     Interf     0     2     5     0     2     5     0     1     4     0     0     6
##     Out    13413 11353  5539  8362 10499  9956  3054  6068  9855   194  2565  7059
##     SO         0     0  7974     0     1 12260     0     0 10684     0     0  5791
```
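The coach's original question, an average on balls put in play for each count, can be read straight off this table. Below is a minimal sketch with a hypothetical helper `hit.rate.in.play`; the definition of "in play" (a hit, an out, an error or a fielder's choice) is my assumption. Sacrifices are coded as ordinary outs, so they are counted as well; dropping `HR` from both rows would give a BABIP-style rate instead.

```r
# Hypothetical helper: hits divided by balls put in play, for each count.
# "In play" is taken to mean 1B, 2B, 3B, HR, Out, E or FC; walks, HBP,
# interference and strikeouts are left out.
hit.rate.in.play <- function(T.Event.Count) {
  hit.rows  <- c("1B", "2B", "3B", "HR")
  play.rows <- c(hit.rows, "Out", "E", "FC")
  hits    <- colSums(T.Event.Count[hit.rows, , drop = FALSE])
  in.play <- colSums(T.Event.Count[play.rows, , drop = FALSE])
  round(hits / in.play, 3)
}

# Usage: hit.rate.in.play(T.Event.Count)
```

For the 0-0 and 0-2 columns of the table above, this gives rates of about .321 and .300.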

Using this table, it is straightforward to compute a batting average or any other hitting measure for each of the pitch counts. Here I want to show that it is easier to hit home runs and doubles in “hitter's counts” such as 3-1 and 2-0 than in “pitcher's counts” such as 0-2 and 1-2. The following function `plot.event` computes the conditional probability of a given event for each possible count (the proportion of plate appearances decided on that count that ended in the event) and displays these probabilities in a dot chart.

```r
library(ggplot2)

# Dot chart of P(event | count) for one event label, e.g. "HR"
plot.event <- function(T.Event.Count, event){
  # Divide each column by its total: proportions within each count
  P.Event <- prop.table(T.Event.Count, 2)
  P <- P.Event[event, ]
  D <- data.frame(Count=names(P), Probability=P)
  ggplot(D, aes(Count, Probability)) + geom_point(size=4) +
    labs(title=paste("Probability of", event,
                     "for Each Possible Count")) +
    theme(plot.title = element_text(size = rel(2)),
          axis.text = element_text(size = rel(2)),
          axis.title = element_text(size = rel(2)))
}
```
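In symbols, if $n_{e,c}$ is the table entry for event $e$ and count $c$, each dot is a column proportion. Working it out by hand for the HR row, using the 3-1 and 0-2 columns of the table above:

$$
\hat P(e \mid c) = \frac{n_{e,c}}{\sum_{e'} n_{e',c}}, \qquad
\hat P(\text{HR} \mid 3\text{-}1) = \frac{252}{8133} \approx 0.031, \qquad
\hat P(\text{HR} \mid 0\text{-}2) = \frac{160}{16267} \approx 0.0098.
$$

So a plate appearance decided on a 3-1 pitch ended in a home run roughly three times as often as one decided on an 0-2 pitch.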

I illustrate this function with the events HR, 2B, and 1B. The chance of each type of hit clearly depends on the pitch count.

```r
plot.event(T.Event.Count, "HR")
```

```r
plot.event(T.Event.Count, "2B")
```

```r
plot.event(T.Event.Count, "1B")
```
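To compare several events on one chart instead of three, the same column proportions can be stacked into a single long data frame. This is a sketch using a hypothetical helper `event.probs`; the plotting lines are left as comments because they need `ggplot2` and the full table.

```r
# Hypothetical helper: conditional probabilities of several events,
# stacked into one long data frame (one row per event and count).
event.probs <- function(T.Event.Count, events) {
  P.Event <- prop.table(T.Event.Count, 2)   # each column sums to 1
  do.call(rbind, lapply(events, function(event)
    data.frame(Event = event,
               Count = colnames(P.Event),
               Probability = as.numeric(P.Event[event, ]),
               stringsAsFactors = FALSE)))
}

# Usage with the table built above (requires ggplot2):
# library(ggplot2)
# D <- event.probs(T.Event.Count, c("HR", "2B", "1B"))
# ggplot(D, aes(Count, Probability, color = Event)) + geom_point(size = 4)
```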