There are two distinct components of batting: being patient in looking for the right pitch and getting a quality swing at a ball that is put into play. In my last post, I explored the number of pitches in a plate appearance. Here we explore the types of batted balls, namely line drives, pop ups, fly balls, and groundballs, and the relationship of batted ball type with run value.

We begin by loading in the 2013 Retrosheet play-by-play data with run values of each play attached.

load("~/Dropbox/2014 WORK/Runs Expectancy/Final R/pbp2013.Rdata")

We focus only on the plays where the batted ball is a flyball, groundball, line drive, or popup. (This is indicated by the variable `BATTEDBALL_CD` in the Retrosheet data file.) We use the `factor` function to give the categories more descriptive labels.

library(dplyr)  # provides filter(); must be loaded before its first use
d2013inplay <- filter(d2013, BATTEDBALL_CD %in% c("F", "G", "L", "P"))
d2013inplay$BATTEDBALL_CD <- factor(d2013inplay$BATTEDBALL_CD,
                                    levels = c("F", "G", "L", "P"),
                                    labels = c("Flyball", "Groundball", "LineDrive", "Popup"))

To start, it is interesting to see the hit value of each batted ball type, and also to see how many outs are generated for each type.

with(d2013inplay, table(BATTEDBALL_CD, H_FL))

##              H_FL
## BATTEDBALL_CD     0     1     2     3     4
##    Flyball    24250  1261  1782   272  3680
##    Groundball 46396 13438  1008    59     0
##    LineDrive  10114 13580  5358   437   981
##    Popup       9110   156    46     4     0

As you might expect, popups tend to produce no hits, and extra-base hits tend to be flyballs and line drives.
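To put numbers on this, here is a small sketch that re-enters the counts from the table above by hand and computes the hit rate for each batted ball type, along with each type's share of extra-base hits. It assumes the usual Retrosheet coding of `H_FL` (0 for no hit, 1 through 4 for single through home run); the object names `hits`, `hit_rate`, and `xbh_share` are only illustrative.

```r
# Counts copied from the table above (rows: batted ball type, columns: H_FL).
hits <- matrix(
  c(24250,  1261, 1782, 272, 3680,
    46396, 13438, 1008,  59,    0,
    10114, 13580, 5358, 437,  981,
     9110,   156,   46,   4,    0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    BATTEDBALL_CD = c("Flyball", "Groundball", "LineDrive", "Popup"),
    H_FL = c("0", "1", "2", "3", "4")
  )
)

# Share of each batted ball type that goes for a hit (H_FL > 0).
hit_rate <- 1 - hits[, "0"] / rowSums(hits)
round(hit_rate, 3)

# Share of all extra-base hits (H_FL of 2, 3, or 4) contributed by each type.
xbh <- rowSums(hits[, c("2", "3", "4")])
xbh_share <- xbh / sum(xbh)
round(xbh_share, 3)
```

About two thirds of line drives fall for hits, against roughly 2% of popups, and flyballs and line drives together account for over 90% of the extra-base hits.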

with(d2013inplay, table(BATTEDBALL_CD, EVENT_OUTS_CT))

##              EVENT_OUTS_CT
## BATTEDBALL_CD     0     1     2     3
##    Flyball     6994 24146   105     0
##    Groundball 15850 41285  3765     1
##    LineDrive  19988 10156   326     0
##    Popup        226  9060    30     0

We see that popups almost always produce exactly one out, and groundballs (as you might expect) are far more likely than the other types to result in double plays.
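The raw counts are easier to compare as row proportions. The sketch below re-enters the out counts from the table above and uses `prop.table` to divide each row by its total; the names `outs` and `out_share` are only illustrative.

```r
# Counts copied from the table above (rows: batted ball type, columns: outs on the play).
outs <- matrix(
  c( 6994, 24146,  105, 0,
    15850, 41285, 3765, 1,
    19988, 10156,  326, 0,
      226,  9060,   30, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    BATTEDBALL_CD = c("Flyball", "Groundball", "LineDrive", "Popup"),
    EVENT_OUTS_CT = c("0", "1", "2", "3")
  )
)

# margin = 1 divides every row by its own total, so each row sums to 1.
out_share <- prop.table(outs, margin = 1)
round(out_share, 3)
```

With the play-by-play data loaded, the same proportions should come directly from `with(d2013inplay, prop.table(table(BATTEDBALL_CD, EVENT_OUTS_CT), 1))`.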

Next we measure the value of each type of batted ball by its run value. We create a new data frame that gives the mean and standard deviation of the run values for each type.

library(dplyr)
S <- summarize(group_by(d2013inplay, BATTEDBALL_CD),
               Mean.Runs = mean(RUNS.VALUE),
               SD.Runs = sd(RUNS.VALUE))
S

## Source: local data frame [4 x 3]
##
##   BATTEDBALL_CD   Mean.Runs   SD.Runs
## 1       Flyball  0.05760307 0.6179356
## 2    Groundball -0.07110042 0.4065463
## 3     LineDrive  0.30423694 0.5475748
## 4         Popup -0.24214596 0.2054216

As we might expect, flyballs and line drives, on average, result in positive run values, and groundballs and popups, on average, have negative run values. The standard deviation of the run values for flyballs is high since these are the batted balls that can result in home runs. In contrast, popups tend to be negative and the run values have a smaller standard deviation.
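As a rough check on how these four averages combine, the sketch below weights each mean by the number of batted balls of that type. The counts are the row totals of the frequency tables earlier in the post, and the names `n`, `mean_runs`, `share`, and `overall` are only illustrative.

```r
# Row totals of the earlier frequency tables (batted balls of each type).
n <- c(Flyball = 31245, Groundball = 60901, LineDrive = 30470, Popup = 9316)

# Mean run values copied from the summary above.
mean_runs <- c(Flyball = 0.05760307, Groundball = -0.07110042,
               LineDrive = 0.30423694, Popup = -0.24214596)

# Share of all batted balls that fall in each category.
share <- n / sum(n)
round(share, 3)

# Overall mean run value per batted ball: the count-weighted mean of the group means.
overall <- weighted.mean(mean_runs, w = n)
overall

# Contribution of each type to that overall mean (these sum to `overall`).
round(share * mean_runs, 4)
```

Groundballs make up the largest share of batted balls, but the large positive mean for line drives is enough to pull the overall average slightly above zero.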

Histograms visually show us the distribution of run values of each type of batted ball. I’ve added a vertical line at the value of zero so we can easily see if the run values are positive or negative.

library(ggplot2)
ggplot(d2013inplay, aes(RUNS.VALUE)) +
  geom_histogram(aes(y = ..density..)) +
  facet_wrap(~ BATTEDBALL_CD, ncol = 2) +
  geom_vline(xintercept = 0, color = "red", size = 2) +
  theme(strip.text = element_text(size = rel(2)))

One can characterize each player by his proportions of batted balls of each type. We focus on the players in the 2013 season with at least 300 batted balls.

B <- summarize(group_by(d2013inplay, BAT_ID),
               N = length(BATTEDBALL_CD),
               p.LineDrive = mean(BATTEDBALL_CD == "LineDrive"),
               p.PopUp = mean(BATTEDBALL_CD == "Popup"),
               p.FlyBall = mean(BATTEDBALL_CD == "Flyball"),
               p.GroundBall = mean(BATTEDBALL_CD == "Groundball"),
               mean.Runs = mean(RUNS.VALUE))
B.300 <- filter(B, N >= 300)
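The proportions rely on a small trick: a comparison such as `BATTEDBALL_CD == "Flyball"` returns `TRUE`/`FALSE`, and `mean` treats those as 1/0, so the mean is the proportion. Here is a base-R sketch on made-up data (the data frame `toy` and its two batters are hypothetical, not from the Retrosheet file).

```r
# Hypothetical toy data: six batted balls from two made-up batters.
toy <- data.frame(
  BAT_ID = c("a", "a", "a", "a", "b", "b"),
  BATTEDBALL_CD = c("Flyball", "Groundball", "Flyball", "Popup",
                    "Groundball", "Groundball")
)

# mean() of a logical vector is the fraction of TRUE values,
# so this gives each batter's flyball proportion.
p_fly <- tapply(toy$BATTEDBALL_CD == "Flyball", toy$BAT_ID, mean)
p_fly

# The full batted-ball profile of each batter: row proportions of a two-way table.
profile <- prop.table(table(toy$BAT_ID, toy$BATTEDBALL_CD), margin = 1)
profile
```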

We explore how the proportion of flyballs and the proportion of groundballs each relate to mean run value.

ggplot(B.300, aes(p.FlyBall, mean.Runs)) + geom_point() + geom_smooth()

ggplot(B.300, aes(p.GroundBall, mean.Runs)) + geom_point() + geom_smooth()

As one might expect, players with higher flyball percentages tend to have higher mean run values, and players with higher groundball percentages tend to have lower mean run values.

Suppose we divide these hitters into two groups: those whose mean run value is above the median and those at or below it. We graph players’ flyball and groundball proportions, with the color of each point indicating the group. The better hitters (with respect to run value) tend to fall in the lower right of the plot, corresponding to high flyball percentages and low groundball percentages.

B.300 <- mutate(B.300,
                Sign.Runs = ifelse(mean.Runs > median(mean.Runs), "High Runs", "Low Runs"))
ggplot(B.300, aes(p.FlyBall, p.GroundBall, color = Sign.Runs)) +
  geom_point()

To go further, suppose we fit a regression model of the form

mean.Runs = b0 + p.FlyBall * b1 + p.GroundBall * b2

We use the `lm` function to fit this model and display the estimated regression coefficients.

# lm() is in the stats package, which R loads by default, so library(MASS) is not needed
fit <- lm(mean.Runs ~ p.FlyBall + p.GroundBall, data = B.300)
fit

##
## Call:
## lm(formula = mean.Runs ~ p.FlyBall + p.GroundBall, data = B.300)
##
## Coefficients:
##  (Intercept)     p.FlyBall  p.GroundBall
##      0.04121       0.25188      -0.12527

To interpret this fit, we put lines of constant fit on the scatterplot. We set up a grid of `(p.GroundBall, p.FlyBall)` values and use the `predict` function to find the predicted values of `mean.Runs` on this grid.

# Grid of flyball and groundball proportions (20 x 20 = 400 points)
pF <- seq(0.1, 0.4, length.out = 20)
pG <- seq(0.3, 0.65, length.out = 20)
d <- data.frame(p.GroundBall = c(outer(rep(1, 20), pG)),
                p.FlyBall = c(outer(pF, rep(1, 20))))
# Predicted mean run value at each grid point
d$Runs <- predict(fit, d)
ggplot(B.300, aes(p.FlyBall, p.GroundBall)) +
  geom_point() +
  stat_contour(data = d, binwidth = 0.02,
               aes(x = p.FlyBall, y = p.GroundBall, z = Runs, color = ..level..)) +
  scale_colour_gradient(low = "red", high = "blue")
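The `outer` calls above are just one way to lay out every combination of the two sequences. As a sketch of an alternative, base R's `expand.grid` builds the same 400 points more directly; the names `d_outer` and `d_grid` are only illustrative.

```r
pF <- seq(0.1, 0.4, length.out = 20)
pG <- seq(0.3, 0.65, length.out = 20)

# Grid built as in the post: outer() spreads each sequence across a 20 x 20 matrix,
# and c() flattens that matrix column by column.
d_outer <- data.frame(
  p.GroundBall = c(outer(rep(1, 20), pG)),
  p.FlyBall = c(outer(pF, rep(1, 20)))
)

# expand.grid() varies its first argument fastest, which matches that ordering.
d_grid <- expand.grid(p.FlyBall = pF, p.GroundBall = pG)

# Both checks should report TRUE, and the grid has 20 * 20 rows.
all.equal(d_outer$p.FlyBall, d_grid$p.FlyBall)
all.equal(d_outer$p.GroundBall, d_grid$p.GroundBall)
nrow(d_grid)
```

Either data frame can be passed to `predict`, since the function matches predictors by column name rather than by position.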

The run values of the contour lines range from 0 to 0.10, corresponding to the colors red to blue. A batter with a flyball percentage of 10% and a groundball percentage of 60% is predicted to have a run value close to 0. In contrast, a batter with a flyball percentage of 40% and a groundball percentage of 30% is predicted to have a run value around 0.10.
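These two predictions can be checked by hand from the printed coefficients. The sketch below hard-codes the rounded estimates from the `lm` output rather than using the fitted object, so results agree with `predict` only up to rounding; `b` and `predict_runs` are illustrative names.

```r
# Rounded coefficients copied from the lm() output above.
b <- c(intercept = 0.04121, p.FlyBall = 0.25188, p.GroundBall = -0.12527)

# Predicted mean run value for given flyball and groundball proportions.
predict_runs <- function(p_fly, p_ground) {
  b[["intercept"]] + b[["p.FlyBall"]] * p_fly + b[["p.GroundBall"]] * p_ground
}

predict_runs(0.10, 0.60)  # roughly -0.009, i.e. close to 0
predict_runs(0.40, 0.30)  # roughly 0.104

# Effect of trading 10 percentage points of groundballs for flyballs,
# holding the other batted ball types fixed.
shift <- predict_runs(0.30, 0.40) - predict_runs(0.20, 0.50)
shift                     # roughly 0.038 runs per batted ball
```

Under this fit, swapping groundballs for flyballs is worth the sum of the two coefficient magnitudes per unit of proportion, which is why the contour lines run diagonally across the scatterplot.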

I’m sure a lot more can be said about players with different in-play batting types. Here are some possible explorations.

- I suspect that a team’s lineup needs a variety of hitters with different batted ball profiles. Is that true for current teams?
- I would think that a player’s tendency to hit flyballs or groundballs persists over his career. Is that true? If so, it would make more sense to talk about a hitter’s tendency to hit flyballs or groundballs than about hitting “for average” or a hitter’s ability to hit in clutch situations.
- Is a batter’s batted ball type predictive of future offensive performance?