Exploring Types of Batted Balls

There are two distinct components of batting: being patient in looking for the right pitch and getting a quality swing at a ball that is put into play. In my last post, I explored the number of pitches in a plate appearance. Here we explore the types of batted balls, namely line drives, pop ups, fly balls, and groundballs, and the relationship of batted ball type with run value.

We begin by loading in the 2013 Retrosheet play-by-play data with run values of each play attached.

load("~/Dropbox/2014 WORK/Runs Expectancy/Final R/pbp2013.Rdata")

We focus only on the plays where the batted ball is a flyball, groundball, line drive, or popup. (This is indicated by the variable BATTEDBALL_CD in the Retrosheet data file.) We use the factor function to assign more descriptive labels to the categories.

library(dplyr)  # provides filter(); must be loaded before its first use
d2013inplay <- filter(d2013,
                  BATTEDBALL_CD %in% c("F", "G", "L", "P"))
d2013inplay$BATTEDBALL_CD <- factor(d2013inplay$BATTEDBALL_CD,
                  levels=c("F", "G", "L", "P"),
                  labels=c("Flyball","Groundball",
                           "LineDrive","Popup"))

To start, it is interesting to see the hit outcome of each batted ball type (the variable H_FL is 0 for no hit and 1 through 4 for a single, double, triple, and home run), and also to see how many outs are generated for each type.

with(d2013inplay, table(BATTEDBALL_CD, H_FL))
##              H_FL
## BATTEDBALL_CD     0     1     2     3     4
##    Flyball    24250  1261  1782   272  3680
##    Groundball 46396 13438  1008    59     0
##    LineDrive  10114 13580  5358   437   981
##    Popup       9110   156    46     4     0

As you might expect, popups rarely produce hits, and extra-base hits come mostly from flyballs and line drives.
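
To put numbers on this, here is a minimal sketch that re-enters the counts from the table above by hand and computes the fraction of each batted ball type that went for a hit. The object names `hits` and `hit_rate` are my own, chosen for illustration; with the full data frame loaded one would work from the table object directly.

```r
# Counts copied from the printed table of BATTEDBALL_CD by H_FL
hits <- matrix(c(24250,  1261, 1782, 272, 3680,
                 46396, 13438, 1008,  59,    0,
                 10114, 13580, 5358, 437,  981,
                  9110,   156,   46,   4,    0),
               nrow = 4, byrow = TRUE,
               dimnames = list(
                 BATTEDBALL_CD = c("Flyball", "Groundball",
                                   "LineDrive", "Popup"),
                 H_FL = as.character(0:4)))

# Fraction of batted balls of each type that were hits (H_FL > 0)
hit_rate <- 1 - hits[, "0"] / rowSums(hits)
round(hit_rate, 3)
```

Roughly two thirds of line drives fall for hits, compared with about 2% of popups.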

with(d2013inplay, table(BATTEDBALL_CD, EVENT_OUTS_CT))
##              EVENT_OUTS_CT
## BATTEDBALL_CD     0     1     2     3
##    Flyball     6994 24146   105     0
##    Groundball 15850 41285  3765     1
##    LineDrive  19988 10156   326     0
##    Popup        226  9060    30     0

We see that popups almost always produce exactly one out, and groundballs (as you might expect) are the most likely to result in double plays.
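
The same comparison can be made with row proportions. This sketch again re-enters the printed counts by hand (the names `outs` and `multi_out` are illustrative) and uses prop.table to convert each row to proportions. Note that two outs on a play is only a stand-in for a double play here, since the table counts outs recorded rather than official double plays.

```r
# Counts copied from the printed table of BATTEDBALL_CD by EVENT_OUTS_CT
outs <- matrix(c( 6994, 24146,  105, 0,
                 15850, 41285, 3765, 1,
                 19988, 10156,  326, 0,
                   226,  9060,   30, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(
                 BATTEDBALL_CD = c("Flyball", "Groundball",
                                   "LineDrive", "Popup"),
                 EVENT_OUTS_CT = as.character(0:3)))

# Each row now sums to 1
round(prop.table(outs, margin = 1), 3)

# Share of plays of each type with two or more outs recorded
multi_out <- rowSums(outs[, c("2", "3")]) / rowSums(outs)
round(multi_out, 4)
```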

Here we are interested in the value of each type of batted ball by run value. We create a new data frame which gives the mean and standard deviation of runs for each type.

library(dplyr)
S <- summarize(group_by(d2013inplay, BATTEDBALL_CD),
               Mean.Runs = mean(RUNS.VALUE), 
               SD.Runs = sd(RUNS.VALUE))
S
## Source: local data frame [4 x 3]
## 
##   BATTEDBALL_CD   Mean.Runs   SD.Runs
## 1       Flyball  0.05760307 0.6179356
## 2    Groundball -0.07110042 0.4065463
## 3     LineDrive  0.30423694 0.5475748
## 4         Popup -0.24214596 0.2054216

As we might expect, flyballs and line drives, on average, result in positive run values, and groundballs and popups, on average, have negative run values. The standard deviation of the run values for flyballs is high since these are the batted balls that can result in home runs. In contrast, the run values of popups tend to be negative and have a smaller standard deviation.
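
As a quick consistency check, the four group means can be combined into an overall mean run value for these batted balls by weighting each mean by its count. This is a sketch using numbers copied from the output above; the counts are the row totals of the earlier tables, and the names `n`, `mean_runs`, and `overall` are my own.

```r
# Number of batted balls of each type (row totals of the earlier tables)
n <- c(Flyball = 31245, Groundball = 60901,
       LineDrive = 30470, Popup = 9316)

# Mean run value of each type, copied from the summary S
mean_runs <- c(Flyball = 0.05760307, Groundball = -0.07110042,
               LineDrive = 0.30423694, Popup = -0.24214596)

# Count-weighted average of the group means
overall <- weighted.mean(mean_runs, w = n)
round(overall, 3)
```

The result is slightly positive, about 0.034 runs per batted ball of these four types, with line drives contributing most of the positive value.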

Histograms visually show us the distribution of run values of each type of batted ball. I’ve added a vertical line at the value of zero so we can easily see if the run values are positive or negative.

library(ggplot2)
ggplot(d2013inplay, aes(RUNS.VALUE)) + 
  geom_histogram(aes(y = ..density..)) +
  facet_wrap(~ BATTEDBALL_CD, ncol=2) + 
  geom_vline(xintercept = 0, color="red", size=2) +
  theme(strip.text = element_text(size = rel(2)))

[Figure bip1: histograms of run value for each batted ball type, with a red vertical line at zero]

One can characterize each player by his proportions of batted balls of each type. We focus on the players in the 2013 season with at least 300 batted balls.

B <- summarize(group_by(d2013inplay, BAT_ID),
               N = length(BATTEDBALL_CD),
               p.LineDrive = mean(BATTEDBALL_CD=="LineDrive"),
               p.PopUp = mean(BATTEDBALL_CD=="Popup"),
               p.FlyBall = mean(BATTEDBALL_CD=="Flyball"),
               p.GroundBall = mean(BATTEDBALL_CD=="Groundball"),
               mean.Runs = mean(RUNS.VALUE))
B.300 <- filter(B, N >= 300)
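
The proportions above rely on the fact that the mean of a logical vector is the fraction of TRUE values. A tiny sketch on a made-up vector (the vector `bb` is hypothetical, not Retrosheet data):

```r
# Hypothetical batted balls for a single batter
bb <- c("Flyball", "Groundball", "Groundball", "LineDrive")

# TRUE counts as 1 and FALSE as 0, so the mean is a proportion
p_ground <- mean(bb == "Groundball")
p_ground

# The same proportions for every type at once
prop.table(table(bb))
```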

We explore the relationship of the proportion of flyballs and the proportion of groundballs with mean run value.

ggplot(B.300, aes(p.FlyBall, mean.Runs)) +
  geom_point() + geom_smooth()

[Figure bip2: mean run value against flyball proportion, with a smoothed curve]

ggplot(B.300, aes(p.GroundBall, mean.Runs)) +
  geom_point() + geom_smooth()

[Figure bip3: mean run value against groundball proportion, with a smoothed curve]

As one might expect, players with higher flyball percentages tend to have higher mean run values, and players with higher groundball percentages tend to have lower mean run values.

Suppose we divide these hitters into two groups: those with mean run values above the median and those at or below it. We graph players’ flyball and groundball proportions, where the color of each point indicates the group. The better hitters (with respect to run value) tend to be in the lower-right region of the plot, corresponding to high flyball percentages and low groundball percentages.

B.300 <- mutate(B.300, 
      Sign.Runs=ifelse(mean.Runs > median(mean.Runs), 
                       "High Runs", "Low Runs"))
ggplot(B.300, aes(p.FlyBall, p.GroundBall, color=Sign.Runs)) +
  geom_point()

[Figure bip4: groundball proportion against flyball proportion, colored by run value group]

To go further, suppose we fit a regression model of the form

mean.Runs = b0 + p.FlyBall * b1 + p.GroundBall * b2

We use the lm function to fit this model and display the estimated regression coefficients.

# lm() is part of the stats package, which R loads by default
fit <- lm(mean.Runs ~ p.FlyBall + p.GroundBall, 
          data=B.300)
fit
## 
## Call:
## lm(formula = mean.Runs ~ p.FlyBall + p.GroundBall, data = B.300)
## 
## Coefficients:
##  (Intercept)     p.FlyBall  p.GroundBall  
##      0.04121       0.25188      -0.12527
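
One way to read these coefficients: the intercept is the predicted mean run value for a hypothetical batter with no flyballs and no groundballs, so each slope is measured relative to line drives and popups. A minimal sketch, using the rounded coefficients printed above (the names `b` and `delta` are my own), of the predicted effect of trading 10 percentage points of groundballs for flyballs:

```r
# Rounded coefficients copied from the printed fit
b <- c(intercept = 0.04121, fly = 0.25188, ground = -0.12527)

# Move 10 percentage points from groundballs to flyballs:
# p.FlyBall rises by 0.10 and p.GroundBall falls by 0.10
shift <- 0.10
delta <- shift * b[["fly"]] - shift * b[["ground"]]
round(delta, 4)
```

Under this fitted model the shift is worth a little under 0.04 runs per batted ball. This describes the association across players in 2013, not a causal effect of changing a swing.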

To interpret this fit, we overlay contour lines of constant predicted mean.Runs on the scatterplot. We set up a grid of (p.GroundBall, p.FlyBall) values and use the predict function to find the predicted values of mean.Runs on this grid.

pF <- seq(0.1, 0.4, length.out = 20)
pG <- seq(0.3, 0.65, length.out = 20)
# all 400 combinations of the two proportions
d <- expand.grid(p.FlyBall = pF, p.GroundBall = pG)
d$Runs <- predict(fit, d)
ggplot(B.300, aes(p.FlyBall, p.GroundBall)) +
  geom_point() +
  stat_contour(data=d, binwidth=0.02,
  aes(x=p.FlyBall, y=p.GroundBall, z=Runs, color = ..level..)) +
  scale_colour_gradient(low = "red", high = "blue")

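[Figure bip5: groundball proportion against flyball proportion, with contour lines of predicted mean run value colored from red to blue]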

The run values of the contour lines range from 0 to 0.10, corresponding to the colors red to blue. A batter with a flyball percentage of 10% and a groundball percentage of 60% is predicted to have a run value close to 0. In contrast, a batter with a flyball percentage of 40% and a groundball percentage of 30% is predicted to have a run value around 0.10.
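
These two predictions can be checked by hand from the printed coefficients. The sketch below uses the rounded values from the fit output, so results may differ slightly from predict(fit, ...) on the full-precision model; `predict_runs` and `contour_slope` are names I made up for illustration.

```r
# Rounded coefficients copied from the printed fit
b0 <- 0.04121
b1 <- 0.25188   # p.FlyBall
b2 <- -0.12527  # p.GroundBall

# Predicted mean run value for given flyball and groundball proportions
predict_runs <- function(pF, pG) b0 + b1 * pF + b2 * pG

predict_runs(0.10, 0.60)  # few flyballs, many groundballs
predict_runs(0.40, 0.30)  # many flyballs, few groundballs

# Along a contour, b1 * pF + b2 * pG is constant, so in the plot
# (p.FlyBall on x, p.GroundBall on y) every contour has slope -b1 / b2
contour_slope <- -b1 / b2
contour_slope
```

The contours are parallel straight lines with a slope of about 2, and predicted run value increases toward the lower right of the plot.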

I’m sure a lot more can be said about players with different in-play batting types. Here are some possible explorations.

  • I suspect that a team’s lineup needs a variety of hitters with different batted ball types. Is that true for current teams?
  • I would think that a player’s tendency to hit flyballs or hit groundballs tends to persist during his career. Is it true? I would think it would make more sense to talk about a hitter’s tendency to hit flyballs or groundballs than talking about hitting “for average” or a hitter’s ability to hit in clutch situations.
  • Is a batter’s batted ball type predictive of future offensive performance?