Dissection of the Time of a Baseball Game

In my last post, we found that the time of a baseball game is strongly related to the number of pitches, with each pitch adding, on average, 36 seconds to the length of the game. Here we use PitchFX data to get a better understanding of the times between pitches in a single game. The pitchRx package, authored by Carson Sievert, makes it easy to download PitchFX data and explore every pitch.

Here we load the package and use the scrapeFX function to download the pitches for all games played on September 5, 2013. The plyr::join function combines the pitch and at-bat tables into a single data frame, pitches.

library(pitchRx)
dat <- scrapeFX(start = "2013-09-05", end = "2013-09-05")
# inner join: keep each pitch together with its at-bat record
pitches <- plyr::join(dat$pitch, dat$atbat, 
                      by = c("num", "url"), type = "inner")
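
To see what the inner join does, here is a minimal sketch on made-up data. It swaps in base R's merge for plyr::join so that it runs without any packages, and the toy tables and their values are assumptions chosen only for illustration.

```r
# Toy stand-ins for dat$pitch and dat$atbat (made-up values)
toy_pitch <- data.frame(
  num  = c(1, 1, 2),            # at-bat number within the game
  url  = "game_a",              # stands in for the game's url
  type = c("B", "S", "X"),
  stringsAsFactors = FALSE
)
toy_atbat <- data.frame(
  num    = c(1, 2),
  url    = "game_a",
  inning = c(1, 1),
  stringsAsFactors = FALSE
)

# merge() with `by` performs an inner join by default:
# each pitch picks up the columns of its matching at-bat
toy_joined <- merge(toy_pitch, toy_atbat, by = c("num", "url"))
toy_joined
```

Every pitch row is kept once, now carrying the inning of its at-bat, which is what we need later when labeling the gaps by inning.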

The PitchFX system records the time of each pitch in the variable sv_id. Using the substr function, we create a new variable time equal to the number of seconds past midnight.

pitches$hours <- as.numeric(substr(pitches$sv_id, 8, 9))
pitches$minutes <- as.numeric(substr(pitches$sv_id, 10, 11))
pitches$seconds <- as.numeric(substr(pitches$sv_id, 12, 13))
pitches$time <- with(pitches, 3600 * hours + 
                     60 * minutes + seconds)
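
As a quick check of the arithmetic, here is a small sketch that wraps the same substr calls in a helper. The helper name sv_id_to_seconds is hypothetical, the two stamps are made up, and the code assumes sv_id has the form "YYMMDD_HHMMSS", which is what the character positions above imply.

```r
# Hypothetical helper: convert sv_id stamps (assumed "YYMMDD_HHMMSS")
# into seconds past midnight, using the same positions as above
sv_id_to_seconds <- function(sv_id) {
  hours   <- as.numeric(substr(sv_id, 8, 9))
  minutes <- as.numeric(substr(sv_id, 10, 11))
  seconds <- as.numeric(substr(sv_id, 12, 13))
  3600 * hours + 60 * minutes + seconds
}

example_ids  <- c("130905_191532", "130905_191555")  # made-up stamps
example_secs <- sv_id_to_seconds(example_ids)
example_gap  <- diff(example_secs)  # seconds between the two pitches
```

Since the value resets at midnight, a game that ran past midnight would show one large negative gap under this scheme, so it would need special handling.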

Let’s look at the pitch times of the game played between Arizona and San Francisco on September 5, 2013. (See the box score for this game on Baseball-Reference.) By extracting a portion of the url variable, we create a new variable game.id and use the subset function to extract the pitches for this particular game.

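pitches$game.id <- substr(pitches$url, 66, 95)
# The subset condition is cut off in the source; matching both team codes
# ("ari" and "sfn") in game.id is an assumed reconstruction.
pitches1 <- subset(pitches, 
                   grepl("ari", game.id) & grepl("sfn", game.id))
pitches1 <- pitches1[order(pitches1$time), ]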
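
Fixed character positions work only while every url has the same layout. As a hedged alternative, the game id could be pulled out with a regular expression. The example url below is an assumption about what a Gameday address looks like, and the pattern is my own guess at the id format, not something taken from pitchRx.

```r
# Assumed example of a Gameday-style url (illustrative only)
example_url <- paste0(
  "http://gd2.mlb.com/components/game/mlb/year_2013/month_09/day_05/",
  "gid_2013_09_05_arimlb_sfnmlb_1/inning/inning_all.xml"
)

# Assumed id format: gid_YYYY_MM_DD_<away>_<home>_<game number>
gid_pattern <- "gid_[0-9]{4}_[0-9]{2}_[0-9]{2}_[a-z0-9]+_[a-z0-9]+_[0-9]+"

# regexpr() locates the first match; regmatches() extracts it
example_game_id <- regmatches(example_url, regexpr(gid_pattern, example_url))
example_game_id
```

For this url the regular expression and substr(example_url, 66, 95) pick out the same string, but the pattern would keep working if the prefix of the url changed length.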

Since we are interested in the times between pitches, a new data frame time.data is created containing three variables: Time, the time between consecutive pitches; Index, the number of the pitch; and Inning, the inning when the pitch occurred.

time.data <- data.frame(Time=diff(pitches1$time), 
                        Index=1:(length(pitches1$time) - 1), 
                        # cut off in the source; assumed: inning of the later pitch
                        Inning=pitches1$inning[-1])
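
Here is the same construction on four made-up pitch times, to show how diff produces one fewer row than there are pitches. The toy values and the choice to label each gap with the inning of the later pitch are assumptions for illustration.

```r
# Four made-up pitches: time in seconds past midnight, plus inning
toy_pitches <- data.frame(
  time   = c(68400, 68422, 68447, 68630),
  inning = c(1, 1, 1, 2)
)
toy_pitches <- toy_pitches[order(toy_pitches$time), ]

# diff() returns n - 1 gaps, so the inning vector drops its first entry
toy_gaps <- data.frame(
  Time   = diff(toy_pitches$time),
  Index  = seq_len(nrow(toy_pitches) - 1),
  Inning = toy_pitches$inning[-1]
)
toy_gaps
```

The last gap is much longer than the others and carries the label 2, since it ends with the first pitch of the second inning. That is how an inning break shows up in the graph.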

The ggplot2 package is used to graph the time between pitches against the pitch number. (We give many illustrations of the ggplot2 package in our book.) In the graph, the plotting symbol is the inning number, and we add horizontal lines at 1, 2, and 3 minutes to make it easier to read the vertical scale.

library(ggplot2)
ggplot(time.data, aes(Index, Time, label=Inning)) +
  geom_text(size=6, color="blue") +
  # reference lines at 1, 2, and 3 minutes
  geom_hline(yintercept=c(60, 120, 180)) +
  # annotate() draws each label once rather than once per row of time.data
  annotate("text", x = 25, y = c(65, 125, 185), 
           label = c("1 MINUTE", "2 MINUTES", "3 MINUTES"), size=8) +
  labs(title = "Times Between Pitches in a Baseball Game") +
  theme(plot.title = element_text(size = rel(2)),
        axis.title = element_text(size = rel(2)),
        axis.text = element_text(size = rel(2))) +
  ylab("Time (Seconds)")


What do we learn from this graph?

  • In a typical inning, the time between pitches is between 15 and 30 seconds.
  • It is pretty common for the time between pitches to fall between 30 and 60 seconds. This could be due to balls in play, a pickoff move, time outs, and other factors. It would be interesting to relate the times with the actual plays as recorded in Baseball-Reference.
  • There are a number of significant breaks of between 2 1/2 and 3 1/2 minutes. Many of these are simply the breaks between half-innings: for example, the one 1 symbol, the two 2’s, and the two 3’s are just the inning breaks. Some of the long breaks that one sees towards the end of the game likely correspond to pitching changes.
  • It is pretty clear that the game slows down towards the end, judging by the large number of long breaks in the 8th and 9th innings.
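
These observations could be turned into numbers by binning the gaps and averaging them by inning. The sketch below does this on made-up data. The cutoffs of 30, 60, and 150 seconds are my own assumptions, loosely based on the ranges in the list above, and the toy values are not from the actual game.

```r
# Made-up gaps (seconds) with the inning in which each one ended
toy_times <- data.frame(
  Time   = c(18, 22, 45, 27, 170, 20, 35, 200, 95),
  Inning = c(1, 1, 1, 1, 2, 2, 8, 9, 9)
)

# Bin the gaps; the cutoffs are assumptions, not established thresholds
toy_times$Category <- cut(
  toy_times$Time,
  breaks = c(0, 30, 60, 150, Inf),
  labels = c("30s or less", "30-60s", "1-2.5 min", "over 2.5 min")
)

category_counts    <- table(toy_times$Category)  # how many gaps per bin
mean_gap_by_inning <- tapply(toy_times$Time, toy_times$Inning, mean)
```

With the real time.data in place of toy_times, a rising mean_gap_by_inning in the 8th and 9th innings would support the impression that the game slows down late.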

This is an illustration of the time breakdown for a typical MLB game in 2013 which lasted 3 hours and 11 minutes. By looking at this time data over many games, I think one would get a better understanding of the time patterns of long games, and that might help MLB devise ways to make the games shorter.


7 responses

  1. Great post:
    Is there a way to scrape the data for the entire season for one team, instead of doing it by date? For example, the Red Sox are notorious for having long games. Would it be possible to scrape the data for the entire season of Red Sox games and look at the pitchers that take the longest or shortest amount of time?

    1. Sean, I don’t have a lot of experience scraping large amounts of pitch data with pitchRx. I believe it is set up to search by date, but I would not think it would be hard to adjust the code to search these webpages by team.

    2. Hi Sean,
      Currently, there is no simple way to scrape data on a team basis with pitchRx. This was a conscious decision, as I think it is best practice to scrape all available data (within a time frame) at once, then store it for later use. That way you won’t have to track what you’ve scraped and/or scrape data every time you want to analyze something. Here is how you could obtain all 2012 data, then subset to Boston Red Sox games. Unfortunately, it does take about an hour to scrape a year’s worth of data…

      dat <- scrapeFX(start = "2012-01-01", end = "2012-12-31")
      #after this is done, you could write.csv(dat$atbat, file="atbats.csv");write.csv(dat$pitch, file="pitches.csv")
      #better yet you could save these tables in a database as I show on my demo page http://cpsievert.github.io/pitchRx/demo/
      bos.index <- grep("bos", dat$atbat$url)
      bos.atbats <- dat$atbat[bos.index,]
      bos.data <- plyr::join(dat$pitch, bos.atbats, by = c("num", "url"), type = "inner")

  2. Hi,
    Thanks for the response, I really appreciate the example that you provided and will be playing around with it. It’s a big help. Thanks again

    1. Sean,

      pitchRx version 1.0 allows you to collect data on a specific team (rather than a window of dates). Here is one way you could collect all Red Sox data for 2013:

      bosID <- gids[grepl("2013", gids) & grepl("bos", gids)]
      dat <- scrape(game.ids=bosID)

