Obtaining Exit Velocity and Distance of Batted Balls

I was recently pointed to the location of exit velocity and batted ball distance data (thanks Daren!). This post shows you how to acquire that data and integrate it with PITCHf/x data.

Unfortunately, this data lives in a very different location from most of the data that feeds MLB’s Gameday app. It also requires a different game identifier from the one I’ve discussed in the past. Fortunately, these “game_pk” ids are relatively simple to obtain. Here is one way to obtain them (as well as the corresponding gameday identifiers, which we’ll need later) for every game played on May 10th, 2015 using XML2R:

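library(XML2R)
# Parse the day's miniscoreboard into a list of observations
u <- "http://gd2.mlb.com/components/game/mlb/year_2015/month_05/day_10/miniscoreboard.xml"
obs <- XML2Obs(u)
# Keep only the game-level observations
gms <- obs[grepl("^games//game$", names(obs))]
# Collapse them into one matrix and keep the two identifier columns
ids <- collapse_obs(gms)[, c("game_pk", "gameday_link")]
ids <- data.frame(ids, stringsAsFactors = FALSE)
ids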
   game_pk               gameday_link
1   414110 2015_05_10_balmlb_nyamlb_1
2   414111 2015_05_10_bosmlb_tormlb_1
3   414118 2015_05_10_minmlb_clemlb_1
4   414123 2015_05_10_texmlb_tbamlb_1
5   414109 2015_05_10_atlmlb_wasmlb_1
6   414119 2015_05_10_nynmlb_phimlb_1
7   414122 2015_05_10_slnmlb_pitmlb_1
8   414112 2015_05_10_chnmlb_milmlb_1
9   414113 2015_05_10_cinmlb_chamlb_1
10  414114 2015_05_10_houmlb_anamlb_1
11  414117 2015_05_10_miamlb_sfnmlb_1
12  414116 2015_05_10_lanmlb_colmlb_1
13  414120 2015_05_10_oakmlb_seamlb_1
14  414121 2015_05_10_sdnmlb_arimlb_1
15  414115 2015_05_10_kcamlb_detmlb_1
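
To cover more than one day, the URL above can be generated from a date. The following is only a sketch: scoreboard_url() is a hypothetical helper (it is not part of XML2R or pitchRx), and it assumes that other dates follow the same year_/month_/day_ pattern as the May 10th URL.

```r
# Hypothetical helper: build the miniscoreboard URL for one or more dates,
# assuming every date follows the pattern of the May 10th, 2015 URL above.
scoreboard_url <- function(date) {
  date <- as.Date(date)
  sprintf(
    "http://gd2.mlb.com/components/game/mlb/year_%s/month_%s/day_%s/miniscoreboard.xml",
    format(date, "%Y"), format(date, "%m"), format(date, "%d")
  )
}

# One URL per day for May 9th through May 11th, 2015
days <- seq(as.Date("2015-05-09"), as.Date("2015-05-11"), by = "day")
urls <- scoreboard_url(days)
```

Each element of urls could then be passed through the same XML2Obs() and collapse_obs() steps shown above.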

If you have a PITCHf/x database, you could grab all your game_pks with the following query:

library(dplyr)
db <- src_sqlite("pitchRx.sqlite3")
ids <- db %>% tbl("game") %>%
  select(game_pk, gameday_link) %>% collect()

Now that we have some game_pk ids, we’ll need the grab_bb() function that I wrote to obtain these velocities/distances (which I’ve stored in this gist):


The grab_bb() function takes a single game_pk value and returns a data frame of velocities/distances for each at-bat in that game. Because it handles one game at a time, we can use plyr’s ldply() function to call it for every game and stack the results into one big data frame.

bbs <- plyr::ldply(ids[, "game_pk"], grab_bb)
  exit distance game_pk num
1   NA       NA  414110   1
2   90       NA  414110   2
3   90      265  414110   3
4  103       NA  414110   4
5   NA       NA  414110   5
6   NA       NA  414110   6
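
If a request fails for one game, the whole ldply() call stops. One way to guard against that is to wrap the call in tryCatch(), sketched here in base R. fake_grab_bb() is a made-up stand-in for grab_bb() so that the example runs offline, and the values it returns are not real data.

```r
# Stand-in for grab_bb(), used only so this sketch runs without a network
# connection; it fails for a missing id and returns made-up rows otherwise.
fake_grab_bb <- function(game_pk) {
  if (is.na(game_pk)) stop("no such game")
  data.frame(exit = c(NA, 90), distance = c(NA, 265),
             game_pk = game_pk, num = 1:2)
}

# Return NULL instead of an error so that one bad game does not abort the loop
safe_grab <- function(game_pk) {
  tryCatch(fake_grab_bb(game_pk), error = function(e) NULL)
}

pks <- c(414110, NA, 414111)
res <- lapply(pks, safe_grab)
# Drop the failed games, then stack the rest (a base R analogue of ldply())
res <- Filter(Negate(is.null), res)
bbs_demo <- do.call(rbind, res)
```

With the real function, safe_grab() would call grab_bb() instead, and plyr::ldply(ids[, "game_pk"], safe_grab) should behave the same way as before for games that succeed.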

Next, let’s grab PITCHf/x data for each one of these games:

ids$gameday_link <- paste0("gid_", ids$gameday_link)
dat <- pitchRx::scrape(game.ids = ids$gameday_link)

In order to merge the velocities/distances with the dat$atbat data frame that pitchRx gives us, we have to add a “gameday_link” column which serves as a link between the two tables.

bbs <- plyr::join(bbs, ids, by = "game_pk")
  exit distance game_pk num               gameday_link
1   NA       NA  414110   1 2015_05_10_balmlb_nyamlb_1
2   90       NA  414110   2 2015_05_10_balmlb_nyamlb_1
3   90      265  414110   3 2015_05_10_balmlb_nyamlb_1
4  103       NA  414110   4 2015_05_10_balmlb_nyamlb_1
5   NA       NA  414110   5 2015_05_10_balmlb_nyamlb_1
6   NA       NA  414110   6 2015_05_10_balmlb_nyamlb_1

Finally, we’re ready to append the velocity/distance data to the at-bat-level data from pitchRx:

dat$atbat <- plyr::join(dat$atbat, bbs, by = c("num", "gameday_link"))
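
plyr::join() performs a left join by default (type = "left"), so every at-bat is kept and at-bats without a batted-ball record get NA. Here is a small illustration of that behavior using base R's merge() on toy tables; the rows below are invented for the example and are not real data.

```r
# Toy at-bat table (invented values) with the same key columns as dat$atbat
atbat_toy <- data.frame(
  num = 1:3,
  gameday_link = "gid_2015_05_10_balmlb_nyamlb_1",
  event = c("Strikeout", "Single", "Flyout"),
  stringsAsFactors = FALSE
)
# Toy batted-ball table that covers only two of the three at-bats
bb_toy <- data.frame(
  num = 2:3,
  gameday_link = "gid_2015_05_10_balmlb_nyamlb_1",
  exit = c(90, 103),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every at-bat, like plyr::join()'s default type = "left"
merged <- merge(atbat_toy, bb_toy, by = c("num", "gameday_link"), all.x = TRUE)
merged <- merged[order(merged$num), ]

# Number of at-bats with no batted-ball record
n_missing <- sum(is.na(merged$exit))
```

Counting the NA values after the real join is a quick way to check how many at-bats failed to match.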

2 responses

  1. I am getting an error when trying to run this:
    Error: lexical error: invalid char in json text.
    (right here) ——^
    Thoughts? Thanks!

  2. It appears they’ve moved/removed the API location. Hopefully Daren knows more about it — https://twitter.com/darenw/status/615928648676216832
