Exploring Statcast Data from Baseball Savant

Generally, we are excited about the new Statcast data. Information is currently available about the exit velocity, launch angle, and distance of every batted ball, and this data should be helpful in determining the optimal launch angle for hitters. If you look at Baseball Savant’s Statcast Leaderboard, you can find the 2016 leaders in measures such as the average exit velocity and the length of the longest home run. These summaries are interesting, but I am more interested in all of the Statcast measurements for a specific player than in just the average or the maximum. In particular, I want to explore the relationship between the exit velocity, the launch angle, and the result of each ball in play (BIP).

This Statcast data for a player is available on the same page. By clicking on a player’s name, one can see the Exit Velocity, Launch Angle, Distance, and Outcome (groundout, single, double, etc.) for all balls that are put in play. I did not see any simple way of scraping this data, but it is relatively easy to copy and paste these HTML tables. I collected the Statcast data for the 2015 season for eight players that I was interested in: Mike Trout, Christian Yelich, Ichiro Suzuki, David Ortiz, Josh Donaldson, Andrew McCutchen, Joey Votto, and Nelson Cruz.
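As a rough sketch of the copy-and-paste step, pasted table text can be read into R with read.delim. The tab-separated layout, the short column names, and the three rows below are all invented for illustration; the real Baseball Savant tables may be laid out differently.

```r
# Text as it might look after pasting a table into a string.
# The header names and the three rows are invented for illustration.
pasted <- "Exit Velocity\tLaunch Angle\tDistance\tOutcome
101.3\t24.5\t402\tHome Run
88.7\t-6.2\t45\tGroundout
95.1\t12.8\t251\tSingle"

# Read the tab-separated text into a data frame.
bip <- read.delim(text = pasted, sep = "\t", stringsAsFactors = FALSE)

# Replace the headers with short names (hypothetical, chosen for convenience).
names(bip) <- c("velocity", "angle", "distance", "outcome")

# Code a binary response: 1 for a hit, 0 for an out.
hits <- c("Single", "Double", "Triple", "Home Run")
bip$hit <- as.integer(bip$outcome %in% hits)

str(bip)
```

The binary hit column is the kind of 0/1 response that a hit-probability model needs.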

Here I illustrate the use of three graphs to explore this data. This is a first look to gain some understanding of the patterns in these relationships. I’ll focus on David Ortiz’s 2015 data, although comparisons between players are interesting.

A Basic Graph

I use the ggplot2 package to graph the Launch Angle (degrees) against the Exit Velocity (mph) for all of Ortiz’s balls put in play, with the color of each point indicating the outcome.


We learn some things from this graph.

  1. Balls with a negative launch angle (that is, grounders) tend to be groundouts.

  2. The singles tend to be batted balls with launch angles between 0 and 25 degrees.

  3. The home runs (blue dots) tend to be hit at over 100 mph with launch angles of about 25 degrees.
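For readers who want to draw this kind of scatterplot themselves, here is a minimal ggplot2 sketch. It runs on simulated data, not on Ortiz’s actual balls in play, and the column names (velocity, angle, outcome) and the crude rule that assigns outcomes are assumptions made only so the example is self-contained.

```r
library(ggplot2)

# Simulated stand-in for one player's balls in play.
# Column names and the outcome rule are hypothetical.
set.seed(1)
n <- 300
bip <- data.frame(
  velocity = round(rnorm(n, mean = 92, sd = 10), 1),  # exit velocity (mph)
  angle    = round(rnorm(n, mean = 12, sd = 22), 1)   # launch angle (degrees)
)
bip$outcome <- with(bip, ifelse(
  angle < 0, "groundout",
  ifelse(velocity > 100 & angle > 20 & angle < 35, "home run",
         ifelse(angle < 25, "single", "flyout"))))

# Launch angle against exit velocity, colored by outcome.
p <- ggplot(bip, aes(x = velocity, y = angle, color = outcome)) +
  geom_point() +
  labs(x = "Exit Velocity (mph)", y = "Launch Angle (degrees)",
       color = "Outcome")
print(p)
```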

Modeling the Probability of a Hit

What combinations of launch angle and exit velocity result in base hits? A Generalized Additive Model is a general way of fitting a response by a smooth function of several explanatory variables. The response here is binary: either the player gets a hit (coded by 1) or an out (coded by 0). We are interested in modeling p, the probability of a hit. We represent the generalized additive model as

log (p / (1 – p)) = f(angle, velocity)

where f is a smooth function of the two variables angle and velocity. This is more general than the logistic regression model

log (p / (1 – p)) = constant + beta0 * angle + beta1 * velocity

that assumes that the logit of the probability is a linear function of the two variables without any interaction.

This model is easy to fit in R using the function gam in the mgcv package. From the model fit, I find the probability of a hit for each data point and display the fitted probabilities by color, where blue is near 0 and red is near 1. (Brian Mills illustrated the use of the function gam in smoothing PitchFX data in an earlier post on this blog.)


This graph shows that the fitted hit probabilities are close to 1 for batted balls that are hit hard (over 100 mph) with launch angles between 0 and 25 degrees. For Ortiz, the hit probabilities tend to be small for grounders.
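The fit described in this section can be sketched as follows. This is not necessarily the exact call used for the graph above: the data are simulated, the column names are hypothetical, and the smooth term s(angle, velocity) is one reasonable choice (a tensor product smooth, te(angle, velocity), is an alternative when the two variables are on different scales). A linear logistic regression is fit alongside for comparison.

```r
library(mgcv)
library(ggplot2)

# Simulated balls in play; hit is 1 for a hit, 0 for an out.
# The assumed "true" hit probability peaks near 15 degrees and
# rises with exit velocity. None of this is real Statcast data.
set.seed(3)
n <- 600
bip <- data.frame(
  velocity = rnorm(n, mean = 92, sd = 10),
  angle    = rnorm(n, mean = 12, sd = 22)
)
logit_p <- with(bip, -0.5 + 0.06 * (velocity - 92) - 0.004 * (angle - 15)^2)
bip$hit <- rbinom(n, size = 1, prob = plogis(logit_p))

# Generalized additive model: logit(p) = f(angle, velocity).
fit_gam <- gam(hit ~ s(angle, velocity), family = binomial, data = bip)

# Linear logistic regression, with no smooth term, for comparison.
fit_glm <- glm(hit ~ angle + velocity, family = binomial, data = bip)

# Fitted probability of a hit for each data point.
bip$prob <- as.numeric(predict(fit_gam, type = "response"))

# Display the fitted probabilities by color: blue near 0, red near 1.
p <- ggplot(bip, aes(x = velocity, y = angle, color = prob)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red", limits = c(0, 1)) +
  labs(x = "Exit Velocity (mph)", y = "Launch Angle (degrees)",
       color = "P(hit)")
print(p)

# Compare the two fits; a lower AIC indicates a better trade-off
# between fit and complexity.
AIC(fit_glm, fit_gam)
```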

Looking Closer

To better understand the model fit, the last graph draws several line graphs. Ortiz’s average exit velocity was 93.8 mph. I considered three different exit velocities: 10% above the average (103.2), the average (93.8), and 10% below the average (84.5). For each one, I plotted the fitted probability of a hit as a function of the launch angle.


I think this last graph is helpful. For the two lower exit velocity values, the probability of a hit is maximized for launch angles between 10 and 20 degrees. For the largest exit velocity, we see two maximums: one around 15 degrees, and a larger one at 25 degrees. This makes sense: if Ortiz hits the ball hard enough, then higher launch angles will result in doubles off the wall or home runs.
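The slicing idea above (fix the exit velocity, then vary the launch angle) can be sketched by predicting from a fitted model over a grid. As before, this uses simulated data and hypothetical column names, so the curves it draws will not match the ones for Ortiz.

```r
library(mgcv)
library(ggplot2)

# Simulated balls in play (not real Statcast data).
set.seed(4)
n <- 600
bip <- data.frame(
  velocity = rnorm(n, mean = 92, sd = 10),
  angle    = rnorm(n, mean = 12, sd = 22)
)
logit_p <- with(bip, -0.5 + 0.06 * (velocity - 92) - 0.004 * (angle - 15)^2)
bip$hit <- rbinom(n, size = 1, prob = plogis(logit_p))

fit_gam <- gam(hit ~ s(angle, velocity), family = binomial, data = bip)

# Three exit velocities: 10% below, at, and 10% above the average.
avg_v  <- mean(bip$velocity)
speeds <- c(0.9 * avg_v, avg_v, 1.1 * avg_v)

# Every combination of launch angle and the three velocities.
grid <- expand.grid(angle = seq(-20, 50, by = 1), velocity = speeds)
grid$prob <- as.numeric(predict(fit_gam, newdata = grid, type = "response"))

# Launch angle with the highest fitted hit probability at each velocity.
best <- sapply(split(grid, grid$velocity),
               function(d) d$angle[which.max(d$prob)])
print(best)

# One line per exit velocity.
p <- ggplot(grid, aes(x = angle, y = prob,
                      color = factor(round(velocity, 1)))) +
  geom_line() +
  labs(x = "Launch Angle (degrees)", y = "Fitted P(hit)",
       color = "Exit Velocity (mph)")
print(p)
```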

Looking Ahead

There seems to be active research on making sense of this type of data. For example, there is an interesting post by Rob Arthur at fivethirtyeight.com which looks at the relationship between launch angle, exit velocity, and a linear weights measure of BIP value. There is a more recent article by Bill Petti on the Hardball Times. I agree with Rob Arthur that these findings are currently preliminary; we have a lot to learn about success in hitting balls in play.

Data and R Code

I created a small package mystatcast that contains the eight Baseball Savant datasets and the single function statcast_graphs that constructs these three plots. For example, to install this package from GitHub and produce these graphs for David Ortiz, just type the following.


I encourage the interested reader to try these out for other players. Even better, I encourage the reader to create a graph that is helpful for comparing the characteristics of different hitters.
