Statcast Data and HIP Rates

HIP Rates

I have been curious about the high variability in hit-in-play (HIP) rates and have been looking for possible explanations.  Below I graph the strikeout (K) rates against the HIP (or BABIP) rates for all players at the All-Star Break.
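For readers who want to compute these two rates themselves, here is a minimal sketch in Python. It assumes the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and a strikeout rate measured per plate appearance. The post does not spell out either definition, so both are assumptions, and the function names are only illustrative.

```python
def hip_rate(h, hr, ab, so, sf=0):
    """BABIP-style hit-in-play rate: non-home-run hits per ball in play.

    Assumes the standard BABIP definition; the post does not state
    the exact formula behind its HIP rates.
    """
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        return float("nan")  # no balls in play, rate is undefined
    return (h - hr) / balls_in_play


def k_rate(so, pa):
    """Strikeouts per plate appearance (the denominator is an assumption)."""
    return so / pa if pa > 0 else float("nan")
```

A hitter with many strikeouts can still post a high HIP rate, since strikeouts are removed from the denominator of the HIP calculation.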


The variation in HIP rates is remarkable.  Turner, Avila, and Judge have HIP rates over .400; in contrast, Chirinos, Schwarber, and Schimpf have values under .200.

Media commentary often implies that HIP rates involve a lot of chance variability, and that a high HIP rate (like Aaron Judge’s) is inflated and will typically fall once a full season of data is available.  But I thought it would be a useful exercise to use Statcast data to help distinguish the balls-in-play hitting of these six hitters.

Baseball Savant

I am getting more experience with the Statcast Search page available from Baseball Savant.  Honestly, this page looks pretty daunting at first sight since it has a large number of menus.  Here I want to get hit-in-play data for the 2017 season for these six players, three who have high HIP rates and three who have low rates.

In the main Statcast Search page, I indicate that the Player Type is “Batter”, I want to include all four Batted Ball Types, and I input “Aaron Judge” in the Players menu.


When I hit the Search button, I see a single line of aggregate data for Judge.  But if I press the Graphs option on the right, I see links for an assortment of charts and a “Download as CSV” link, which is what I’m interested in.  When I select that option, a CSV file is downloaded that contains data for each of Judge’s batted balls.

I repeat this procedure for the other five players, which gives a total of six files that are easy to read into R and merge into a single data frame.
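The post does this step in R. As a rough equivalent, here is a minimal Python sketch that reads several downloaded CSV files and stacks them into one list of rows. The function name and the added `source_file` field are hypothetical, and the sketch assumes every file shares the same header row.

```python
import csv
from pathlib import Path


def read_statcast_csvs(paths):
    """Read several CSV downloads and stack them into one list of dicts.

    Each row keeps the name of the file it came from, so rows stay
    traceable to a download after merging.
    """
    rows = []
    for path in paths:
        # utf-8-sig tolerates a byte-order mark at the start of the file
        with open(path, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                row["source_file"] = Path(path).name
                rows.append(row)
    return rows
```

Calling `read_statcast_csvs` with the six file paths returns one combined list, with all values still stored as strings.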


There are many variables in this dataset, but I focus on the launch speed and launch angle variables.  For all balls put in play (excluding home runs), I graph the launch angle against the launch speed, coloring each point by the type of batted ball.
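The filtering behind such a plot can be sketched as follows. This is only an illustration: it assumes the downloaded rows use the column names `events`, `launch_speed`, and `launch_angle`, and that home runs are coded as `home_run`. Check these against the actual CSV header before relying on them.

```python
import math


def to_float(value):
    """Convert a CSV field to float; return None for blanks, 'null', or NaN."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def balls_in_play(rows):
    """Keep non-home-run batted balls that have both launch measurements."""
    kept = []
    for row in rows:
        if row.get("events") == "home_run":  # assumed event label
            continue
        speed = to_float(row.get("launch_speed"))
        angle = to_float(row.get("launch_angle"))
        if speed is None or angle is None:
            continue
        # store numeric values so later steps can plot or summarize them
        kept.append({**row, "launch_speed": speed, "launch_angle": angle})
    return kept
```

The result is ready for a scatterplot of `launch_angle` against `launch_speed`, colored by the batted ball type column.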


It is interesting that the four batted ball types correspond clearly to values of launch angle: groundballs are balls hit at a launch angle of 10 degrees or less, line drives are hit at launch angles of 10-25 degrees, fly balls are hit at 25-60 degrees, and popups have the highest launch angles.  (It looks like some popups hit at lower speeds have smaller launch angles.)
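These angle ranges can be turned into a simple rule and checked against the recorded batted ball type. The sketch below is an approximation read off the scatterplot, not Statcast's official classification. The column names (`launch_angle`, `bb_type`) and the type labels are assumptions, as is the handling of the exact boundaries at 10, 25, and 60 degrees.

```python
def type_from_angle(launch_angle):
    """Guess the batted ball type from launch angle alone (degrees).

    Cutoffs follow the ranges described in the text; which side each
    boundary value falls on is an arbitrary choice.
    """
    if launch_angle <= 10:
        return "ground_ball"
    if launch_angle <= 25:
        return "line_drive"
    if launch_angle <= 60:
        return "fly_ball"
    return "popup"


def agreement_rate(rows):
    """Share of rows where the angle rule matches the recorded bb_type."""
    checked = matched = 0
    for row in rows:
        try:
            angle = float(row["launch_angle"])
        except (KeyError, TypeError, ValueError):
            continue
        if angle != angle or not row.get("bb_type"):  # skip NaN or missing type
            continue
        checked += 1
        matched += type_from_angle(angle) == row["bb_type"]
    return matched / checked if checked else float("nan")
```

A low agreement rate among slowly hit balls would be consistent with the observation that some popups have smaller launch angles.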

Next, we redraw this figure but use panels to separate out the plots by player; the high HIP players are in the first row and the low HIP players are in the second row.


This graph tells us that these players have different patterns of hitting batted balls.  The high HIP guys are hitting a lot of line drives; in contrast, the low HIP guys are hitting a higher fraction of balls that are not line drives.  (Notice, for example, how few line drives are hit by Ryan Schimpf, and contrast that with Justin Turner, who hits a high fraction of line drives.)
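The visual impression can be backed with numbers by tabulating each player's share of balls in play by batted ball type. Here is a minimal Python sketch, assuming the columns `player_name`, `bb_type`, and `events` and the event label `home_run`. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def batted_ball_profile(rows):
    """Return {player: {bb_type: fraction}} for balls in play (no home runs)."""
    counts = defaultdict(Counter)
    for row in rows:
        bb_type = row.get("bb_type")
        if not bb_type or row.get("events") == "home_run":
            continue  # skip rows with no recorded type, and home runs
        counts[row["player_name"]][bb_type] += 1
    profile = {}
    for player, counter in counts.items():
        total = sum(counter.values())
        profile[player] = {bb: n / total for bb, n in counter.items()}
    return profile


def line_drive_fraction(profile, player):
    """Fraction of a player's balls in play that were line drives."""
    return profile.get(player, {}).get("line_drive", 0.0)
```

Comparing `line_drive_fraction` across the six hitters gives a one-number summary of the contrast seen in the panels.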

How are launch speed and launch angle related to hits and outs?  I redraw this figure but now use Hit/Out as the color.  The locations of hits (the blue points) seem similar to the locations of line drives in the scatterplots.


Since type of batted ball seems strongly associated with hits/outs, I’ll explore this further.  In this graph, I display the batting average on balls-in-play for each hit type.


Note that the AVG on line drives is high and the AVG on popups is very small.  I was surprised how high the AVG on line drives could be: Judge has an AVG of .800 on line drives.  In contrast, Ryan Schimpf’s AVG on line drives is only .250 (this could be a small sample size effect).
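Because small samples can distort these averages, it helps to report the number of batted balls next to each AVG. The sketch below computes both per player and batted ball type. It assumes the columns `player_name`, `bb_type`, and `events`, and it treats `single`, `double`, and `triple` as the hit events on balls in play. These labels are assumptions about the CSV, not something the post confirms.

```python
from collections import defaultdict

# Assumed event labels for hits on balls in play (home runs excluded)
HIT_EVENTS = {"single", "double", "triple"}


def avg_by_type(rows):
    """Return {(player, bb_type): (avg, n)} over balls in play."""
    tallies = defaultdict(lambda: [0, 0])  # key -> [hits, balls in play]
    for row in rows:
        bb_type = row.get("bb_type")
        event = row.get("events")
        if not bb_type or event == "home_run":
            continue
        tally = tallies[(row["player_name"], bb_type)]
        tally[0] += event in HIT_EVENTS
        tally[1] += 1
    return {key: (hits / n, n) for key, (hits, n) in tallies.items()}
```

An AVG based on only a handful of line drives should be read with caution, which is why the count `n` is returned alongside it.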

The high AVG on line drives could be due to the launch speed.  Below I construct parallel boxplots of the launch speeds for our six hitters.  The highest typical launch speeds belong to Judge, Avila, and Schwarber.  By the way, Schwarber’s low HIP rate is likely caused by a low fraction of line drives.
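The center line of each boxplot is a median, so a numeric companion to the figure is a per-player summary of launch speed. Here is a minimal Python sketch, assuming the columns `player_name` and `launch_speed`, with missing speeds stored as blanks or text such as `null`. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def launch_speed_summary(rows):
    """Return {player: {"n", "median", "mean"}} of launch speed."""
    speeds = defaultdict(list)
    for row in rows:
        try:
            speed = float(row["launch_speed"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric measurement
        if speed != speed:  # NaN check
            continue
        speeds[row["player_name"]].append(speed)
    return {
        player: {"n": len(v), "median": median(v), "mean": mean(v)}
        for player, v in speeds.items()
    }
```

Sorting the players by median launch speed should reproduce the ordering visible in the boxplots.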


Takeaways and Further Explorations

Okay, what have we learned?

  • Hit types (groundballs, line drives, flyballs, and popups) are distinguished mainly by launch angle.
  • There can be big differences in a “batter’s profile” (proportions of HIPs in the four types) and that appears to explain a good amount of variation in the HIP rates.  Judge and Turner hit a high fraction of line drives and Schwarber is struggling to hit line drives.

To check this last point, I collected batted ball data from FanGraphs.  Here is a plot of the LD fraction against the HIP rate for all players, which confirms this relationship.  (Some of the extreme points don’t agree with my earlier graphs due to the different definitions of qualifying hitters.)


  • The batting average on line drives put in play is pretty high.
  • Perhaps Aaron Judge’s success in getting hits on balls in play is partially due to speed off the bat.

Here are some suggestions for explorations for the interested reader:

  1. Look at a larger collection of Statcast data to better understand the variability in HIP rates and see how they are related to launch speeds and launch angles.
  2. How well can one predict the probability of a hit based solely on the batted ball type?
  3. Which hitters have the highest fraction of line drives among balls put in play?
  4. Teams perform defensive shifts with the understanding that hitters have particular tendencies to hit balls in different directions.  Can one explain some of the variation in HIP rates by the direction of the batted ball?  (I did not use the spray angle variable in this exploration.)