
Constructing a Batting Profile Display

By Jim Albert on January 26, 2024

Introduction

Recently on Twitter I noticed a nice graphical display (shown below) by Kyle Brand (@Blandalytics) of Ron Acuña Jr.’s batting profile, which plots the spray angle against the launch angle for all balls Acuña put into play during the 2023 season. Note that this profile is relative to the rest of MLB: the red areas are regions where Acuña is more likely than the MLB average to hit the ball, and the blue areas are regions where he is less likely. (A live version of this app is available at https://batted-ball-charts.streamlit.app/.) The basic message of this graph is that, compared with MLB averages, Acuña is more likely to hit ground balls and line drives to both the pull and opposite sides of the field, and less likely to hit fly balls.

This is an attractive and useful graphic showing Acuña’s batted ball tendencies. The labels showing the types of batted balls and directions are clear, and it is easy to interpret the red and blue regions.

But this graphic raises two concerns.

(Real versus Chance Effects.) Generally, a player has a limited number of batted balls in one season, so there will be few batted balls in particular regions of the (spray angle, launch angle) space. I wonder whether many of the wiggly patterns in this display are real or instead reflect chance variation, which can dominate for small samples.
(Sizes of the More Often/Less Often Effects.) This graphic doesn’t measure how much more often or less often the player hits into each region. Such measurements would be helpful if one wanted to compare the batted ball profiles of two batters.

This post will illustrate an alternative way of visualizing a player’s batted ball profile. My method is similar to the method of looking at batted ball rates and home run rates described in Chapter 13 of the 3rd edition of Analyzing Baseball Data with R.

The Data

From the Statcast 2023 dataset, we collect the variables batter, launch_angle, hc_x, hc_y, and stand for all balls put into play. From the hit coordinates hc_x and hc_y, we compute the spray angle (in degrees). We define an adjusted spray angle to be the negative of the spray angle if the batting side is left. A negative adjusted spray angle corresponds to a ball hit in the pull direction, and a positive adjusted spray angle corresponds to a ball hit to the opposite side. (This definition is consistent with the pull and opposite-side directions in Brand’s graphic.)
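The post does not spell out the spray angle formula. Here is a minimal Python sketch of that step and of the left-handed adjustment (the post's own app is written in R). It assumes home plate sits at (125.42, 198.27) in Statcast hit-coordinate units, a commonly used convention that is an assumption here rather than something stated in this post, and that stand is coded "L"/"R".

```python
import math

# ASSUMPTION: home-plate location in Statcast hit-coordinate units.
# These constants are a common convention, not values stated in this post.
HOME_X = 125.42
HOME_Y = 198.27


def spray_angle(hc_x: float, hc_y: float) -> float:
    """Spray angle in degrees: 0 is straight-away center field,
    negative is toward left field, positive is toward right field."""
    # hc_y decreases as the ball travels away from home plate,
    # so the distance toward center field is HOME_Y - hc_y.
    return math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))


def adjusted_spray_angle(hc_x: float, hc_y: float, stand: str) -> float:
    """Negate the spray angle for left-handed batters (stand == "L") so that
    negative always means pull and positive means opposite field."""
    angle = spray_angle(hc_x, hc_y)
    return -angle if stand == "L" else angle


# A ball hit toward left field (hc_x < HOME_X): pulled by a right-handed
# batter, hit to the opposite side by a left-handed batter.
print(adjusted_spray_angle(100.0, 150.0, "R"))  # negative
print(adjusted_spray_angle(100.0, 150.0, "L"))  # positive
```

Using atan2 rather than a plain arctangent of the ratio avoids a division by zero for a ball recorded exactly level with home plate; for fair balls the two give the same angle.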

A Scatterplot

We begin by constructing a scatterplot of the (adjusted spray angle, launch angle) pairs for a specific hitter. Below we display a scatterplot of Freddie Freeman’s 2023 batting profile. We overlay a 5 by 5 grid defined over the region where the adjusted spray angle is in (-50, 50) and the launch angle is in (-50, 70) degrees. This grid divides the space of these two launch variables into 25 bins.

Binning

We count the number of balls in play in each of the 5 x 5 = 25 bins — here is a display of the bin counts.
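The counting step can be sketched in Python as follows. This assumes equal-width bins over the grid limits given above (the exact breakpoints behind the post's displays are not stated), and balls outside the grid are simply dropped. The function names are hypothetical; n_bins mirrors the app's choice of the number of bins on each scale.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Grid limits from the text. ASSUMPTION: bins are equal width.
SPRAY_LIMITS = (-50.0, 50.0)   # adjusted spray angle, degrees
LAUNCH_LIMITS = (-50.0, 70.0)  # launch angle, degrees


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> Optional[int]:
    """0-based equal-width bin containing value, or None if outside (lo, hi)."""
    if not lo < value < hi:
        return None
    width = (hi - lo) / n_bins
    # min() guards against floating-point round-off at the upper edge
    return min(int((value - lo) // width), n_bins - 1)


def bin_counts(points: Iterable[Tuple[float, float]], n_bins: int = 5) -> Counter:
    """Count (adjusted spray angle, launch angle) pairs in each grid cell.
    Keys are (spray_bin, launch_bin); balls outside the grid are dropped."""
    counts = Counter()
    for spray, launch in points:
        i = bin_index(spray, *SPRAY_LIMITS, n_bins)
        j = bin_index(launch, *LAUNCH_LIMITS, n_bins)
        if i is not None and j is not None:
            counts[(i, j)] += 1
    return counts
```

With the default n_bins=5, each spray bin is 20 degrees wide and each launch bin is 24 degrees wide.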

Comparing Rates

As in Kyle Brand’s graphic, we wish to compare the batted ball rates in these bins with the corresponding rates for all 2023 MLB hitters. A good way of comparing two rates is to compute the difference of their logits. If $P_F$ represents the bin rate for Freeman and $P_{MLB}$ represents the bin rate for MLB hitters, then we look at the difference in logit rates

$D = \operatorname{logit}(P_F) - \operatorname{logit}(P_{MLB})$

where the logit of a rate $P$ is $\operatorname{logit}(P) = \log P - \log(1 - P) = \log \frac{P}{1 - P}$.
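A minimal Python sketch of the difference-in-logits computation for every bin. It assumes each bin rate is the bin count divided by that group's total balls in play within the grid; the post does not say exactly which denominator it uses. Bins where a rate is 0 or 1 have an undefined logit and are skipped here; adding a small pseudo-count instead would be another reasonable choice, and neither option is taken from the post.

```python
import math


def logit(p: float) -> float:
    """logit(P) = log P - log(1 - P); requires 0 < P < 1."""
    return math.log(p) - math.log(1.0 - p)


def logit_differences(player_counts: dict, mlb_counts: dict) -> dict:
    """D = logit(P_player) - logit(P_MLB) for every bin in mlb_counts.

    ASSUMPTION: each rate is a bin count divided by the group's total count
    over all bins. Bins with a rate of exactly 0 or 1 are skipped."""
    n_player = sum(player_counts.values())
    n_mlb = sum(mlb_counts.values())
    out = {}
    for b, m in mlb_counts.items():
        p_player = player_counts.get(b, 0) / n_player
        p_mlb = m / n_mlb
        if 0 < p_player < 1 and 0 < p_mlb < 1:
            out[b] = logit(p_player) - logit(p_mlb)
    return out
```

Positive values of D mark bins the player hits into more often than MLB hitters, and negative values mark bins he hits into less often.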

Computing the difference in logits for all 25 bins gives the following display. Positive differences mark regions where Freeman is more likely to hit the ball than MLB hitters, and negative differences mark regions where he is less likely. We see that Freeman is more likely to hit balls with launch angles between 0 and 40 degrees, and the size of this advantage is between 0.1 and 0.5 on the difference-in-logits scale. It is rare for Freeman to hit fly balls with a launch angle greater than 50 degrees: in the center and pull directions, his difference in logits for these batted balls is -1.2.
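Because D is a difference of log-odds, exp(D) is the ratio of the player's odds of hitting into a bin to the MLB odds. This small Python sketch converts the differences quoted above into that more familiar scale.

```python
import math


def odds_ratio(d: float) -> float:
    """exp(D): the player's odds of landing in a bin divided by the MLB odds.
    Values above 1 mean more likely than MLB, below 1 less likely."""
    return math.exp(d)


# Logit differences quoted in the text for Freeman
for d in (0.1, 0.5, -1.2):
    print(f"D = {d:+.1f} -> odds ratio = {odds_ratio(d):.2f}")
```

This prints odds ratios of 1.11, 1.65, and 0.30, so a difference of -1.2 means Freeman's odds of hitting into those bins are roughly 0.30 times the MLB odds. When the bin rates themselves are small, the odds ratio is close to the ratio of the rates.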

A Tile Graph

One can visualize these differences in logits with a tile display in which the color of each tile corresponds to the difference value. Red tiles are regions where Freeman is more likely to hit the ball and blue tiles are regions where he is less likely (compared to the MLB average). The takeaway is that Freeman tends to hit a lot of line drives to both the pull and opposite sides of the field.

A Shiny App

I wrote a Shiny app BattingProfile() to display these graphs for any batter in the 2023 season. One selects the player and the type of graph to display (scatterplot, bin counts, difference in logits, or tile graph). Since the optimal number of bins is unclear, one can choose 4, 5, 6, 8, or 10 bins on each scale. Here’s a snapshot of the app for Ronald Acuña Jr. using a tile display, which you can compare with Brand’s graphic.

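In my experience, it seems best to choose a small number of bins. If you choose, say, 10 bins on each scale, there are 100 bins instead of 25, so on average each bin holds a quarter as many balls in play. The individual bin counts will be small, and it will be harder to see general patterns in the plots.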

Comments

Why is this graph useful? I think a team would be interested in a player’s batting tendencies, that is, his pull/opposite side and launch angle patterns. I think a batting lineup should consist of different types of hitters, including those who tend to hit line drives (like Luis Arráez) and others (I’m thinking of Kyle Schwarber) who tend to hit fly balls and home runs.
Try out the Shiny app. This app is included as part of the ShinyBaseball package. You can download this app here; it is self-contained, since the data is read from one of my GitHub repositories. One can run this app using the runApp() function in RStudio.
Comparing home run rates. As I mentioned before, this “difference in logits” methodology for comparing rates can also be found in the home run chapter of our book, available here.
Improvements to the display? Certainly I can add some labels, such as the ones used by Brand, to make the display easier to read and understand. We could also look at players over a number of seasons to check for consistency or changes in their batting profiles over time. In addition, we can add launch speed to our data and look for associations among all three launch variables.

Added on January 28

Tom Tango also reacted to Brand’s graphic in a post on his blog, and he suggested that I add another graph to my Shiny app. Following Tom’s suggestion, it is of interest to look at the difference between the player’s bin count $y$ and the count $E(y)$ that one would expect based on the MLB average. Here’s a graph displaying these differences. We see, for example, that Freeman has hit 28 more balls in one particular line-drive bin than one would expect based on the MLB average.
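A minimal Python sketch of the observed-minus-expected differences. It assumes the expected count is the player's total balls in play (over the grid) times the MLB rate for that bin; the post does not give the formula, and the toy counts below are made up for illustration, not real Statcast data.

```python
def expected_counts(player_counts: dict, mlb_counts: dict) -> dict:
    """E(y) for each bin. ASSUMPTION: the player's total balls in play
    times the MLB bin rate (MLB bin count / MLB total)."""
    n_player = sum(player_counts.values())
    n_mlb = sum(mlb_counts.values())
    return {b: n_player * m / n_mlb for b, m in mlb_counts.items()}


def count_differences(player_counts: dict, mlb_counts: dict) -> dict:
    """y - E(y) for each bin; positive means more balls than expected."""
    expected = expected_counts(player_counts, mlb_counts)
    return {b: player_counts.get(b, 0) - e for b, e in expected.items()}


# Toy counts (made up for illustration, not real Statcast data)
player = {"line drive": 30, "fly ball": 70}
mlb = {"line drive": 200, "fly ball": 800}
print(count_differences(player, mlb))  # {'line drive': 10.0, 'fly ball': -10.0}
```

Unlike the difference in logits, these differences are on the count scale, so they grow with the number of balls the player puts in play.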

To see if these differences are significant, one could look at the Pearson residuals

$Z = \frac{y - E(y)}{\sqrt{E(y)}}$

Here’s a display of these Pearson residuals. Several of these residuals exceed 3 for opposite-side line drive batted balls, indicating that Freeman is indeed unusual in hitting a high number of line drives.
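The Pearson residuals can be sketched in Python the same way. As before, this assumes E(y) is the player's total balls in play times the MLB bin rate, and the toy counts are hypothetical. Flagging bins with a residual above 3 in absolute value is a rough rule of thumb: with many bins, a few large residuals can occur by chance, and the normal approximation is weaker when expected counts are small.

```python
import math


def pearson_residuals(player_counts: dict, mlb_counts: dict) -> dict:
    """Z = (y - E(y)) / sqrt(E(y)) for each bin. ASSUMPTION: E(y) is the
    player's total balls in play times the MLB bin rate. Bins with
    E(y) == 0 are skipped."""
    n_player = sum(player_counts.values())
    n_mlb = sum(mlb_counts.values())
    residuals = {}
    for b, m in mlb_counts.items():
        expected = n_player * m / n_mlb
        if expected > 0:
            y = player_counts.get(b, 0)
            residuals[b] = (y - expected) / math.sqrt(expected)
    return residuals


def unusual_bins(residuals: dict, threshold: float = 3.0) -> list:
    """Bins whose residual exceeds the threshold in absolute value."""
    return [b for b, z in residuals.items() if abs(z) > threshold]


# Toy counts (made up for illustration, not real Statcast data)
player = {"opposite-side line drive": 40, "pulled fly ball": 60}
mlb = {"opposite-side line drive": 200, "pulled fly ball": 800}
z = pearson_residuals(player, mlb)
print(unusual_bins(z))  # ['opposite-side line drive']
```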

The reader is welcome to play with the live version of this Shiny app at https://bayesball.shinyapps.io/BattingProfile/


Exploring Baseball Data with R