In last week’s post, I illustrated how Shiny’s brushing capability can be used to build an app that computes in-play batting averages over regions of the zone. This gave me an opportunity to describe the basic elements of a Shiny app. That work motivated me to write several apps to visualize batting and pitching measures over brushed regions of the zone, so I’ll use this post to demonstrate their features. As you’ll see, all of the R code is available, and I hope it motivates readers to develop better versions of these apps.
Distributing Shiny Apps
When you write a Shiny app, you want others to use it. One way to distribute the app is to make the code available to users. I’ve put my recent Shiny apps in a new ShinyBaseball package available on my Github page, along with instructions on how to install the package together with other required packages such as ggplot2. The batting data for the 2019 season is included as part of the package. So, for example, to run the BrushingZone batting app, you just type
If you explore the code, you’ll see that each app’s Shiny code lives in its own folder inside the inst folder of my ShinyBaseball repository.
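For readers curious how a package can launch an app stored under inst/, here is a minimal sketch of the usual pattern: files in inst/&lt;folder&gt; are installed to the package root, so `system.file()` can find the folder and `shiny::runApp()` can start it. The helper name `launch_app()` and the folder name "BrushingZone" are hypothetical; this is not necessarily how ShinyBaseball itself is organized.

```r
# Hypothetical launcher for a Shiny app bundled in a package's inst/ folder.
# Files under inst/<folder> are installed to <package root>/<folder>,
# so system.file() can locate them after the package is installed.
launch_app <- function(app_name, package = "ShinyBaseball") {
  app_dir <- system.file(app_name, package = package)
  # system.file() returns "" when the package or the folder is missing
  if (app_dir == "") {
    stop("Could not find app '", app_name, "' in package '", package, "'",
         call. = FALSE)
  }
  shiny::runApp(app_dir, display.mode = "normal")
}

# Usage (assumes the package and an inst/BrushingZone folder exist):
# launch_app("BrushingZone")
```

Checking for the empty string first gives a clearer error message than letting `runApp()` fail on a blank path.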
Another way to distribute a Shiny app is to upload it to a web server. Currently, RStudio offers a Shiny hosting service, and the free basic plan lets you upload a small number of apps. So you are welcome to try out my BrushingZone app by visiting https://bayesball.shinyapps.io/BrushingZone/; you should see the following (2019 Mike Trout data):
The BrushingZone app
The objective of the BrushingZone app is to compute various in-play batting measures over selected regions of the zone. How do you use this app?
- You choose a player of interest (the app accepts lowercase names such as “bryce harper”).
- You select a measure of interest. If you choose Launch Speed or Expected BA, you will see a scatterplot of the pitch locations of the balls put into play, colored by the value of the continuous measure (orange is “hot”, blue is “cold”). If you choose Hit or Home Run, you will see the pitch locations of the hits or home runs, respectively.
- By clicking on any point, you’ll see the launch speed and expected BA for that ball in play. For example, I located the Mike Trout home run on the lowest pitch in the zone. It was hit quite hard: the launch speed was 110.8 mph with an expected BA of 0.979. (This home run was a “no-doubter”.)
- For any display, you can brush a rectangle and you’ll see stats for the points in the brushed region. For example, here I’m checking Trout’s performance on pitches low in the zone. Given the 0.541 hit rate and the 0.176 home run rate, Trout appears to like hitting low pitches.
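The brushed-region statistics boil down to filtering the balls in play that fall inside a rectangle and averaging over them. Here is a minimal sketch of that calculation, assuming Statcast-style column names (`plate_x`, `plate_z`, `launch_speed`, `estimated_ba_using_speedangle`, `events`); the function name `summarise_region()` and the toy data are made up for illustration and are not taken from the app’s code.

```r
# Toy sketch: summarise in-play batting measures inside a rectangle of the
# zone, the kind of calculation done on a brushed region.
# Assumed Statcast-style columns: plate_x, plate_z (pitch location, feet),
# launch_speed (mph), estimated_ba_using_speedangle (expected BA), and
# events (outcome of the batted ball).
summarise_region <- function(d, xmin, xmax, zmin, zmax) {
  inside <- d$plate_x >= xmin & d$plate_x <= xmax &
            d$plate_z >= zmin & d$plate_z <= zmax
  # which() drops rows whose location is missing (NA)
  r <- d[which(inside), , drop = FALSE]
  hits <- c("single", "double", "triple", "home_run")
  data.frame(
    n            = nrow(r),
    hit_rate     = mean(r$events %in% hits),
    hr_rate      = mean(r$events == "home_run"),
    launch_speed = mean(r$launch_speed, na.rm = TRUE),
    expected_ba  = mean(r$estimated_ba_using_speedangle, na.rm = TRUE)
  )
}

# Made-up balls in play (not real Statcast data)
toy <- data.frame(
  plate_x = c(-0.5, 0.0, 0.3, 0.8, -0.2),
  plate_z = c(1.5, 1.8, 2.0, 3.2, 3.5),
  launch_speed = c(110.8, 95.0, 80.0, 100.0, 70.0),
  estimated_ba_using_speedangle = c(0.979, 0.500, 0.100, 0.700, 0.050),
  events = c("home_run", "single", "field_out", "double", "field_out")
)

# A "low in the zone" rectangle: three of the five balls fall inside
summarise_region(toy, xmin = -1, xmax = 1, zmin = 1, zmax = 2.5)
```

Inside a Shiny server, the rectangle would come from the brush input, and Shiny’s `brushedPoints()` can do the filtering step directly; the explicit bounds here simply make the logic visible outside Shiny.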
The PitchOutcome app
After I finished the batting app, I wanted to construct a similar app from the pitcher’s perspective. The notion of a pitching “outcome” is less clear-cut, so I consider three different groups of pitches, each with its own outcomes.
- Called Pitches. Consider the pitches that are not swung at (the “called” pitches): the outcome is either a “Called Strike” or a “Called Ball”.
- Swung Pitches. Consider the pitches where the batter swings: the outcomes are “Miss”, “Foul”, or “In-Play”.
- In-Play. Consider the pitches where the ball is put into play: the two possible outcomes are “Hit” or “Out”.
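One way to build these three groupings from pitch-by-pitch data is a lookup table on the pitch description. The sketch below assumes Statcast-style `description` labels (such as `called_strike`, `swinging_strike`, `hit_into_play`) and `events` values for balls in play; the label set, the decision to treat `foul_tip` as a foul, and the helper names are assumptions, not the PitchOutcome app’s actual code.

```r
# Sketch: map Statcast-style `description` labels to the outcome groups
# in the text. The label set is an assumption based on Baseball Savant
# conventions; treating foul_tip as a foul is a judgment call.
# Labels not in the table (e.g. hit_by_pitch) come back as NA.
outcome_lookup <- c(
  called_strike           = "Called Strike",
  ball                    = "Called Ball",
  blocked_ball            = "Called Ball",
  swinging_strike         = "Miss",
  swinging_strike_blocked = "Miss",
  foul                    = "Foul",
  foul_tip                = "Foul",
  hit_into_play           = "In-Play",
  hit_into_play_no_out    = "In-Play",
  hit_into_play_score     = "In-Play"
)

classify_pitch <- function(description) {
  outcome <- unname(outcome_lookup[description])
  group <- ifelse(outcome %in% c("Called Strike", "Called Ball"), "Called",
           ifelse(outcome %in% c("Miss", "Foul", "In-Play"), "Swung", NA))
  data.frame(description, group, outcome, stringsAsFactors = FALSE)
}

# For balls in play, a hit is any of these `events` values; everything
# else (field_out, force_out, double_play, ...) counts as an out.
in_play_result <- function(events) {
  ifelse(events %in% c("single", "double", "triple", "home_run"),
         "Hit", "Out")
}

classify_pitch(c("called_strike", "swinging_strike", "hit_into_play",
                 "hit_by_pitch"))
in_play_result(c("single", "field_out", "home_run"))
```

A named-vector lookup keeps the mapping in one place, so adjusting how an edge case like a foul tip is treated means changing a single entry.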
I have a Shiny app called PitchOutcome that shows pitch outcomes for these different groups of pitches. The ShinyBaseball package includes this app together with all of the 2019 pitching data. Once the package is installed, you run this app by typing
After launching PitchOutcome(), we enter the name of a 2019 pitcher in the box. We illustrate with the 2019 Aaron Nola by entering “Aaron Nola” (remember, I am a Phillies fan). Immediately we see Nola’s pitch type distribution: he throws about as many four-seamers (FF) as knuckle-curves (KC), followed by change-ups (CH) and two-seamers (FT). The app shows the locations of all of his pitches, where the color corresponds to strike (S), ball (B), or in-play (X). (This is the type variable in the Statcast database.)
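The pitch type distribution and the S/B/X breakdown are simple tabulations. A minimal sketch, assuming Statcast-style `pitch_type` and `type` columns; the helper `pitch_mix()` and the toy pitches are illustrative only and are not Nola’s actual data.

```r
# Sketch: a pitcher's pitch-type mix and a pitch type by result table.
# Assumed Statcast-style columns: pitch_type ("FF", "KC", "CH", "FT", ...)
# and type ("S" = strike, "B" = ball, "X" = in play).
pitch_mix <- function(d) {
  round(prop.table(table(d$pitch_type)), 3)
}

# Made-up pitches (not Nola's actual data)
toy <- data.frame(
  pitch_type = c("FF", "KC", "FF", "KC", "CH", "FT"),
  type       = c("S",  "B",  "X",  "S",  "B",  "S")
)

pitch_mix(toy)                    # share of each pitch type
table(toy$pitch_type, toy$type)   # counts of S / B / X within each pitch type
subset(toy, pitch_type == "KC")   # keep only the knuckle curves
```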
Let’s focus on the locations of Nola’s signature pitch, the knuckle curve, by selecting KC from the Pitch Type palette. Notice the interesting shape of the pitch locations, running from upper left to lower right; this is a common pattern for off-speed pitches from a right-handed pitcher.
By choosing “Called” from the Pitches to Display palette, we see the locations of the called knuckle curves. I am a little surprised by the high number of curve balls in the zone. Nola doesn’t appear to waste his curve balls; he often throws them for strikes.
By selecting “Swung”, we see the locations of Nola’s knuckle curves where the hitter swung. The color indicates the swing outcome: foul, in-play, or miss.
Since I notice many blue points outside of the zone, I use the brushing tool to highlight a region mostly outside the zone. There were 266 swings in this region and 46% of them were swinging strikes. This high value seems pretty impressive, although it would be interesting to compare it with the miss rate for other pitchers.
Let’s contrast this “out of the zone” swing performance with the swing performance in a rectangle in the lower region of the zone. There were 228 swings in this region and only 14% were misses. Location clearly matters for Nola’s knuckle curve. (By the way, the phrase “hanging curve ball” refers to a poorly placed curve ball that hangs in the zone.)
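In terms of counts, 46% of 266 swings is roughly 122 misses, while 14% of 228 swings is roughly 32 (the percentages are rounded, so these counts are approximate). The miss rate for a brushed region is just the share of swings inside the rectangle that were misses. Here is a minimal sketch, assuming one row per swung pitch with `plate_x`, `plate_z`, and an `outcome` column equal to "Miss", "Foul", or "In-Play"; the function name and the toy swings are hypothetical.

```r
# Sketch: swing-and-miss rate for swings located inside a rectangle.
# Assumes one row per swung pitch with plate_x, plate_z (feet) and an
# `outcome` column equal to "Miss", "Foul" or "In-Play".
miss_rate_in_region <- function(swings, xmin, xmax, zmin, zmax) {
  inside <- swings$plate_x >= xmin & swings$plate_x <= xmax &
            swings$plate_z >= zmin & swings$plate_z <= zmax
  r <- swings[which(inside), , drop = FALSE]
  c(swings = nrow(r), miss_rate = mean(r$outcome == "Miss"))
}

# Made-up swings at knuckle curves (not Nola's actual data)
toy <- data.frame(
  plate_x = c(-1.2, -1.0, -0.9, 0.0, 0.2, 0.4),
  plate_z = c(0.8, 1.0, 1.2, 1.8, 2.0, 1.7),
  outcome = c("Miss", "Miss", "Foul", "In-Play", "Miss", "Foul")
)

# Compare a region low and outside the zone with one in the lower zone
miss_rate_in_region(toy, xmin = -1.5, xmax = -0.8, zmin = 0.5, zmax = 1.5)
miss_rate_in_region(toy, xmin = -0.5, xmax = 0.5, zmin = 1.5, zmax = 2.2)
```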
We continue to look at knuckle curves, but now we focus on the pitches that were put into play. We can measure batter success in several ways: we can compute the hit rate or the expected batting average (which takes into account the launch angle and launch speed measurements). We see that there were 49 out-of-the-zone curve balls put into play, and the batters had a low 0.184 BA on these pitches.
Let’s contrast this with the in-play performance on curve balls in the lower region of the zone. There were 88 pitches put into play in this region and the batters had a more robust 0.386 batting average on these pitches.
- Try these apps out! Try out the batting app that is live on the Shiny server. Better yet, install my ShinyBaseball package and run both the batting and pitching apps. Even better, develop a Shiny app that improves on my creations. If I were still teaching, I would likely have a Shiny assignment for my data science students.
- All MLB teams should be using this type of visualization. Baseball folks tend to focus on data tables such as the numerous ones shown on Baseball-Reference and FanGraphs. But many of these measures are better understood in the context of the zone, so visualization really adds insight to these measures.
- What did we learn about Aaron Nola? We learned several things just by exploring the locations of his knuckle curves. He appears able to throw his knuckle curve for a called strike, and his low ones (especially those outside of the zone) appear effective at generating swinging strikes. This is much more informative than reading a table of summary measures of Nola’s pitching performance.
- Why did I write a Shiny package? Shiny apps are good ways to test functions with many different arguments. A Shiny app is a great way to communicate baseball research findings to a general audience.
- Are these hard to code? At first, I admit that it takes some time to get familiar with the general structure of a Shiny app. But the more I play with these Shiny apps, the better I understand the basic code structure, and that encourages me to try new things. Most of the development time goes into deciding the layout of the Shiny interface; relatively little is spent on the actual coding.