A FanGraphs Career Trajectory Graph

Introduction

This is the third post on the general subject of building Shiny apps to display baseball career trajectories. In the first post, I discussed building an app to compare the batting trajectories of two players. In the second post, I extended this discussion to describe new apps to compare the trajectories of two contemporary pitchers or two contemporary fielders at a specific position. For all of these apps, I used season-to-season data contained in the Lahman database.

Although these Shiny apps are helpful in comparing player performance trajectories, one is limited to traditional measures of performance with the Lahman data. For example, a modern summary measure of performance for any player is WAR (wins above replacement). It would be nice to construct career trajectories of WAR for a player of interest, but WAR is not directly included in the Lahman database. So the goal of this post is to outline how one can use modern measures of performance as displayed in FanGraphs in the construction of Shiny trajectory apps. This is a good data science programming exercise since we will merge FanGraphs table data with Lahman table information and a Baseball Reference Hall of Fame list. Once one has a dataset with all of the relevant variables, it is straightforward to construct a Shiny app that allows one to compare the hitting trajectories of a group of players who satisfy particular conditions.

The FanGraphs Leaderboard Dataset

The first task is to download some FanGraphs data. The FanGraphs batting leaderboard shows the batting leaders with respect to many modern measures such as wOBA, wRC+ and WAR. If we select all seasons from 1871 to 2021, choose the “split season” option, and choose min PA = 100, then FanGraphs displays season-to-season hitting data for all players in MLB history with at least 100 PA in a season. Pushing the Export Data button downloads this table to one’s laptop as a csv file. The resulting table has 41,575 rows (player/seasons) and 40 columns (variables).

For my Shiny app, I want to show trajectories for a group of batters defined by the player’s mid year, his career plate appearances and his Hall-of-Fame status. In addition, I am interested in the player’s age for each season so one can graph a measure of hitting performance against age. So I want to add some additional variables to the FanGraphs dataset. It is important to note that the FanGraphs dataset includes a playerid variable which allows us to merge this dataset with other data sources.

Merging Lahman Data

Using the Batting dataframe from the Lahman package, I compute the career plate appearances and mid year values for all players in MLB history. In addition, I extract players’ birth years from the People dataframe, which allows the computation of age for each season and player. The challenge is to merge this Lahman information with the FanGraphs dataset.
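As a rough illustration of these computations, here is a minimal base R sketch on a toy table that mimics the column names of the Lahman Batting and People dataframes. Several things here are my assumptions rather than details from the actual script: plate appearances are taken as AB + BB + HBP + SH + SF, mid year is the average of a player's first and last seasons, and age uses a June 30 cutoff on the birth month. The player ids and numbers are made up.

```r
# Toy stand-ins for Lahman's Batting and People tables (made-up values).
batting <- data.frame(
  playerID = c("aaa01", "aaa01", "aaa01", "bbb01", "bbb01"),
  yearID   = c(2001, 2002, 2003, 2010, 2011),
  AB       = c(500, 520, 480, 300, 350),
  BB       = c(50, 60, 40, 20, 30),
  HBP      = c(5, 3, 4, 1, 2),
  SH       = c(0, 1, 0, 2, 1),
  SF       = c(4, 5, 3, 1, 2),
  stringsAsFactors = FALSE
)
people <- data.frame(
  playerID   = c("aaa01", "bbb01"),
  birthYear  = c(1975, 1985),
  birthMonth = c(3, 9),
  stringsAsFactors = FALSE
)

# Plate appearances for each player-season (assumed definition).
batting$PA <- with(batting, AB + BB + HBP + SH + SF)

# Career plate appearances for each player.
career <- aggregate(PA ~ playerID, data = batting, FUN = sum)
names(career)[2] <- "CareerPA"

# Mid year, assumed here to be the average of first and last seasons.
span <- aggregate(yearID ~ playerID, data = batting,
                  FUN = function(y) (min(y) + max(y)) / 2)
names(span)[2] <- "MidYear"
career <- merge(career, span, by = "playerID")

# Season age with an assumed June 30 cutoff: a player born in July or
# later is treated as if born in the following year.
people$adjBirthYear <- with(people,
                            ifelse(birthMonth >= 7, birthYear + 1, birthYear))
seasons <- merge(batting, people[, c("playerID", "adjBirthYear")],
                 by = "playerID")
seasons$Age <- seasons$yearID - seasons$adjBirthYear
```

With the real Lahman tables, some of these counting columns are missing for early seasons, so one would need to decide how to treat NA values before summing.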

Chadwick Baseball Bureau Register

We merge different baseball datasets by use of the players’ ids. Each MLB player has a unique id, but this id depends on the data source. For example, Mike Trout’s id is 545361 in Statcast (Baseball Savant), “troum001” in Retrosheet, “troutmi01” in Baseball-Reference (and the Lahman database), and 10155 in FanGraphs. Fortunately, the Chadwick Baseball Bureau Register provides a file people.csv that contains all of these data source ids for all professional baseball players. By merging part of this Chadwick file with the FanGraphs data, one can merge the FanGraphs data with other data sources.
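The id crosswalk can be sketched with a base R merge. The toy table below stands in for the Chadwick people.csv file. The key_* column names are my recollection of the register's headers, so treat them as an assumption and check them against the downloaded file. The Trout ids are the ones quoted above; the second register row, the extra FanGraphs id and the WAR values are made-up placeholders.

```r
# Toy slice of the Chadwick register (column names assumed).
chadwick <- data.frame(
  key_mlbam     = c(545361, NA),
  key_retro     = c("troum001", "xxxxx001"),
  key_bbref     = c("troutmi01", "xxxxxxx01"),
  key_fangraphs = c(10155, 99999),
  stringsAsFactors = FALSE
)

# Toy FanGraphs player-seasons; WAR values are placeholders,
# not real statistics.
fg <- data.frame(
  playerid = c(10155, 10155, 424242),
  Season   = c(2012, 2013, 2012),
  WAR      = c(1.0, 2.0, 3.0)
)

# Attach the Baseball-Reference (Lahman) id to every FanGraphs row.
# all.x = TRUE keeps FanGraphs rows with no match in the register,
# leaving key_bbref as NA so the misses are easy to inspect.
fg_ids <- merge(fg, chadwick[, c("key_fangraphs", "key_bbref")],
                by.x = "playerid", by.y = "key_fangraphs", all.x = TRUE)
```

Once key_bbref is attached, the FanGraphs rows can be joined to any Lahman table on playerID.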

Merging Baseball-Reference Hall of Fame Data

I want to be able to graph hitting trajectories for players currently in the Hall of Fame (HOF). A good listing of HOF members is provided here by Baseball-Reference. The player ids are not visible in the table view on the website, but when this dataset is downloaded as a csv file, one sees that the Baseball-Reference id is included as part of the Name field. By merging this data with the FanGraphs data, one includes the HOF membership information.
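Here is a small sketch of pulling the id out of the Name field and flagging HOF members. I am assuming the exported Name values look like "Mike Schmidt\schmimi01", with a backslash separating the name from the id; the actual separator in the downloaded file may differ, so inspect a few rows first. The data frame and column names below are hypothetical.

```r
# Toy version of the downloaded HOF file (assumed Name layout: name,
# then a backslash, then the Baseball-Reference id).
hof <- data.frame(
  Name = c("Mike Schmidt\\schmimi01", "George Brett\\brettge01"),
  stringsAsFactors = FALSE
)

# Drop everything up to and including the last backslash.
hof$bbref_id <- sub(".*\\\\", "", hof$Name)

# Toy player table that already carries Baseball-Reference ids.
players <- data.frame(
  key_bbref = c("schmimi01", "brettge01", "troutmi01"),
  stringsAsFactors = FALSE
)

# A player is flagged as HOF if his id appears in the HOF list.
players$HOF <- players$key_bbref %in% hof$bbref_id
```

Matching on the id rather than on the printed name avoids problems with players who share a name.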

The Shiny App

The single dataset fgbatting_complete.csv containing the FanGraphs data with the additional variables is currently stored in the HomeRuns2021 package. The career trajectory Shiny app is included in the CareerTrajectoryGraphs package. Once the CareerTrajectoryGraphs package is installed and loaded, one runs the Shiny app by typing FanGraphsBatting() in the Console window. Here’s a snapshot of the app.

  • One first chooses a range of mid season values and minimum number of plate appearances. Here I am choosing players with mid seasons between 1980 and 1985 who had at least 9000 career PA.
  • Next, one selects a measure of interest from the drop-down menu — this menu lists many of the variables included in the FanGraphs dashboard leaderboard. Here we decide on the WAR measure.
  • Last, one decides to graph against Season or Age (here we choose Age) and whether to look at all players or only the players inducted in the HOF (here we limit our search to HOF players).
  • The graph displays smoothed fits of the career trajectories of the six selected hitters who satisfy these criteria. (We omit the actual data points so that the display is not cluttered.)
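The selection and smoothing steps in the list above can be sketched in base R. This is only an illustration on simulated data with hypothetical column names (Name, MidYear, CareerPA, HOF, Age, WAR). It uses loess() as a stand-in smoother and is not the code inside the app's compare_plot() function.

```r
set.seed(1)

# Simulate one player's seasons: a quadratic aging curve plus noise.
make_player <- function(name, mid, pa, hof, peak) {
  age <- 21:38
  data.frame(Name = name, MidYear = mid, CareerPA = pa, HOF = hof,
             Age = age,
             WAR = peak - 0.05 * (age - 28)^2 +
               rnorm(length(age), sd = 0.5),
             stringsAsFactors = FALSE)
}
d <- rbind(
  make_player("Player A", 1982, 10000, TRUE, 7),
  make_player("Player B", 1984, 9500, TRUE, 5),
  make_player("Player C", 1983, 6000, FALSE, 4),
  make_player("Player D", 1995, 11000, TRUE, 6)
)

# Keep HOF players with mid year in 1980-1985 and at least 9000 career PA.
chosen <- subset(d, MidYear >= 1980 & MidYear <= 1985 &
                   CareerPA >= 9000 & HOF)

# Fit a separate loess curve of WAR against Age for each chosen player.
smooth_one <- function(p) {
  fit <- loess(WAR ~ Age, data = p, span = 0.75)
  data.frame(Name = p$Name, Age = p$Age, Smoothed = predict(fit),
             stringsAsFactors = FALSE)
}
smoothed <- do.call(rbind, lapply(split(chosen, chosen$Name), smooth_one))
```

Drawing one line per player from the Smoothed column against Age gives a display in the same spirit as the app's graph, with the raw points left out.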

What do we learn from these smoothed career trajectories of these HOF players?

  • Mike Schmidt was clearly the leader with respect to WAR for a large part of his career.
  • Gary Carter had a high peak WAR around age 29, with a steep rise before that age and a steep decline after it.
  • Both Robin Yount and George Brett peaked in WAR at young ages, but Brett’s rate of decline was more gradual than that of Yount.
  • Dave Winfield and Carlton Fisk both displayed relatively flat WAR career trajectories.

One gets a different comparison of these HOF players by selecting a different FanGraphs measure. For example, below I select the Def (defensive runs above average) fielding measure. Gary Carter (a catcher) displays the highest peak value of this measure, and he had his best defensive seasons between the ages of 25 and 30. Note that many of the other players (Yount, Schmidt, Brett) peaked with respect to this fielding measure in their mid-20s.

Comments

  • Motivation for Post? FanGraphs is a great source of modern baseball statistics. I recently realized that one could download separate season statistics by use of the “Split Season” option. I illustrated using the Dashboard table here but one could easily repeat this exercise using other FanGraphs batting leader tables such as the Advanced or Batted Ball tables.
  • Setup R Code? After downloading the FanGraphs dataset, I used the following R script on my Github Gist site to merge the FanGraphs, Lahman and Baseball-Reference data and add the additional variables (as described above) to create the fgbatting_complete dataset. To run this script, one initially needs to download three datafiles: the FanGraphs fgbatting.csv dataset, the Chadwick people.csv dataset, and the Baseball-Reference HOF datafile bref_hof.csv. If these three files are in the current working directory, the R script will add the new variables and write the updated file to the working directory.
  • Code for Shiny App? The file app.R in my CareerTrajectoryGraphs package contains the code for this Shiny app. It requires a dataset fg_batting that is downloaded from the HomeRuns2021 Github repository. You can run this particular Shiny app by simply downloading the app.R file into a separate folder and running this file. By the way, the function compare_plot() actually does the graphing work in this Shiny app.
  • Pitching Trajectories? After I completed this app, it was straightforward to create a similar Shiny app using FanGraphs pitching leaderboard data. If you download the package, the function FanGraphsPitching() will produce the pitching Shiny app. One can use this app to explore trajectories of a number of contemporary pitchers with respect to different measures.