Monthly Archives: February, 2014

Plotting Career Trajectories

I have been fascinated by players’ career batting and pitching trajectories over the years, and Chapter 8 of our book talks about plotting and modeling these trajectories. This post describes a useful function to plot the career trajectory of a hitting rate for any player in MLB history.

The data frame Batting in the Lahman package contains the season batting data. There is some setup to get the data in a useful format.

  • The Batting data frame has a separate row of hitting statistics for each stint (team) that a player had in a given season. I use the summarise function in the new dplyr package to collapse over the stint variable. (By the way, dplyr is much faster than plyr for this type of operation.)
  • Using the merge function, I add last name, first name, first year, last year, and birthyear variables to the Batting data frame.
  • A new plate appearances variable is defined — before I do this, I convert missing values of SF and SH to zero.
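
As a rough illustration of these three setup steps, here is a minimal sketch. It is not the code from the gist, and it rests on several assumptions: recent versions of the Lahman package name the player table People, the first and last years are taken from the batting seasons themselves, and plate appearances are computed as AB + BB + HBP + SF + SH. The object names season, career, players, and batting are made up for this example.

```r
library(Lahman)
library(dplyr)

# Step 1: collapse over stint so each player has one row per season.
# na.rm = TRUE treats missing counts (SF and SH, and here also the other
# columns, which is an added assumption) as zero.
season <- Batting %>%
  group_by(playerID, yearID) %>%
  summarise(across(c(AB, H, HR, BB, SO, HBP, SF, SH),
                   ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

# Step 2: add names, first/last season, and birth year with merge().
# Assumption: first and last year come from the batting seasons.
career <- season %>%
  group_by(playerID) %>%
  summarise(firstYear = min(yearID), lastYear = max(yearID))
players <- People[, c("playerID", "nameFirst", "nameLast", "birthYear")]
batting <- merge(merge(season, players, by = "playerID"),
                 career, by = "playerID")

# Step 3: plate appearances (assumed formula, see the note above).
batting$PA <- with(batting, AB + BB + HBP + SF + SH)
```

Summing with na.rm = TRUE does the "convert missing values to zero" step and the collapse in one pass; replacing the missing values first and then summing should give the same result.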

Now I’m ready to write a plot.trajectory function. There will be four inputs:

  • The name of the batter in quotes. (One can choose any batter in the Lahman database.)
  • The numerator of the rate statistic we want to graph — for example, if we want to plot home run rates, then this numerator would be “HR”.
  • The denominator of the rate statistic (typically “AB” or “PA”).
  • In cases where there are multiple players with the same name like Ken Griffey or Tony Gwynn, the input num gives the number of the player that you are interested in. (For example, if you want to plot the career trajectory of Junior Griffey, use num = 2.)

I use the ggplot2 package to construct the plot and use a loess smoother to show the general pattern of the career trajectory.
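
To make the description concrete, here is a rough sketch of how a function with these four inputs might be put together. It is not the function from the gist. The names trajectory_data and plot_trajectory_sketch are hypothetical, the data frame is passed in explicitly as d and is assumed to already contain an Age column, and numbering same-name players by their first season is my assumption. The toy data at the end are invented numbers for a made-up player, there only so the sketch runs on its own.

```r
library(ggplot2)

# Hypothetical helper: one player's seasons with the requested rate attached.
# `d` needs columns playerID, nameFirst, nameLast, yearID, Age, and the
# numerator/denominator columns named by the caller.
trajectory_data <- function(d, player, numerator, denominator, num = 1) {
  ids <- unique(d$playerID[paste(d$nameFirst, d$nameLast) == player])
  if (length(ids) == 0) stop("No player named ", player)
  if (num > length(ids)) stop("Only ", length(ids), " player(s) named ", player)
  # Assumption: same-name players are numbered by first season, earliest first.
  first_year <- sapply(ids, function(id) min(d$yearID[d$playerID == id]))
  p <- d[d$playerID == ids[order(first_year)][num], ]
  p$Rate <- p[[numerator]] / p[[denominator]]
  p[order(p$Age), ]
}

# Scatterplot of the rate against age with a loess smoother on top.
plot_trajectory_sketch <- function(d, player, numerator, denominator, num = 1) {
  p <- trajectory_data(d, player, numerator, denominator, num)
  ggplot(p, aes(x = Age, y = Rate)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(title = player, x = "Age", y = paste(numerator, "/", denominator))
}

# Invented numbers for a made-up player, only to exercise the functions.
toy <- data.frame(
  playerID = "testpl01", nameFirst = "Test", nameLast = "Player",
  yearID = 2001:2010, Age = 24:33,
  HR = c(5, 12, 20, 28, 33, 36, 31, 25, 18, 10),
  AB = c(300, 450, 520, 560, 570, 580, 550, 520, 480, 350)
)
traj <- trajectory_data(toy, "Test Player", "HR", "AB")
print(plot_trajectory_sketch(toy, "Test Player", "HR", "AB"))
```

Splitting the data step from the plotting step keeps the rate calculation easy to check by itself before any graph is drawn.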

Here is the code. First install the packages Lahman, dplyr, devtools, and ggplot2. Then you can read in the setup code and the function by typing:

library(devtools)
source_gist(9043429)

Let me illustrate using this function to plot some trajectories. Mike Schmidt is one of my baseball heroes; I can graph his home run trajectory by typing:

plot.trajectory("Mike Schmidt", "HR", "AB")

[Figure: Mike Schmidt’s career home run rate (HR/AB) trajectory]

Clearly, Schmidt peaked in home run hitting about age 30 (that’s when the Phillies won their first World Series).

Instead suppose we look at Schmidt’s strikeout trajectory.

plot.trajectory("Mike Schmidt", "SO", "AB")

[Figure: Mike Schmidt’s career strikeout rate (SO/AB) trajectory]

I believe Schmidt shortened his swing later in his career, which led to a decrease in his strikeout rate.

Ron Hunt is well-known as the “hit by pitch king”. How does Hunt’s HBP rate change over his career?

plot.trajectory("Ron Hunt", "HBP", "PA")

[Figure: Ron Hunt’s career hit-by-pitch rate (HBP/PA) trajectory]

I believe there is some ability aspect to getting hit by a pitch (it isn’t just driven by luck), and Hunt seemed to peak in this ability around age 30.

Anyway, it is fun to explore these trajectories for your favorite players. I also converted this function to a Shiny application that you can see by typing:

library(shiny)
shiny::runGist('9053425')