Plotting Career Trajectories

I have been fascinated by players’ career batting and pitching trajectories over the years, and Chapter 8 of our book discusses plotting and modeling these trajectories. This post describes a function that plots the career trajectory of a hitting rate for any player in MLB history.

The data frame Batting in the Lahman package contains the season batting data. Some setup is needed to get the data into a useful format.

  • The Batting data frame has a separate row of hitting statistics for each team a player played for in a given season. I use the summarise function in the new dplyr package to collapse over the stint variable. (By the way, dplyr is much faster than plyr for this type of operation.)
  • Using the merge function, I add last name, first name, first year, last year, and birth year variables to the Batting data frame.
  • I define a new plate appearances (PA) variable; before doing this, I convert missing values of SF and SH to zero.
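The actual setup code lives in the gist. As a rough, self-contained illustration of these three steps, here is a base-R sketch on a few made-up rows. It uses `aggregate()` where the post uses dplyr's `summarise()`. It borrows the column names `nameFirst`, `nameLast`, and `birthYear` from Lahman's `Master` table, and it assumes plate appearances are AB + BB + HBP + SF + SH. The first-year and last-year variables are not reproduced here, and all player IDs, names, and counts are hypothetical.

```r
# Hypothetical toy rows shaped like Lahman's Batting table:
# one row per player, season, and stint (team).
batting <- data.frame(
  playerID = c("aaa01", "aaa01", "aaa01", "bbb01"),
  yearID   = c(1975, 1976, 1976, 1976),
  stint    = c(1, 1, 2, 1),
  AB  = c(400, 250, 150, 500),
  HR  = c(20, 10, 5, 30),
  BB  = c(40, 25, 15, 70),
  HBP = c(2, 1, 1, 3),
  SF  = c(NA, 3, 2, 5),
  SH  = c(NA, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical biographical table using Lahman's Master column names.
people <- data.frame(
  playerID  = c("aaa01", "bbb01"),
  nameFirst = c("Pat", "Lee"),
  nameLast  = c("Example", "Sample"),
  birthYear = c(1950, 1948),
  stringsAsFactors = FALSE
)

# Step 1: collapse over stint by summing each count within a player-season.
stats  <- c("AB", "HR", "BB", "HBP", "SF", "SH")
season <- aggregate(batting[stats],
                    by = batting[c("playerID", "yearID")],
                    FUN = sum)

# Step 2: attach first name, last name, and birth year.
season <- merge(season, people, by = "playerID")

# Step 3: treat missing SF and SH as zero, then define plate appearances
# (assumed formula: AB + BB + HBP + SF + SH).
season$SF[is.na(season$SF)] <- 0
season$SH[is.na(season$SH)] <- 0
season$PA <- with(season, AB + BB + HBP + SF + SH)
```

The two 1976 stints for `aaa01` collapse into a single season with 400 AB. The 1975 season, whose SF and SH were missing, ends up with zeros in those columns, so its PA is still defined.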

Now I’m ready to write a plot.trajectory function. There will be four inputs:

  • The name of the batter in quotes. (One can choose any batter in the Lahman database.)
  • The numerator of the rate statistic we want to graph; for example, to plot home run rates, the numerator would be “HR”.
  • The denominator of the rate statistic (typically “AB” or “PA”).
  • In cases where multiple players share the same name, like Ken Griffey or Tony Gwynn, the input num gives the number of the player you are interested in. (For example, to plot the career trajectory of Junior Griffey, use num = 2.)

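The gist's plot.trajectory function does the player lookup internally. To make the role of num concrete, here is a hypothetical helper, `trajectory_data()`, which is not taken from the gist. It matches on first and last name and orders namesakes by birth year (an assumption about how players are numbered). It then returns the chosen rate by age, with age simplified to season minus birth year. The toy rows for two players named "Ken Example" are made up.

```r
# Hypothetical helper, not the gist's code: pick the num-th player whose
# first and last names match `name`, then compute num_stat / den_stat by age.
trajectory_data <- function(data, name, num_stat, den_stat, num = 1) {
  parts <- strsplit(name, " ")[[1]]
  first <- parts[1]
  last  <- paste(parts[-1], collapse = " ")
  hits  <- data[data$nameFirst == first & data$nameLast == last, ]
  # Order namesakes by birth year (assumption), so num = 2 is the younger one.
  ids <- unique(hits$playerID[order(hits$birthYear)])
  if (num > length(ids)) stop("fewer than ", num, " players named ", name)
  player <- hits[hits$playerID == ids[num], ]
  player <- player[player[[den_stat]] > 0, ]   # skip seasons with a zero denominator
  player <- player[order(player$yearID), ]
  data.frame(age  = player$yearID - player$birthYear,   # simplified age
             rate = player[[num_stat]] / player[[den_stat]])
}

# Made-up seasons for two players who share a name.
seasons <- data.frame(
  playerID  = c("exam01", "exam01", "exam02", "exam02"),
  nameFirst = "Ken", nameLast = "Example",
  birthYear = c(1950, 1950, 1969, 1969),
  yearID    = c(1975, 1976, 1990, 1991),
  HR        = c(4, 6, 22, 33),
  AB        = c(400, 500, 550, 550),
  stringsAsFactors = FALSE
)
trajectory_data(seasons, "Ken Example", "HR", "AB", num = 2)
```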
I use the ggplot2 package to construct the plot and use a loess smoother to show the general pattern of the career trajectory.
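The smoothing step can be sketched in base R with `stats::loess()`, which is the fitting function behind ggplot2's `geom_smooth(method = "loess")`. The rates below are made up so that they rise and then fall. The span of 0.75 is loess's default and is only an assumption about what the gist's plot uses.

```r
# Made-up rates that rise and then fall (peak at age 29), for illustration only.
traj <- data.frame(age = 21:36)
traj$rate <- 0.06 - 0.0004 * (traj$age - 29)^2

# Local regression of rate on age, evaluated on a grid of ages
# inside the observed range (loess does not extrapolate).
fit  <- loess(rate ~ age, data = traj, span = 0.75)
grid <- data.frame(age = seq(21, 36, by = 0.5))
grid$smooth <- predict(fit, newdata = grid)

# Points for the seasons, line for the smoothed trajectory.
plot(traj$age, traj$rate, xlab = "Age", ylab = "Rate")
lines(grid$age, grid$smooth)
```

A smaller span follows season-to-season bumps more closely. A larger span gives a smoother overall career shape.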

Here is the code. First install the packages Lahman, dplyr, devtools, and ggplot2. Then you can read in the setup code and the function by typing:

library(devtools)      # provides source_gist()
source_gist(9043429)   # reads in the setup code and plot.trajectory()

Let me illustrate this function by plotting some trajectories. Mike Schmidt is one of my baseball heroes; I can graph his home run trajectory by typing:

plot.trajectory("Mike Schmidt", "HR", "AB")

[Figure: Mike Schmidt's home run rate (HR/AB) by age, with loess smoother]
Clearly, Schmidt's home run hitting peaked at about age 30 (that’s when the Phillies won their first World Series).
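The peak can also be located numerically instead of read off the plot. One approach is to evaluate the loess fit on a fine grid of ages and take the age with the largest fitted rate. The helper `peak_age()` is hypothetical, the rates are made up, and with real data the answer will shift somewhat with the chosen span.

```r
# Hypothetical helper: age at which the loess fit of rate on age is highest.
peak_age <- function(age, rate, span = 0.75, step = 0.25) {
  fit  <- loess(rate ~ age, data = data.frame(age = age, rate = rate),
                span = span)
  grid <- seq(min(age), max(age), by = step)    # stay inside the data range
  grid[which.max(predict(fit, newdata = data.frame(age = grid)))]
}

# Made-up rates that peak at age 29, for illustration only.
ages  <- 21:36
rates <- 0.06 - 0.0004 * (ages - 29)^2
peak_age(ages, rates)
```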

Suppose instead we look at Schmidt’s strikeout trajectory.

plot.trajectory("Mike Schmidt", "SO", "AB")

[Figure: Mike Schmidt's strikeout rate (SO/AB) by age, with loess smoother]
I believe Schmidt shortened his swing later in his career, which led to lower strikeout rates.

Ron Hunt is well-known as the “hit by pitch king”. How does Hunt’s HBP rate change over his career?

plot.trajectory("Ron Hunt", "HBP", "PA")

[Figure: Ron Hunt's hit-by-pitch rate (HBP/PA) by age, with loess smoother]
I believe there is some ability aspect to getting hit by a pitch (it isn’t purely luck-driven), and Hunt seemed to peak in this ability around age 30.

Anyway, it is fun to explore these trajectories for your favorite players. I also converted this function to a Shiny application that you can see by typing:

library(shiny)
shiny::runGist('9053425')

One response

  1. Thanks for posting this analysis. I love analyzing baseball data with R as well. One thing I might suggest looking into is Plot.ly. It allows you to make awesome interactive visualizations. I did so with a recent post on Babe Ruth’s career numbers: http://jowanza.com/post/76499800817/babe-ruth-a-career-analysis-with-r-and-plot-ly
