I have been fascinated by players’ career batting and pitching trajectories over the years, and Chapter 8 of our book discusses plotting and modeling these trajectories. This post describes a useful function to plot the career trajectory of a hitting rate for any player in MLB history.

The `Batting` data frame in the `Lahman` package contains the season batting data. There is some setup to get the data in a useful format.

- The `Batting` data frame has separate hitting statistics for each team for a player in a given season. I use the `summarise` function in the new `dplyr` package to collapse over the `stint` variable. (By the way, `dplyr` is **much** faster than `plyr` for this type of operation.)
- Using the `merge` function, I add last name, first name, first year, last year, and birth year variables to the `Batting` data frame.
- A new plate appearances variable is defined; before I do this, I convert missing values of SF and SH to zero.
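
The three setup steps above can be sketched on a toy data set. This is only an illustration, not the code in the gist: the data frames `batting` and `players`, their values, and the result name `season` are made up, and the plate appearance formula `AB + BB + HBP + SF + SH` is my assumption about how PA is defined. The `.groups` argument assumes a recent version of `dplyr`.

```r
library(dplyr)

# Toy stand-in for the Batting table (made-up values).
# Player "a01" changed teams in 1990, so that season has two stints.
batting <- data.frame(
  playerID = c("a01", "a01", "a01", "b02"),
  yearID   = c(1990, 1990, 1991, 1990),
  stint    = c(1, 2, 1, 1),
  AB       = c(200, 150, 500, 400),
  HR       = c(10, 5, 30, 8),
  BB       = c(20, 15, 60, 30),
  HBP      = c(1, 2, 4, 0),
  SF       = c(NA, 1, 3, 2),
  SH       = c(0, NA, 1, 5)
)

# Toy stand-in for the player table (made-up values).
players <- data.frame(
  playerID  = c("a01", "b02"),
  nameFirst = c("Al", "Bo"),
  nameLast  = c("Adams", "Brown"),
  birthYear = c(1965, 1960)
)

# Convert missing SF and SH to zero, then collapse over stint
# so that each player has one row per season.
season <- batting %>%
  mutate(SF = ifelse(is.na(SF), 0, SF),
         SH = ifelse(is.na(SH), 0, SH)) %>%
  group_by(playerID, yearID) %>%
  summarise(AB = sum(AB), HR = sum(HR), BB = sum(BB),
            HBP = sum(HBP), SF = sum(SF), SH = sum(SH),
            .groups = "drop")

# Add the name and birth year variables.
season <- merge(season, players, by = "playerID")

# Plate appearances (assumed definition).
season$PA <- with(season, AB + BB + HBP + SF + SH)
```

In this toy example the two 1990 stints for player "a01" are combined into a single row before the rate denominators are computed.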

Now I’m ready to write a `plot.trajectory` function. There will be four inputs:

- The name of the batter in quotes. (One can choose any batter in the Lahman database.)
- The numerator of the rate statistic we want to graph — for example, if we want to plot home run rates, then this numerator would be “HR”.
- The denominator of the rate statistic (typically “AB” or “PA”).
- In cases where there are multiple players with the same name, like Ken Griffey or Tony Gwynn, the input `num` gives the number of the player that you are interested in. (For example, if you want to plot the career trajectory of Junior Griffey, use `num = 2`.)
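
To illustrate how the name and `num` inputs might pick out a single player, here is a small sketch in base R. The helper `find_player`, the toy `players` table, and its IDs are hypothetical, and ordering same-named players by debut year is my assumption; the function in the gist may resolve duplicates differently.

```r
# Toy player table (made-up IDs, for illustration only).
players <- data.frame(
  playerID  = c("grifk01", "grifk02", "schmm01"),
  nameFirst = c("Ken", "Ken", "Mike"),
  nameLast  = c("Griffey", "Griffey", "Schmidt"),
  debutYear = c(1973, 1989, 1972),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the playerID for the num-th player
# with the given full name, ordered by debut year (an assumption).
find_player <- function(name, players, num = 1) {
  full_name <- paste(players$nameFirst, players$nameLast)
  matches <- players[full_name == name, ]
  if (nrow(matches) == 0) {
    stop("No player named ", name)
  }
  matches <- matches[order(matches$debutYear), ]
  if (num > nrow(matches)) {
    stop("Only ", nrow(matches), " player(s) named ", name)
  }
  matches$playerID[num]
}

senior <- find_player("Ken Griffey", players)           # first match
junior <- find_player("Ken Griffey", players, num = 2)  # second match
```

With a unique name the default `num = 1` is enough, so most calls need only the first three inputs.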

I use the `ggplot2` package to construct the plot and use a loess smoother to show the general pattern of the career trajectory.
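
A minimal sketch of this kind of plot is shown here, assuming one row per season with an `Age` column. The `career` data frame and its values are invented for illustration, and the real function may style the graph differently. The numerator and denominator are passed as strings, as in the inputs described above.

```r
library(ggplot2)

# Made-up season totals for one hypothetical player.
career <- data.frame(
  Age = 22:36,
  HR  = c(8, 15, 22, 28, 33, 36, 38, 40, 37, 35, 30, 27, 22, 18, 12),
  AB  = c(350, 480, 520, 550, 560, 540, 555, 548, 530, 520,
          510, 490, 450, 400, 320)
)

# Column names given as strings.
numerator   <- "HR"
denominator <- "AB"

# Season rate = numerator / denominator.
career$Rate <- career[[numerator]] / career[[denominator]]

# Scatterplot of the season rates with a loess smooth on top.
p <- ggplot(career, aes(x = Age, y = Rate)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Age", y = paste(numerator, "/", denominator))

print(p)
```

The points show the noisy season-to-season rates, while the smooth makes it easier to see where the rate rises, peaks, and declines.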

Here is the code. First install the packages `Lahman`, `dplyr`, `devtools`, and `ggplot2`. Then you can read in the setup code and the function by typing:

library(devtools)
source_gist(9043429)

Let me illustrate using this function to plot some trajectories. Mike Schmidt is one of my baseball heroes; I can graph his home run trajectory by typing:

plot.trajectory("Mike Schmidt", "HR", "AB")

Clearly, Schmidt peaked in home run hitting about age 30 (that’s when the Phillies won their first World Series).

Instead suppose we look at Schmidt’s strikeout trajectory.

plot.trajectory("Mike Schmidt", "SO", "AB")

I believe Schmidt shortened his swing later in his career, which led to a decrease in his strikeout rate.

Ron Hunt is well-known as the “hit by pitch king”. How does Hunt’s HBP rate change over his career?

plot.trajectory("Ron Hunt", "HBP", "PA")

I believe there is some ability aspect of getting hit by a pitch (it isn’t just luck driven), and Hunt seemed to peak in this ability around age 30.

Anyway, it is fun to explore these trajectories for your favorite players. I also converted this function to a Shiny application that you can see by typing:

library(shiny)
shiny::runGist('9053425')

Thanks for posting this analysis. I love analyzing baseball data with R as well. One thing I might suggest looking into is Plot.ly. It allows you to make awesome interactive visualizations. I did so with a recent post on Babe Ruth’s career numbers: http://jowanza.com/post/76499800817/babe-ruth-a-career-analysis-with-r-and-plot-ly