Comparing Home Run Trajectories

In the last post, I showed how to plot the career trajectory of a batting rate for any player in MLB history. Let’s consider a current problem where looking at trajectories may be helpful. The 2014 Phillies are hopeful that Ryan Howard will put his last two injury-marred seasons behind him and get back to hitting a lot of home runs. Given that Howard is 34 years old, is it reasonable to believe that he’ll come back?

Last week, after some setup work, I illustrated an R function, plot.trajectory, that displays the trajectory of a player’s rate statistic. In the R script displayed here, I modify the function so that, with the argument plot=FALSE, it can output a data frame with the player’s name, age, and rate for all seasons.

Let’s compare the trajectory of Ryan Howard’s home run rates with the trajectories of ten similar players. Baseball-Reference gives a list of the ten players who are most similar (using Bill James’ similarity score) to Howard through age 33. I use the rbind function to stack the trajectory data frames for these 11 players into a single data frame. (I add Howard’s 2013 data by hand, since the Lahman package only goes through the 2012 season.)

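# Collect the trajectory data for Howard and his ten most similar players
d <- NULL
names <- c("Ryan Howard", "Richie Sexson",
           "Cecil Fielder", "Mo Vaughn", "Mark McGwire",
           "Norm Cash", "Jay Buhner", "Willie Stargell",
           "Jason Giambi", "Frank Howard", "David Justice")
# plot=FALSE returns a data frame (Player, Age, Rate) instead of drawing a plot
for (j in seq_along(names))
  d <- rbind(d, plot.trajectory(names[j], "HR", plot=FALSE))
# Lahman stops at 2012, so append Howard's 2013 season by hand
d <- rbind(d, data.frame(Player="Ryan Howard",
                         Age=33,
                         Rate=11/286))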
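
The same stacking can also be written without a loop. Here is a minimal sketch, assuming a stand-in function toy_trajectory (hypothetical, with made-up rates) in place of plot.trajectory(name, "HR", plot=FALSE), so that the example runs on its own:

```r
# Hypothetical stand-in for plot.trajectory(name, "HR", plot = FALSE):
# returns one row per season with Player, Age, and Rate columns.
toy_trajectory <- function(name) {
  data.frame(Player = name,
             Age = 25:27,
             Rate = c(0.04, 0.06, 0.05),  # made-up rates, not real data
             stringsAsFactors = FALSE)
}

players <- c("Player A", "Player B", "Player C")

# Build a list of per-player data frames, then stack them in one call.
stacked <- do.call(rbind, lapply(players, toy_trajectory))

# Append one extra season by hand, in the same way as Howard's 2013 row.
stacked <- rbind(stacked,
                 data.frame(Player = "Player A", Age = 28, Rate = 11 / 286,
                            stringsAsFactors = FALSE))

print(dim(stacked))
```

Calling rbind once on a list avoids growing the data frame inside a loop, although with only 11 players the difference is negligible.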

Using the ggplot2 package, it is easy to compare the home run rate trajectories of these 11 players using separate panels on the same scale. Smoothing (loess) curves are added to these plots to see the basic trajectory patterns.

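library(ggplot2)
# One panel per player, all on the same scale
ggplot(d, aes(Age, Rate)) +
  geom_point(size=3, color="red") +
  # loess smoother; in ggplot2 3.4.0 and later, use linewidth=1.5 instead of size=1.5
  geom_smooth(method="loess", size=1.5) +
  facet_wrap(~ Player, ncol=4) +
  ylab("HOME RUN RATE") + xlab("AGE") +
  # enlarge the player names in the panel strips
  theme(strip.text  = element_text(size = rel(2)))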

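[Figure: howardplot. Home run rate against age with loess curves, one panel per player; Ryan Howard is in the upper left.]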
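
To see what the smoothing curve in each panel represents, here is a minimal sketch that fits a loess curve directly with base R’s loess function, which is what geom_smooth(method="loess") relies on. The data frame toy and its rates are made up for illustration and are not any real player’s numbers.

```r
# Made-up ages and home run rates for one hypothetical player (not real data).
toy <- data.frame(
  Age  = 22:35,
  Rate = c(0.030, 0.038, 0.047, 0.055, 0.062, 0.060, 0.058,
           0.055, 0.050, 0.047, 0.041, 0.036, 0.030, 0.024)
)

# Fit a local regression of Rate on Age; span = 0.75 and degree = 2
# are the defaults of loess().
fit <- loess(Rate ~ Age, data = toy, span = 0.75, degree = 2)

# Evaluate the smooth curve on a fine grid of ages within the observed range.
grid <- data.frame(Age = seq(min(toy$Age), max(toy$Age), by = 0.5))
grid$Smooth <- predict(fit, newdata = grid)

# Age at which the fitted curve peaks.
peak_age <- grid$Age[which.max(grid$Smooth)]
print(peak_age)
```

A smaller span makes the curve follow the individual seasons more closely; a larger span gives a smoother, more rigid trajectory.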

All of the R code for this example can be found here. One can run this example (assuming packages devtools, Lahman, dplyr, and ggplot2 are installed) by typing

library(devtools)
source_gist(9220828)

Ryan Howard’s home run rate trajectory (upper left) shows that his rate has been in free fall since its peak at age 26; part of this decline is attributed to injuries. How does he compare with other “similar” hitters?

  • Ryan’s decline is similar to that of Richie Sexson, Cecil Fielder, and Jay Buhner, who all experienced significant declines in their 30s.
  • Some hitters, like Norm Cash and David Justice, were relatively stable in their home run rates in their 30s.
  • Some players, like Willie Stargell and Jason Giambi, had long careers with gradual declines in their home run rates near the end of their careers.
  • Only one player, Mark McGwire, shows a boost in home run rate in his 30s, but I think we suspect the reason for this boost.

From looking at these trajectories, I doubt that Ryan Howard will hit home runs at a rate much higher than, say, 0.05 (five percent). Actually, I think that the Phillies would be thrilled if Ryan could hit 30-35 home runs in 2014.
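
To put these numbers side by side, here is a small worked calculation. It assumes the rate is home runs per at-bat, as the 11/286 figure for 2013 suggests, and it asks how many at-bats a 0.05 hitter would need to reach 30 or 35 home runs.

```r
# Howard's 2013 rate, taken from the row added by hand earlier in the post.
rate_2013 <- 11 / 286
print(round(rate_2013, 3))

# At-bats needed to reach 30 and 35 home runs at a rate of 0.05.
target_rate <- 0.05
hr_targets  <- c(30, 35)
ab_needed   <- hr_targets / target_rate
print(ab_needed)
```

So at a 0.05 rate, 30-35 home runs corresponds to roughly 600 to 700 at-bats, which is a full, healthy season.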


3 responses

  1. Hello! I’m trying to replicate these home run trajectories for the top 12 Braves home run hitters of all time (sorry, big Braves fan). How would I go about including only the seasons when the players actually played for the Braves (Bos, Mil, and/or Atl)? For instance, Andruw Jones played w/ Atlanta through age 30, but the visualization goes through age 35. http://jamesclarence.files.wordpress.com/2014/03/braveshrtraj2.png

    I’m a novice when it comes to programming and R, so manipulating the data to specify team is a little difficult. Thanks, Jay

  2. In the Lahman package, the Batting data frame contains the variable teamID which is a three-character abbreviation for the team name. If you only want data for Atlanta players, say, then you would use a subset function:
    braves <- subset(Batting, teamID == "ATL")
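
    To cover the Boston and Milwaukee years as well, the same idea extends to several team codes with %in%. This is only a sketch on a made-up stand-in for the Batting data frame; the codes "BSN" and "ML1" for the Boston and Milwaukee Braves are an assumption worth confirming with unique(Batting$teamID).

```r
# Toy stand-in for Lahman's Batting data frame (made-up rows, not real data).
toy_batting <- data.frame(
  playerID = c("p1", "p1", "p1", "p2", "p2"),
  yearID   = c(2005, 2006, 2008, 1952, 1953),
  teamID   = c("ATL", "ATL", "LAN", "BSN", "ML1"),
  HR       = c(30, 25, 5, 20, 40),
  AB       = c(500, 520, 200, 510, 560),
  stringsAsFactors = FALSE
)

# Assumed team codes for the Boston, Milwaukee, and Atlanta Braves.
braves_ids <- c("BSN", "ML1", "ATL")

# Keep only the seasons played for one of those teams.
braves_only <- subset(toy_batting, teamID %in% braves_ids)

# Home run rate for the remaining seasons.
braves_only$Rate <- braves_only$HR / braves_only$AB
print(braves_only)
```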
