Monthly Archives: December, 2013

Graphically compare pitchers to contemporaries

Last time we provided code to display a pitcher's seasonal ERA compared with that of his contemporaries.
In this post we will turn that code into a function, so that the plot is displayed simply by passing the pitcher's name to it.
While we are at it, let's make some improvements to the code and to the resulting plot:

  • We allow stats other than ERA to be chosen for the comparison.
  • We add a legend, as suggested in the final paragraph of the previous post.

Before writing the function, let's load the relevant data (the code is the same as last week).

options(stringsAsFactors=F)
#set working dir
setwd("your/directory/containing/Lahman/DB")

#read data
master = read.csv("Master.csv")
pitching = read.csv("Pitching.csv")

And now, on with the function: code first, explanations later.

hofChart = function(pitcher, stat){
  require(doBy)
  require(ggplot2)
  
  # season totals by pitcher
  pitching = summaryBy(ER + IPouts + SO + BB ~ playerID + yearID
                        , data=pitching, FUN=sum, keep.names=T)

  # calculate stats (you can add your own too, e.g.: HR/9, FIP, ...)
  pitching$ERA = pitching$ER * 27 / pitching$IPouts
  pitching$K9 = pitching$SO * 27 / pitching$IPouts
  pitching$W9 = pitching$BB * 27 / pitching$IPouts
  
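  # get selected pitcher's data (the name must match exactly one player)
  stopifnot(stat %in% names(pitching))
  pitID = subset(master, paste(nameFirst, nameLast)==pitcher)$playerID
  if (length(pitID) != 1) stop(length(pitID), " players named '", pitcher, "' found")
  pitData = subset(pitching, playerID==pitID)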
  
  # get contemporaries of selected pitcher (qualifying only)
  contemporaries = subset(pitching
                           , yearID >= min(pitData$yearID) 
                           & yearID <= max(pitData$yearID)
                           & IPouts >= 162*3)

  # compare stat with contemporaries
  ggplot(data=pitData, aes_string(x="factor(yearID)", y=stat)) +
    geom_boxplot(data=contemporaries, aes_string(x="factor(yearID)", y=stat)) +
    geom_point(data=contemporaries, aes_string(x="factor(yearID)", y=stat, col="'oth'", shape="'oth'", size="'oth'"), position=position_jitter(width = 0.15), alpha=.6) +
    geom_point(aes(col="sel", shape="sel", size="sel")) +
    xlab("season") +
    ggtitle(paste(pitcher, " vs his contemporaries (", stat, ")", sep="")) +
    scale_color_manual(values=c("oth"="black", "sel"="blue")
                       , labels=c("oth"="contemporaries", "sel"=pitcher)
                       , name="") +
    scale_shape_manual(values=c("oth"=1, "sel"=19)
                       , labels=c("oth"="contemporaries", "sel"=pitcher)
                       , name="") +
    scale_size_manual(values=c("oth"=2, "sel"=5)
                      , labels=c("oth"="contemporaries", "sel"=pitcher)
                      , name="")
}
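The summaryBy call at the top of hofChart is what turns the raw Pitching table into one row per pitcher per season: the Lahman table has one row per stint, so a pitcher traded mid-season shows up once for each team. If you'd rather not depend on doBy, a minimal sketch of the same step with base R's aggregate might look like this (the toy data frame is made up purely for illustration).

```r
# Toy stand-in for the Lahman Pitching table (values are made up):
# "pitchA01" has two stints in 1999, e.g. after a mid-season trade
pitching_toy = data.frame(
  playerID = c("pitchA01", "pitchA01", "pitchB01"),
  yearID   = c(1999, 1999, 1999),
  stint    = c(1, 2, 1),
  ER       = c(30, 20, 40),
  IPouts   = c(300, 240, 600),
  SO       = c(100, 80, 150),
  BB       = c(25, 15, 60)
)

# Sum the counting stats over stints: one row per pitcher and season,
# keeping the original column names (as keep.names=T does in summaryBy)
seasons = aggregate(cbind(ER, IPouts, SO, BB) ~ playerID + yearID,
                    data = pitching_toy, FUN = sum)
print(seasons)
```

Note that the formula interface of aggregate drops rows with missing values by default, whereas summing with summaryBy would propagate the NA.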

Let's look at what has changed since the previous incarnation of the code.

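First, we have added code for computing a couple of stats other than ERA, namely strikeouts per nine innings (K9) and walks per nine innings (W9). Just add formulas there for any other stats you want to visualize (HR per nine, FIP, …); if a stat needs columns not yet summed, such as HR, remember to add them to the summaryBy formula as well.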
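All three stats follow the same pattern: a counting stat scaled to nine innings. Since IPouts counts outs and nine innings contain 27 outs, the rate is count * 27 / IPouts. Here is a minimal sketch of a small helper for adding further per-nine rates; the name per9 is hypothetical and the season totals are made up.

```r
# Hypothetical helper: scale a counting stat to a nine-inning rate.
# IPouts counts outs, and nine innings contain 27 outs.
per9 = function(count, IPouts) {
  # seasons with no outs recorded would give Inf/NaN, so return NA instead
  ifelse(IPouts > 0, count * 27 / IPouts, NA)
}

# Toy season totals (made-up values), home runs allowed included
seasons = data.frame(playerID = c("pitchA01", "pitchB01"),
                     ER = c(50, 40), IPouts = c(540, 0),
                     SO = c(180, 0), BB = c(40, 2), HR = c(12, 1))

seasons$ERA = per9(seasons$ER, seasons$IPouts)
seasons$K9  = per9(seasons$SO, seasons$IPouts)
seasons$HR9 = per9(seasons$HR, seasons$IPouts)
print(seasons)
```

For the first toy pitcher this gives an ERA of 2.5, a K9 of 9 and an HR9 of 0.6; the second one, with no outs recorded, gets NA rather than an infinite rate.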

Then we have changed some of the aes calls to aes_string inside the code for building the ggplot.
The difference between aes and aes_string is that the former takes unquoted expressions as arguments for the aesthetics, while the latter takes strings. The advantage of aes_string inside a function is that it makes it easy to pass aesthetics as function arguments.
Thus, in our case, we can use stat as a function argument, to which we pass (as a character string) the pitching stat we’d like to have visualized.
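A quick illustration of the same idea with a caveat: recent versions of ggplot2 mark aes_string as deprecated, and the recommended replacement is the .data pronoun inside a regular aes call, which also picks a column from a string. The sketch below assumes a recent ggplot2; plotStat is a hypothetical name and the data frame is made up.

```r
library(ggplot2)

# Made-up season values for a single pitcher
df = data.frame(yearID = c(1997, 1998, 1999), ERA = c(2.05, 2.65, 4.60),
                K9 = c(10.0, 10.4, 8.2))

# Hypothetical helper: the column to plot is passed as a character string
plotStat = function(data, stat) {
  # .data[[stat]] looks up the column named by the string `stat`,
  # playing the role of aes_string(y = stat)
  ggplot(data, aes(x = factor(yearID), y = .data[[stat]])) +
    geom_point() +
    xlab("season") +
    ylab(stat)
}

p = plotStat(df, "K9")
print(p)
```

Swapping aes_string for .data[[stat]] in hofChart should work the same way, but the function as written above sticks to aes_string.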

Finally, we have added code for generating a legend. This has been achieved by mapping the color, shape and size aesthetics to constant labels ("oth" for the contemporaries, "sel" for the selected pitcher) in the geom_point calls. For a more detailed explanation, see the ggplot2 tips post.
Note that, because the three scale_..._manual calls share the same name and the same labels, ggplot2 merges the color, shape and size guides into a single legend.
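To see the legend trick in isolation, here is a stripped-down sketch on made-up data: both point layers map colour and shape to a constant label, and the two manual scales share name and labels, so a single combined legend is drawn instead of two. The object names are illustrative only.

```r
library(ggplot2)

# Made-up data: a few "contemporaries" and one selected pitcher
others   = data.frame(yearID = c(1998, 1998, 1999), ERA = c(3.9, 4.4, 4.1))
selected = data.frame(yearID = c(1998, 1999), ERA = c(2.7, 3.0))
legendLabels = c("oth" = "contemporaries", "sel" = "selected pitcher")

p = ggplot() +
  # map aesthetics to constant strings: each one becomes a legend key
  geom_point(data = others, aes(factor(yearID), ERA, col = "oth", shape = "oth")) +
  geom_point(data = selected, aes(factor(yearID), ERA, col = "sel", shape = "sel")) +
  # same name and labels in both scales, so the two guides are merged
  scale_color_manual(values = c("oth" = "black", "sel" = "blue"),
                     labels = legendLabels, name = "") +
  scale_shape_manual(values = c("oth" = 1, "sel" = 19),
                     labels = legendLabels, name = "") +
  xlab("season")
print(p)
```

Giving one of the scales a different name (say name = "who") should split the legend back into two separate guides.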

Now, simply call the hofChart function passing the pitcher and the stat of your choice and… voilà!

hofChart("Roger Clemens", "ERA")
Roger Clemens compared to his contemporaries (ERA)

hofChart("Roger Clemens", "K9")
Roger Clemens compared to his contemporaries (K/9)

hofChart("Roger Clemens", "W9")
Roger Clemens compared to his contemporaries (W/9)

The Rocket’s ERA was among the best 25% in several seasons, and he recorded a couple of exceptional seasons at the tail end of his career.
Except for his final season, he was among the elite pitchers at striking out opponents. On the other hand, despite posting good numbers in some years, he was not one of those pitchers who earned their money by avoiding walks.
His profile is clearly that of a power pitcher, and a Hall-of-Fame-bound one if not for the extracurricular activities that tainted his legacy.
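Statements like "among the best 25%" are read off the boxplots, whose lower and upper hinges are the quartiles. To put a number on them, one option (a sketch, not part of the original code) is to compute, for each season, the share of qualifying contemporaries whose stat was at or below the selected pitcher's. The helper seasonPercentile and the toy data are hypothetical; in practice you would pass the pitData and contemporaries data frames built inside hofChart, keeping in mind that contemporaries includes the selected pitcher himself in his qualifying seasons.

```r
# Hypothetical helper: for each season of the selected pitcher, the share
# of contemporaries in that season whose stat is <= the pitcher's value.
# For ERA and W9 low shares are good; for K9 high shares are good.
# A season with no contemporaries yields NaN.
seasonPercentile = function(pitData, contemporaries, stat) {
  sapply(seq_len(nrow(pitData)), function(i) {
    # contemporaries' values for the same season
    pool = contemporaries[contemporaries$yearID == pitData$yearID[i], stat]
    mean(pool <= pitData[i, stat])
  })
}

# Made-up data: four qualifying pitchers in each of two seasons
contemporaries = data.frame(yearID = rep(c(1997, 1998), each = 4),
                            ERA = c(2.0, 3.5, 4.0, 4.5, 2.6, 3.0, 3.8, 5.0))
pitData = data.frame(yearID = c(1997, 1998), ERA = c(2.0, 3.0))

pct = seasonPercentile(pitData, contemporaries, "ERA")
print(data.frame(season = pitData$yearID, pct = pct))
```

With these toy numbers the pitcher's ERA sits at the 25% mark in 1997 (best quarter) and at the median in 1998.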

hofChart("Greg Maddux", "ERA")
Greg Maddux compared to his contemporaries (ERA)

hofChart("Greg Maddux", "K9")
Greg Maddux compared to his contemporaries (K/9)

hofChart("Greg Maddux", "W9")
Greg Maddux compared to his contemporaries (W/9)

And here's another way to earn strong credentials for a place in Cooperstown. The Mad Dog had an exceptional run of ridiculously low ERAs from 1992 to 2002. But while Clemens built his success racking up strikeouts, Maddux was rarely better than average in that regard: instead, he was uncanny at avoiding bases on balls year after year until his retirement.