Range Factors of 2nd Basemen

In this post, we take a quick look at the fielding of MLB second basemen since 1960, roughly the last 50 seasons. We use Bill James’ range factor statistic, the number of plays (putouts plus assists) a fielder makes per nine innings. James argued that range factor is more relevant than fielding percentage for evaluating the quality of defensive play.

We focus on the following questions:

– How has the range factor of 2nd basemen changed since 1960?

– What are good values of the range factor for the best-fielding second basemen and what does a fielding career trajectory look like?

All of the R code for this example can be found here. See Chapter 12 of Analyzing Baseball with R for a discussion of more sophisticated fielding measures.

First we load the Lahman and dplyr packages.

library(Lahman)
library(dplyr)

We collect fielding data for all 2nd basemen for seasons 1960 and later.

fielding.2b <- filter(Fielding, POS=="2B", yearID >= 1960)

For each season, we compute the range factor per nine innings, pooling putouts, assists, and innings over all second basemen, and store it as RF.9 in the data frame rf.season.2b.

rf.season.2b <- summarize(group_by(fielding.2b, yearID),
              RF.9 = 9 * (sum(PO, na.rm=TRUE) + 
                          sum(A, na.rm=TRUE)) / 
                    (sum(InnOuts / 3, na.rm=TRUE)))
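To make the formula concrete, here is a minimal sketch on a made-up two-player season. The numbers and the helper name range_factor are illustrative assumptions, not Lahman data. It shows that the season value is a pooled ratio (total plays over total innings), which weights each fielder by playing time and so generally differs from the simple mean of the individual range factors.

```r
# Range factor per nine innings: 9 * (putouts + assists) / innings,
# where innings = InnOuts / 3. (range_factor is a hypothetical helper.)
range_factor <- function(po, a, inn_outs) {
  9 * (po + a) / (inn_outs / 3)
}

# Made-up season with one regular and one part-time second baseman.
toy <- data.frame(PO = c(300, 50), A = c(350, 60), InnOuts = c(3600, 600))

# Pooled season value, computed the same way as RF.9 in rf.season.2b.
pooled <- range_factor(sum(toy$PO), sum(toy$A), sum(toy$InnOuts))

# Unweighted mean of the individual range factors, for comparison.
unweighted <- mean(range_factor(toy$PO, toy$A, toy$InnOuts))

print(c(pooled = pooled, unweighted = unweighted))
```

In this toy season the part-timer has the higher individual range factor, so the unweighted mean sits above the pooled value.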

We plot the range factor statistic against season and add a smoothing line.

library(ggplot2)
ggplot(rf.season.2b, aes(yearID, RF.9)) + geom_point() + 
  geom_smooth(span=.2, method="loess")

rf.history

This is an interesting downward trend in the average range factor for second basemen. I wonder why?

Let’s focus on fielding data for all players since 1960 with at least 1000 career opportunities (PO + A) fielding second base.

We first collapse the data frame fielding.2b over the stint variable.

fielding.2b <- summarize(group_by(fielding.2b, playerID, yearID),
                   PO = sum(PO), A = sum(A),
                   InnOuts = sum(InnOuts))
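A player who changes teams during a season has one row per stint, which is why the collapse is needed. The sketch below illustrates this on made-up rows (the player IDs and counts are invented). It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Made-up rows: player "aaa01" was traded in 1975 and so has two stints.
toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(1975, 1975, 1975),
  stint    = c(1, 2, 1),
  PO       = c(100, 150, 300),
  A        = c(120, 160, 350),
  InnOuts  = c(1200, 1500, 3600)
)

# One row per player-season; .groups = "drop" returns an ungrouped result.
collapsed <- toy %>%
  group_by(playerID, yearID) %>%
  summarize(PO = sum(PO), A = sum(A), InnOuts = sum(InnOuts),
            .groups = "drop")

print(collapsed)
```

The two stints of "aaa01" are added together, so each player contributes a single row per season.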

We collect lifetime PO + A for all players and merge this summary data frame with the fielding data frame.

summary.2b <- summarize(group_by(fielding.2b, playerID),
                   PO.A = sum(PO) + sum(A))
fielding.2b <- merge(fielding.2b, summary.2b, 
                       by="playerID")

We keep only the players with at least 1000 PO + A at 2nd base.

fielding.2b <- filter(fielding.2b, PO.A >= 1000)
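An equivalent route, sketched here on made-up data, is a grouped mutate that attaches the career total to each season row without building and merging a separate summary table. This is an alternative for illustration, not the code used in this post.

```r
library(dplyr)

# Made-up player-seasons; the threshold of 1000 matches the text.
toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(1975, 1976, 1975),
  PO       = c(300, 320, 200),
  A        = c(350, 340, 250)
)

# Attach career PO + A to every season row, then keep qualifying players.
qualified <- toy %>%
  group_by(playerID) %>%
  mutate(PO.A = sum(PO) + sum(A)) %>%
  ungroup() %>%
  filter(PO.A >= 1000)

print(qualified)
```

Only "aaa01" survives the filter, and both of his seasons are kept because the threshold applies to the career total rather than to a single season.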

We compute the range factor statistic for all player seasons and add this information to the fielding data frame.

fielding.2b <- mutate(fielding.2b, 
                Range = 9 * (PO + A) / (InnOuts / 3))
fielding.2b <- merge(fielding.2b, rf.season.2b, by="yearID")

Since the average number of plays made by second basemen has changed over the seasons, we compute an adjusted range measure, defined as the player's range factor divided by the season average. An adjusted range value larger than one indicates that the player made more plays per nine innings than the average second baseman that season.

fielding.2b <- mutate(fielding.2b, 
                Adj.Range = Range / RF.9)
head(fielding.2b)
##   yearID  playerID  PO   A InnOuts PO.A Range  RF.9 Adj.Range
## 1   1960 richabo01 312 337    3369 5341 5.201 5.328    0.9762
## 2   1960 blasido01 318 329    3425 3634 5.100 5.328    0.9572
## 3   1960 tayloto02 319 406    3750 5599 5.220 5.328    0.9797
## 4   1960 gilliji01  51  69     703 1517 4.609 5.328    0.8650
## 5   1960 mazerbi01 413 449    4022 8987 5.787 5.328    1.0860
## 6   1960 breedma01 359 422    3828 1628 5.509 5.328    1.0339
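As a sanity check, the printed rows can be recomputed by hand. The sketch below copies the six rows shown above and uses the printed (rounded) 1960 average of 5.328, so the results should agree with the table only up to rounding. The names shown, range.check, and adj.check are introduced here for illustration.

```r
# Values copied from the head(fielding.2b) output above.
shown <- data.frame(
  playerID  = c("richabo01", "blasido01", "tayloto02",
                "gilliji01", "mazerbi01", "breedma01"),
  PO        = c(312, 318, 319, 51, 413, 359),
  A         = c(337, 329, 406, 69, 449, 422),
  InnOuts   = c(3369, 3425, 3750, 703, 4022, 3828),
  Range     = c(5.201, 5.100, 5.220, 4.609, 5.787, 5.509),
  Adj.Range = c(0.9762, 0.9572, 0.9797, 0.8650, 1.0860, 1.0339)
)
rf9.1960 <- 5.328  # printed 1960 season average, rounded to 3 decimals

# Recompute both measures from the raw counts.
range.check <- 9 * (shown$PO + shown$A) / (shown$InnOuts / 3)
adj.check   <- range.check / rf9.1960

# Largest absolute differences from the printed values.
print(max(abs(range.check - shown$Range)))
print(max(abs(adj.check - shown$Adj.Range)))
```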

Since we are interested in viewing fielding trajectories for specific players, we write a function plot.range that plots the career trajectory of the adjusted range factor for a particular player. A horizontal line is drawn at one to show whether the player was above or below average in each season of his career.

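plot.range <- function(name){
  # Split "First Last" into its two parts and look up the player ID.
  # (Newer Lahman releases call this table People.)
  N <- strsplit(name, " ")[[1]]
  pid <- filter(Master, 
                nameFirst==N[1], 
                nameLast==N[2])$playerID
  # %in% keeps the filter valid even if the name matches several IDs.
  player.data <- filter(fielding.2b, playerID %in% pid)
  print(ggplot(player.data, aes(yearID, Adj.Range)) + 
          geom_point(size=4, color="red") + 
          geom_smooth(method="loess", size=2) +
          # Reference line at 1, the season average.
          geom_hline(yintercept=1, size=2, color="brown") +
          labs(title=name) + 
          theme(plot.title = element_text(size = rel(3))))
}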
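The name lookup in plot.range works best for a two-word name that identifies exactly one player. A minimal sketch of a stricter lookup follows. The helper find_player_id and the table toy.people are hypothetical, introduced only for illustration, with toy.people standing in for the Lahman player table.

```r
# Hypothetical helper: return the single playerID matching "First Last".
# `people` is any data frame with nameFirst, nameLast, and playerID columns.
find_player_id <- function(name, people) {
  parts <- strsplit(name, " ")[[1]]
  first <- parts[1]
  last  <- paste(parts[-1], collapse = " ")  # allows multi-word last names
  ids <- people$playerID[people$nameFirst == first & people$nameLast == last]
  ids <- ids[!is.na(ids)]  # rows with missing names produce NA matches
  if (length(ids) == 0) stop("no player named ", name)
  if (length(ids) > 1) {
    stop("ambiguous name ", name, ": ", paste(ids, collapse = ", "))
  }
  ids
}

# Made-up lookup table standing in for the Lahman player table.
toy.people <- data.frame(
  nameFirst = c("Al", "Al", "Bo"),
  nameLast  = c("Smith", "Smith", "Van Dyke"),
  playerID  = c("smithal01", "smithal02", "vandybo01"),
  stringsAsFactors = FALSE
)

print(find_player_id("Bo Van Dyke", toy.people))
```

Stopping with an error on a missing or ambiguous name is one possible design choice. It avoids silently plotting an empty panel or mixing the seasons of two different players.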

I conclude by illustrating this R function for a number of great second basemen in modern baseball. (I encourage you to try this function for other second basemen.) The patterns are interesting in several ways. First, some of these great second basemen appear to be pretty average with respect to range factor. Second, the career trajectory shapes are interesting — they don’t display the standard quadratic shape peaking in the middle of career that is common for trajectories of batting measures.

plot.range("Chase Utley")

chase.fielding

plot.range("Joe Morgan")

joe.fielding

plot.range("Craig Biggio")

biggio.fielding

plot.range("Manny Trillo")

trillo.fielding

plot.range("Willie Randolph")

randolph.fielding

plot.range("Bill Mazeroski")

maz.fielding

plot.range("Ryne Sandberg")

ryne.fielding

plot.range("Roberto Alomar")

alomar.fielding


One response

  1. Interesting analysis. A WOW for Bill Mazeroski!
