In this post, we take a quick look at the fielding of MLB second basemen over the last 50 seasons. We use Bill James’ range factor statistic. James argued that the range factor was more relevant than the fielding percentage in evaluating the quality of defensive play.

We focus on the following questions:

– How has the range factor of second basemen changed over the last 50 seasons of baseball?

– What are good values of the range factor for the best-fielding second basemen and what does a fielding career trajectory look like?

All of the R code for this example can be found here. See Chapter 12 of *Analyzing Baseball with R* for a discussion of more sophisticated fielding measures.

First we load the Lahman and dplyr packages.

library(Lahman)
library(dplyr)

We collect fielding data for all 2nd basemen for seasons 1960 and later.

fielding.2b <- filter(Fielding, POS=="2B", yearID >= 1960)

For each season, we compute the range factor per nine innings, 9 × (PO + A) / innings played, where innings played is InnOuts / 3, and put the summary values in the data frame rf.season.2b.

rf.season.2b <- summarize(group_by(fielding.2b, yearID),
    RF.9 = 9 * (sum(PO, na.rm=TRUE) + sum(A, na.rm=TRUE)) /
        (sum(InnOuts / 3, na.rm=TRUE)))
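To see what this summary does, here is a minimal sketch on a toy data frame. The numbers and the names toy and toy.rf are made up for illustration and are not Lahman data. It suggests two things about the calculation: the season value is a ratio of sums rather than an average of individual range factors, and a row with a missing PO still contributes its assists and innings.

```r
library(dplyr)

# Hypothetical fielding lines (made-up numbers): two seasons, two rows each.
toy <- data.frame(
  yearID  = c(2000, 2000, 2001, 2001),
  PO      = c(300, 100, 250, NA),      # one putout total is missing
  A       = c(400, 150, 350, 50),
  InnOuts = c(3600, 1200, 3000, 600)
)

# Same formula as rf.season.2b, written with the pipe.
toy.rf <- toy %>%
  group_by(yearID) %>%
  summarize(RF.9 = 9 * (sum(PO, na.rm = TRUE) + sum(A, na.rm = TRUE)) /
                   sum(InnOuts / 3, na.rm = TRUE))

toy.rf
# 2000: 9 * (400 + 550) / 1600 = 5.34375
# 2001: 9 * (250 + 400) / 1200 = 4.875  (the NA putouts are dropped,
#       but that row's assists and innings are still counted)
```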

We plot the range factor statistic against season and add a smoothing line.

library(ggplot2)
ggplot(rf.season.2b, aes(yearID, RF.9)) +
    geom_point() +
    geom_smooth(span=.2, method="loess")

The plot shows an interesting downward trend in the average range factor for second basemen. I wonder why?

Let’s focus on fielding data for all players since 1960 with at least 1000 career opportunities (PO + A) fielding second base.

We first collapse the data frame fielding.2b over the stint variable.

fielding.2b <- summarize(group_by(fielding.2b, playerID, yearID),
    PO = sum(PO),
    A = sum(A),
    InnOuts = sum(InnOuts))
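As a small illustration of why this collapse is needed, the sketch below uses a made-up data frame (the player IDs and counts are hypothetical) in which one player has two stints in the same season. After grouping by player and season, his two rows become one.

```r
library(dplyr)

# Hypothetical data: player "aaa01" was traded mid-season, so he has two stints.
toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(2000, 2000, 2000),
  stint    = c(1, 2, 1),
  PO       = c(100, 50, 200),
  A        = c(120, 60, 250),
  InnOuts  = c(1200, 600, 2400)
)

# One row per player-season; .groups = "drop" returns an ungrouped result.
toy.collapsed <- toy %>%
  group_by(playerID, yearID) %>%
  summarize(PO = sum(PO), A = sum(A), InnOuts = sum(InnOuts),
            .groups = "drop")

toy.collapsed
```

Note that sum() without na.rm = TRUE returns NA if any stint has a missing value, so a player-season with incomplete data would end up with a missing range factor later on.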

We collect lifetime PO + A for all players and merge this summary data frame with the fielding data frame.

summary.2b <- summarize(group_by(fielding.2b, playerID),
    PO.A = sum(PO) + sum(A))
fielding.2b <- merge(fielding.2b, summary.2b, by="playerID")

We keep only the players with at least 1000 PO + A at 2nd base.

fielding.2b <- filter(fielding.2b, PO.A >= 1000)
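The summarize, merge, and filter steps above could also be written as a single grouped mutate. The following is only a sketch of that alternative on made-up data (the IDs and counts are hypothetical); it is not the code used for the results in this post.

```r
library(dplyr)

# Hypothetical player-seasons: "aaa01" is a regular, "bbb01" a brief fill-in.
toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01"),
  yearID   = c(2000, 2001, 2000),
  PO       = c(300, 280, 40),
  A        = c(400, 390, 60)
)

# Attach each player's career PO + A to every one of his rows,
# then keep only players who reach the 1000-chance threshold.
toy.kept <- toy %>%
  group_by(playerID) %>%
  mutate(PO.A = sum(PO) + sum(A)) %>%
  ungroup() %>%
  filter(PO.A >= 1000)

toy.kept
```

Here "aaa01" has 300 + 280 + 400 + 390 = 1370 career chances and is kept, while "bbb01" has only 100 and is dropped.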

We compute the range factor statistic for all player seasons and add this information to the fielding data frame.

fielding.2b <- mutate(fielding.2b, Range = 9 * (PO + A) / (InnOuts / 3))
fielding.2b <- merge(fielding.2b, rf.season.2b, by="yearID")

Since the average number of plays made by second basemen has changed over the seasons, we compute an adjusted range measure by dividing each player's range factor by the season average. An adjusted range value **larger than one** indicates that the player is doing better than the average second baseman of that season.

fielding.2b <- mutate(fielding.2b, Adj.Range = Range / RF.9)
head(fielding.2b)

##   yearID  playerID  PO   A InnOuts PO.A Range  RF.9 Adj.Range
## 1   1960 richabo01 312 337    3369 5341 5.201 5.328    0.9762
## 2   1960 blasido01 318 329    3425 3634 5.100 5.328    0.9572
## 3   1960 tayloto02 319 406    3750 5599 5.220 5.328    0.9797
## 4   1960 gilliji01  51  69     703 1517 4.609 5.328    0.8650
## 5   1960 mazerbi01 413 449    4022 8987 5.787 5.328    1.0860
## 6   1960 breedma01 359 422    3828 1628 5.509 5.328    1.0339
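As a quick check on the formulas, the 1960 row for mazerbi01 can be reproduced by hand from the printed values. The variable names below are only for illustration, and the season average is the rounded value shown in the output, so the last digit may differ slightly from the full-precision calculation.

```r
# Values copied from the 1960 mazerbi01 row of the printed output.
po        <- 413
a         <- 449
inn.outs  <- 4022
season.rf <- 5.328   # RF.9 for 1960, as printed (rounded)

# Range factor per nine innings, then the season-adjusted version.
rf     <- 9 * (po + a) / (inn.outs / 3)
adj.rf <- rf / season.rf

round(rf, 3)      # 5.787, matching the Range column
round(adj.rf, 3)  # 1.086, matching Adj.Range
```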

Since I'm interested in viewing fielding trajectories for specific players, I write a function plot.range that plots the career trajectory of the adjusted range factor for a particular player. A horizontal line is placed at the value one to show whether the player was above or below average in each season of his career.

plot.range <- function(name){
  # Split "First Last" into its two parts
  N <- strsplit(name, " ")[[1]]
  # Look up the player ID (newer Lahman releases name this table People)
  pid <- filter(Master, nameFirst==N[1], nameLast==N[2])$playerID
  player.data <- filter(fielding.2b, playerID==pid)
  # Adjusted range by season, with a reference line at the average (one)
  print(ggplot(player.data, aes(yearID, Adj.Range)) +
    geom_point(size=4, color="red") +
    geom_smooth(method="loess", size=2) +
    geom_hline(yintercept=1, size=2, color="brown") +
    labs(title=name) +
    theme(plot.title = element_text(size = rel(3))))
}
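The name lookup in plot.range assumes the name has exactly two words and matches exactly one player. A more defensive lookup might look like the sketch below. The helper lookup.id and the table toy.names are hypothetical stand-ins (not part of Lahman), and treating everything after the first word as the last name is a simplifying assumption.

```r
# Hypothetical stand-in for the Lahman name table (made-up players).
toy.names <- data.frame(
  playerID  = c("smithjo01", "smithjo02", "doeja01"),
  nameFirst = c("John", "John", "Jane"),
  nameLast  = c("Smith", "Smith", "Doe"),
  stringsAsFactors = FALSE
)

# Return the player ID(s) matching "First Last"; stop if there is no match
# and warn if the name is shared by more than one player.
lookup.id <- function(name, names.df) {
  parts <- strsplit(name, " ")[[1]]
  first <- parts[1]
  last  <- paste(parts[-1], collapse = " ")  # assumption: the rest is the last name
  hit   <- which(names.df$nameFirst == first & names.df$nameLast == last)
  ids   <- names.df$playerID[hit]
  if (length(ids) == 0) stop("No player named ", name, call. = FALSE)
  if (length(ids) > 1)
    warning("Several players named ", name, ": ",
            paste(ids, collapse = ", "), call. = FALSE)
  ids
}

lookup.id("Jane Doe", toy.names)   # "doeja01"
```

With a helper like this, an ambiguous name produces a warning listing the candidate IDs instead of silently mixing or dropping rows in the plot.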

I conclude by illustrating this R function for a number of great second basemen in modern baseball. (I encourage you to try this function for other second basemen.) The patterns are interesting in two ways. First, some of these great second basemen appear to be pretty average with respect to range factor. Second, the career trajectories don’t display the standard quadratic shape, peaking in the middle of the career, that is common for trajectories of batting measures.

plot.range("Chase Utley")

plot.range("Joe Morgan")

plot.range("Craig Biggio")

plot.range("Manny Trillo")

plot.range("Willie Randolph")

plot.range("Bill Mazeroski")

plot.range("Ryne Sandberg")

plot.range("Roberto Alomar")
