Response to comments and home run update
Introduction
I’ve been away at an overseas meeting, so I haven’t been active on this blog recently. I’ll use this post to respond to several questions from readers and then give an update on 2018 home run hitting at the halfway point of the season.
Adding an Age Variable
Zach writes:
I’m trying to add an “Age” column in the Lahman batting.csv file. My idea is that I can use a combination of getinfo and the sapply function. I’m comfortable using the getinfo function for individual players. I’ve attempted to adapt the function to do this but I’m struggling. Any suggestions?
I do talk about adding an Age variable to the Batting data frame in our baseball/R book. Here’s an R function, get.stats, that does this using the tidyverse collection of packages.
A player’s age for a season is defined as his age on July 1. The Master data frame in the Lahman package has a variable birthYear. I define a new variable birthyear that is equal to birthYear + 1 if the player’s birthMonth is 7 or later; otherwise birthyear is equal to birthYear. A player’s Age is then the difference between yearID and birthyear. The function also computes some traditional measures of performance (SLG, OBP and OPS) for each age.
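library(Lahman)
library(tidyverse)

# Age, SLG, OBP and OPS for each season of one player.
# (Newer releases of the Lahman package call the Master table People.)
get.stats <- function(player.id){
  Batting %>%
    filter(playerID == player.id) %>%
    inner_join(select(Master, playerID, birthMonth, birthYear),
               by = "playerID") %>%
    mutate(birthyear = ifelse(birthMonth >= 7, birthYear + 1, birthYear),
           Age = yearID - birthyear,
           SLG = (H - X2B - X3B - HR + 2 * X2B + 3 * X3B + 4 * HR) / AB,
           OBP = (H + BB + HBP) / (AB + BB + HBP + SF),
           OPS = SLG + OBP) %>%
    select(Age, SLG, OBP, OPS)
}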
I illustrate this function for Tony Gwynn, whose playerID is “gwynnto01”.
TG <- get.stats("gwynnto01")
head(TG)
  Age       SLG       OBP       OPS
1  22 0.3894737 0.3365854 0.7260591
2  23 0.3717105 0.3545455 0.7262560
3  24 0.4438944 0.4095665 0.8534609
4  25 0.4083601 0.3641791 0.7725392
5  26 0.4672897 0.3805436 0.8478334
6  27 0.5110357 0.4469027 0.9579383
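For readers who would rather not load the tidyverse, or who (like Zach) were reaching for sapply, here is a minimal base R sketch of the same July 1 age rule and rate formulas. The function names season_age and slash_line are hypothetical, and their arguments are plain vectors named after the Lahman columns. Because both functions are vectorized, they fill in a value for every row of a data frame in one call, so no loop or sapply is needed.

```r
# Age on July 1 of the season (hypothetical helper). A player born in
# July or later is treated as not yet having had his birthday on July 1,
# so his birth year is counted as one year later.
season_age <- function(yearID, birthYear, birthMonth) {
  birthyear <- ifelse(birthMonth >= 7, birthYear + 1, birthYear)
  yearID - birthyear
}

# Slugging, on-base and OPS from counting stats (hypothetical helper).
# Singles are H minus extra-base hits, so total bases are
# 1B + 2 * 2B + 3 * 3B + 4 * HR.
slash_line <- function(AB, H, X2B, X3B, HR, BB, HBP, SF) {
  TB  <- (H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + 4 * HR
  SLG <- TB / AB
  OBP <- (H + BB + HBP) / (AB + BB + HBP + SF)
  data.frame(SLG = SLG, OBP = OBP, OPS = SLG + OBP)
}
```

For example, after d <- merge(Batting, Master[, c("playerID", "birthYear", "birthMonth")], by = "playerID"), the single line d$Age <- season_age(d$yearID, d$birthYear, d$birthMonth) adds an Age column for every season of every player.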
Many seasons of Retrosheet data
Another reader asks: “what happens if I want to download more than one season at a time?”
I described how to download Retrosheet play-by-play data for a single season in a previous post. Although that post was written four years ago, its code still seems to work fine. Retrosheet does allow you to download multiple seasons at once. I’d suggest looking at the code of the function parse.retrosheet2.pbp.R; I think a straightforward modification of this function will work for multiple seasons.
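One way to make that modification, sketched here under assumptions, is to keep a single-season routine and loop over seasons, stacking the results. The names multi_season and retrosheet_url are hypothetical; the per-season function is passed in as an argument (for example, a wrapper around parse.retrosheet2.pbp, whose exact signature is not reproduced here); and the Retrosheet URL pattern for a season’s event files is an assumption to check against the site.

```r
# Hypothetical helper: apply a single-season function to several seasons
# and stack the resulting data frames into one.
# `one_season` is any function that takes a season (e.g. 2017) and
# returns a data frame for that season.
multi_season <- function(seasons, one_season) {
  pieces <- lapply(seasons, function(s) {
    d <- one_season(s)
    d$season <- rep(s, nrow(d))  # tag each row with its season
    d
  })
  do.call(rbind, pieces)         # requires identical columns across seasons
}

# ASSUMED location of a season's zipped event files on Retrosheet;
# verify the pattern on the Retrosheet site before relying on it.
retrosheet_url <- function(season) {
  paste0("https://www.retrosheet.org/events/", season, "eve.zip")
}
```

For example, multi_season(2015:2017, my_parse_function) would return one data frame covering all three seasons, where my_parse_function stands for whatever single-season reader you already use.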
Home Run Update Through Games of June 30
Home run hitting in 2018 is still trailing the 2017 pattern. Here is a graph of the cumulative in-play home run rate for games played through June 30. Interestingly, the 2017 HR rate climbed during June, while the 2018 HR rate appears to have leveled off at around 4.4 percent in recent weeks.
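A cumulative rate like the one graphed is total home runs to date divided by total balls in play to date, recomputed after each day of games. Below is a minimal base R sketch, assuming game-level counts are available; the function name cumulative_hr_rate is hypothetical, and the rate is assumed to be home runs divided by balls in play (with home runs counted as balls in play), since the exact denominator behind the graph is not stated here.

```r
# Cumulative in-play home run rate, one value per game date.
# ASSUMPTION: rate = home runs / balls in play (home runs included in
# balls in play); the exact denominator used for the graph may differ.
cumulative_hr_rate <- function(date, hr, bip) {
  day_hr  <- as.vector(tapply(hr, date, sum))   # HR per date, in date order
  day_bip <- as.vector(tapply(bip, date, sum))  # balls in play per date
  data.frame(date = sort(unique(date)),
             rate = cumsum(day_hr) / cumsum(day_bip))
}
```

Because each value pools every game so far, the cumulative curve moves slowly late in the season, which is why a leveling off like the one described shows up more clearly in weekly rates.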
Below I plot the actual in-play home run rate for each week of the 2017 and 2018 seasons. In the last three weeks (week numbers 20, 21, 22), the 2018 rate has been significantly lower than the 2017 rate.
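The post does not say how the weekly difference was judged significant. As one hedged illustration, the sketch below aggregates game-level counts into weekly rates and compares a single week across two seasons with base R’s prop.test, a two-sample test of proportions. The function names are hypothetical, the rate is assumed to be home runs divided by balls in play, and week is whatever week index the caller supplies (the numbering used in the post is not reproduced).

```r
# Weekly in-play home run rate.
# ASSUMPTION: rate = home runs / balls in play.
weekly_hr_rate <- function(week, hr, bip) {
  data.frame(week = sort(unique(week)),
             rate = as.vector(tapply(hr, week, sum) / tapply(bip, week, sum)))
}

# Compare one week's rate in two seasons (a = one season, b = the other)
# with a two-sample test of proportions; a small p-value suggests the
# difference is larger than chance variation would produce.
compare_week <- function(hr_a, bip_a, hr_b, bip_b) {
  test <- prop.test(x = c(hr_a, hr_b), n = c(bip_a, bip_b))
  c(rate_a = hr_a / bip_a, rate_b = hr_b / bip_b, p_value = test$p.value)
}
```

With several thousand balls in play per week, even a difference of about one percentage point in the HR rate is large relative to sampling noise, which is why a few weeks of lower rates can be meaningful.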
Further work is needed to identify the factors behind this season’s drop in home run hitting. In some ways, the variability of home run hitting in recent seasons has been a mystery.