An interesting question that has been explored over the years is when players peak: at what age does a baseball player achieve peak performance? By googling this question, I found an interesting Boston Globe article.
Here is a brief study of this question, using wOBA (weighted on-base average) as a summary of offensive performance. This is a nice “teaching example” using R and the Lahman database.
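As a reminder of what wOBA measures, here is a minimal Python sketch of the FanGraphs-style formula: a weighted sum of unintentional walks, hit-by-pitches, singles, doubles, triples and home runs, divided by AB + BB − IBB + SF + HBP. The weights in `ILLUSTRATIVE_WEIGHTS` are assumptions chosen only to be roughly the size FanGraphs publishes for recent seasons. The real constants change from season to season, so they should be looked up rather than copied from here. The function name `woba` is hypothetical.

```python
# Illustrative linear weights (assumed values, NOT official constants);
# FanGraphs publishes a different set for every season.
ILLUSTRATIVE_WEIGHTS = {
    "uBB": 0.69, "HBP": 0.72, "1B": 0.89,
    "2B": 1.27, "3B": 1.62, "HR": 2.10,
}


def woba(ab, h, doubles, triples, hr, bb, ibb, hbp, sf,
         weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted on-base average from one season's batting counts."""
    singles = h - doubles - triples - hr   # Lahman stores H, not 1B
    unintentional_bb = bb - ibb
    numerator = (weights["uBB"] * unintentional_bb
                 + weights["HBP"] * hbp
                 + weights["1B"] * singles
                 + weights["2B"] * doubles
                 + weights["3B"] * triples
                 + weights["HR"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


print(round(woba(ab=500, h=150, doubles=30, triples=5, hr=25,
                 bb=60, ibb=5, hbp=5, sf=5), 3))
```

In the Lahman tables the doubles and triples columns are `2B` and `3B` (`X2B` and `X3B` in the R package), and HBP, IBB and SF are missing for some early seasons. Those missing values need a decision, such as treating them as zero, before this formula is applied.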
My Basic Method
I am going to apply a relatively simple empirical method for detecting a player’s peak age: plot a player’s season wOBA values against age, fit a loess smoothing curve (using the default smoothing parameter), and estimate the player’s peak age as the age where the smoother is at a maximum. By using the smoother rather than the actual wOBA values, one focuses on the general pattern of the trajectory and reduces the influence of outlying seasons.
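The method above can be sketched in plain Python as a direct local quadratic regression with tricube weights. It mirrors the defaults of R’s `loess()`: span 0.75, degree 2, Gaussian (non-robust) fitting. The function names `loess_at` and `loess_peak_age` are hypothetical. Several details are assumptions rather than R’s exact algorithm. R evaluates the fit through an interpolated surface by default, while this sketch computes the local fit directly at each observed age. The sketch also widens the neighbourhood by 0.1% so that the farthest neighbour keeps a small positive weight. As a result, the smoothed values can differ slightly from R’s.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular local fit; try a larger span")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loess_at(xs, ys, x0, span=0.75, degree=2):
    """Locally weighted polynomial fit of ys on xs, evaluated at x0."""
    n = len(xs)
    q = min(n, max(degree + 1, int(span * n)))    # neighbourhood size
    # Distance to the q-th nearest point, widened slightly (assumption).
    h = sorted(abs(x - x0) for x in xs)[q - 1] * 1.001
    k = degree + 1
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for x, y in zip(xs, ys):
        r = abs(x - x0) / h
        if r >= 1:
            continue                               # outside the neighbourhood
        w = (1 - r ** 3) ** 3                      # tricube weight
        powers = [(x - x0) ** j for j in range(k)]  # centred at x0
        for i in range(k):
            b[i] += w * powers[i] * y
            for j in range(k):
                a[i][j] += w * powers[i] * powers[j]
    return _solve(a, b)[0]                         # intercept = fit at x0


def loess_peak_age(ages, wobas, span=0.75):
    """Observed age where the smoothed wOBA is largest, plus the smooth."""
    grid = sorted(set(ages))
    smooth = [loess_at(ages, wobas, age, span) for age in grid]
    best = max(range(len(grid)), key=lambda i: smooth[i])
    return grid[best], smooth
```

Evaluating the smooth only at observed ages gives a whole-number peak age. A finer grid of ages would give a fractional estimate instead.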
Here are several examples of this approach.
Mickey Mantle peaked early in his career.
Similarly, Albert Pujols peaked early; here the actual wOBA peak was at age 28, but the estimate based on the smooth is 27.
Mike Schmidt peaked in wOBA later in his career.
Anyway, I think this loess estimate gives a reasonable estimate of a player’s peak age.
Using this method, I take a more comprehensive look at peak ages for players who had long careers.
- Using the Lahman database, I collected the basic stats (AB, H, 2B, etc.) for all players and seasons. I added an age variable and computed wOBA for all seasons using the Fangraphs weights for wOBA. I also added the number of career PA and the midcareer, defined as the average of a player’s debut and final years.
- I collected career data for all players who had at least 5000 career PA and had completed their careers before the 2016 season. There were 886 players in this group.
- For each of the 886 players, I used the loess smoother to smooth the pattern of wOBA values against age and estimated the peak age as the age where the smoothed wOBA was highest.
- The result is a data frame containing each player’s midcareer and wOBA peak age. I then look for general patterns in peak age and check whether peak ages have changed over the history of baseball.
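The data-preparation steps in this list can be sketched in Python as follows. The sketch works on plain dictionaries rather than R data frames. It assumes the batting rows carry the R Lahman package column names (`playerID`, `yearID`, `AB`, `BB`, `HBP`, `SF`, `SH`, …) and that `people` maps each `playerID` to its People-table record with `debut` and `finalGame` dates stored as `"YYYY-MM-DD"` strings. All function names are hypothetical. Two conventions are assumptions and are not stated in the post: PA is taken as AB + BB + HBP + SF + SH, and a season’s age is the player’s age on June 30 of that season.

```python
from collections import defaultdict

# Counting columns as named in the R Lahman package's Batting table.
COUNTS = ("AB", "H", "X2B", "X3B", "HR", "BB", "IBB", "HBP", "SF", "SH")


def combine_stints(batting_rows):
    """Sum a player's team stints into one line per (playerID, yearID)."""
    seasons = defaultdict(lambda: dict.fromkeys(COUNTS, 0))
    for row in batting_rows:
        line = seasons[(row["playerID"], row["yearID"])]
        for stat in COUNTS:
            line[stat] += row.get(stat) or 0   # missing/None (early years) -> 0
    return dict(seasons)


def plate_appearances(line):
    # Assumed PA definition.
    return line["AB"] + line["BB"] + line["HBP"] + line["SF"] + line["SH"]


def season_age(year, birth_year, birth_month):
    # Assumed convention: age as of June 30 of the season, so players
    # born July-December count as one year younger.
    return year - birth_year - (1 if birth_month >= 7 else 0)


def select_players(seasons, people, min_pa=5000, last_season=2015):
    """Map playerID -> midcareer for long careers that ended by last_season."""
    career_pa = defaultdict(int)
    for (player_id, _year), line in seasons.items():
        career_pa[player_id] += plate_appearances(line)
    chosen = {}
    for player_id, pa in career_pa.items():
        debut = int(people[player_id]["debut"][:4])      # "YYYY-MM-DD"
        final = int(people[player_id]["finalGame"][:4])
        if pa >= min_pa and final <= last_season:
            chosen[player_id] = (debut + final) / 2      # midcareer
    return chosen
```

Combining stints first matters because Lahman lists one batting row per team a player appeared with in a season, and wOBA and age should be computed once per season.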
Here is a graph of the distribution of peak ages. The most common peak ages are 29 (99 players), 26 (95 players), 28 (86 players), and 27 (81 players). Together these four ages account for 361 of the 886 players, so about 41 percent of all players peaked between ages 26 and 29. But as the graph indicates, some players peak at younger or older ages. (By the way, the conclusions are similar to those of the Boston Globe article, which used other summary measures of performance such as WAR.)
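The frequency summary and the 41 percent figure can be reproduced with a few lines of Python. `peak_age_summary` is a hypothetical helper that works on a full list of peak ages. The arithmetic check below it uses only the four counts reported above, because the remaining players’ ages are not listed in the text.

```python
from collections import Counter


def peak_age_summary(peak_ages, low=26, high=29, top=4):
    """Most common peak ages and the share of players peaking in [low, high]."""
    counts = Counter(peak_ages)
    in_window = sum(n for age, n in counts.items() if low <= age <= high)
    return counts.most_common(top), in_window / len(peak_ages)


# Arithmetic check using only the four counts reported in the text.
reported = {29: 99, 26: 95, 28: 86, 27: 81}
share = sum(reported.values()) / 886
print(f"{sum(reported.values())} of 886 players = {share:.1%}")  # 361 of 886 players = 40.7%
```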
Does the peak age distribution change across eras? Here I have divided the midcareer variable into four intervals. The message is that the peak age distribution hasn’t changed much over the years: the mean peak age was 28.5 (midcareer 1870-1930), 27.7 (1930-1970), 27.9 (1970-1990), and 28.9 (1990-2010). The 1990-2010 mean is the highest of the four, which suggests that the peak age may be increasing slightly in recent years.
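The era comparison groups players by midcareer and averages their peak ages within each interval. The breakpoints below are read off the era labels above. The intervals are assumed to be right-closed, mimicking the default of R’s `cut()`; the post does not say which convention was used. `mean_peak_by_era` is a hypothetical name, and players whose midcareer falls outside every interval are simply skipped.

```python
from collections import defaultdict

ERA_BREAKS = (1870, 1930, 1970, 1990, 2010)


def mean_peak_by_era(players, breaks=ERA_BREAKS):
    """players: iterable of (midcareer, peak_age) pairs.

    Returns {(lo, hi): mean peak age} using right-closed intervals (lo, hi].
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for midcareer, peak_age in players:
        for lo, hi in zip(breaks, breaks[1:]):
            if lo < midcareer <= hi:
                totals[(lo, hi)] += peak_age
                counts[(lo, hi)] += 1
                break
    return {era: totals[era] / counts[era] for era in sorted(counts)}
```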
Comments on this Study
- This analysis is a bit simplistic since I am basing these peak age estimates on smooths of the individual trajectories. There is no effort to combine data across players. Since individual players’ wOBA trajectories are generally noisy, some pooling of the data is needed.
- But one should pool the data in a reasonable way. For example, one can use multilevel models in which each player has his own parameters describing his career trajectory, and a distribution is then used to model these individual trajectory parameters. I’ve done this using quadratic fitting functions.
- Some people have made (in my view) unreasonable assumptions to learn about aging. For example, it doesn’t make sense to assume that every player peaks at age 28. People age differently: players have different peak ages, and they also follow different paths in maturing and in declining towards retirement. So one needs flexibility in any model to allow for these differences. If one makes restrictive assumptions, one will get answers that are inconsistent with the data.
- One obvious bias in this study is that I considered only players with long careers, and I suspect that players with long careers tend to peak later than players with shorter careers.
- It would also be interesting to focus on players at different positions and to use a variety of metrics in this exploration. For example, players generally get slower with age; what measures of performance are useful in detecting a loss of speed in baseball? Anyway, there is definitely more to say on this general problem.
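The quadratic trajectories mentioned in these comments can be illustrated for a single player. The sketch fits y = b0 + b1(age − 30) + b2(age − 30)² by ordinary least squares. When b2 < 0, the fitted curve peaks at age 30 − b1/(2·b2). This covers only the per-player curve. It is not a multilevel model, which would estimate all players’ (b0, b1, b2) jointly with a common distribution that pools information across players. Centring at age 30 is an assumption made for numerical stability and interpretability, and the function names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(ages, ys, center=30):
    """OLS coefficients (b0, b1, b2) of y on (age - center) and its square."""
    us = [a - center for a in ages]
    s = [sum(u ** k for u in us) for k in range(5)]             # moments of u
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = _det3(normal)
    coefs = []
    for j in range(3):                                           # Cramer's rule
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][j] = t[i]
        coefs.append(_det3(m) / d)
    return coefs


def quadratic_peak_age(ages, ys, center=30):
    """Age at the vertex of the fitted parabola (requires b2 < 0)."""
    _b0, b1, b2 = fit_quadratic(ages, ys, center)
    if b2 >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return center - b1 / (2 * b2)
```

Unlike the loess estimate, this peak age is a continuous quantity, and a single quadratic forces the rise and decline to be symmetric around the peak.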