An interesting question that has been explored over the years is learning when players peak. That is, at what age does a baseball player achieve peak performance? By googling this question, I found an interesting Boston Globe article.
Here is a brief study into this question, using the wOBA measure as a summary of offensive performance. This is a nice “teaching example” using R and the Lahman database.
My Basic Method
I am going to apply a relatively simple empirical method for detecting a player’s peak age. Plot a player’s season wOBA value against age, fit a loess smoothing curve (using the default smoothing parameter), and estimate the player’s peak age as the age where the smoother is at a maximum. By using the smoother rather than the actual wOBA, one is focusing on the general pattern in the trajectory and ignoring outliers.
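As a rough illustration of the method just described, here is a minimal base-R sketch. The function name `peak_age_loess` and the made-up career data are my own assumptions, not the code behind the plots in this post. I also assume the smooth is evaluated at the observed ages rather than on a finer grid.

```r
# Hypothetical helper: estimate peak age from a loess smooth of wOBA on age.
peak_age_loess <- function(age, woba) {
  fit <- loess(woba ~ age)                     # default smoothing: span = 0.75, degree = 2
  smooth <- predict(fit, data.frame(age = age))
  age[which.max(smooth)]                       # age where the smooth is highest
}

# Made-up career: a trajectory that peaks at age 28, plus season-to-season noise.
set.seed(1)
age <- 20:38
woba <- 0.380 - 0.0008 * (age - 28)^2 + rnorm(length(age), sd = 0.015)
peak_age_loess(age, woba)
```

Because the estimate comes from the smooth, a single unusually good or bad season moves it much less than it would move the age of the raw wOBA maximum.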
Here are several examples of this approach.
Mickey Mantle peaked early in his career.
Similarly, Albert Pujols peaked early — here the actual wOBA peak was at age 28, but the estimate based on the smooth is 27.
Mike Schmidt peaked in wOBA later in his career.
Anyway, I think this loess method gives a reasonable estimate of a player’s peak age.
Using this method, I perform a more comprehensive look at peak ages for players who had long careers.
- Using the Lahman database, I collected the basic stats (AB, H, 2B, etc.) for all players and seasons. I added an age variable and computed wOBA for all seasons using the Fangraphs weights for wOBA. I also added two career variables: the number of career PA and the midcareer, defined as the average of a player’s debut and final years.
- I collected career data for all players with at least 5000 PA and who had completed their career before the 2016 season. There were 886 players in this group.
- For each of the 886 players, I used the loess smoother to smooth the pattern of wOBA values and estimated the peak age as the age where the smoothed wOBA was the highest.
- When I am done, I have a data frame containing each player’s midcareer year and wOBA peak age. I am looking for general patterns in peak age and checking whether peak ages have changed over the history of baseball.
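The steps above can be sketched in base R as follows. This is only a sketch under stated assumptions. The column names follow the Lahman `Batting` table (`X2B`, `X3B`, and so on), but the data here are a tiny made-up stand-in, the linear weights are illustrative placeholders rather than the season-specific Fangraphs values, PA is approximated, and the helper names (`add_woba`, `peak_by_player`, `make_player`) are hypothetical. Handling of missing IBB, HBP, and SF values in early seasons is omitted.

```r
# Illustrative weights only (assumption); Fangraphs publishes season-specific values.
w <- c(BB = 0.69, HBP = 0.72, X1B = 0.89, X2B = 1.27, X3B = 1.62, HR = 2.10)

# Add wOBA and an approximate PA count to season-level batting data.
add_woba <- function(d) {
  singles <- d$H - d$X2B - d$X3B - d$HR
  ubb <- d$BB - d$IBB                          # unintentional walks
  num <- w[["BB"]] * ubb + w[["HBP"]] * d$HBP + w[["X1B"]] * singles +
    w[["X2B"]] * d$X2B + w[["X3B"]] * d$X3B + w[["HR"]] * d$HR
  d$wOBA <- num / (d$AB + d$BB - d$IBB + d$SF + d$HBP)
  d$PA <- d$AB + d$BB + d$HBP + d$SF           # approximation: ignores SH
  d
}

# For each player with enough career PA, smooth wOBA on age and record the peak.
peak_by_player <- function(d, min_pa = 5000) {
  career_pa <- tapply(d$PA, d$playerID, sum)
  keep <- names(career_pa)[career_pa >= min_pa]
  do.call(rbind, lapply(keep, function(id) {
    p <- d[d$playerID == id, ]
    fit <- loess(wOBA ~ age, data = p)
    data.frame(playerID = id,
               midcareer = (min(p$yearID) + max(p$yearID)) / 2,
               peak_age = p$age[which.max(predict(fit))])
  }))
}

# Made-up seasons: home runs rise and fall around a chosen peak age.
make_player <- function(id, birth, years, peak) {
  age <- years - birth                         # simplified age: season minus birth year
  hr <- round(30 - 0.25 * (age - peak)^2)
  data.frame(playerID = id, yearID = years, age = age,
             AB = 550, H = 150 + hr, X2B = 30, X3B = 3, HR = hr,
             BB = 60, IBB = 5, HBP = 4, SF = 5)
}

batting <- rbind(
  make_player("a", 1950, 1972:1987, 27),
  make_player("b", 1970, 1992:2007, 30),
  make_player("c", 1980, 2002:2004, 24)        # short career, dropped by the PA filter
)
peaks <- peak_by_player(add_woba(batting))
peaks
```

With real data, `batting` would be replaced by the Lahman batting table joined to birth years, and the result would have one row per qualifying player.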
Here is a graph of the distribution of peak ages. The most common ages (frequencies of players) are 29 (99 players), 26 (95 players), 28 (86 players), and 27 (81 players). Overall, about 41 percent of these players (361 of 886) peaked between ages 26 and 29. But as the graph indicates, some players peak at young and old ages. (By the way, the conclusions are similar to those of the Boston Globe article where they used other summary measures of performance such as WAR.)
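The share quoted above follows directly from the four reported counts. A quick check in R, using only the numbers stated in this post:

```r
# Counts of players peaking at each of the four most common ages (from the text).
counts <- c("26" = 95, "27" = 81, "28" = 86, "29" = 99)
share <- sum(counts) / 886        # 886 players in the study
round(100 * share, 1)             # 40.7
```

With a vector of estimated peak ages in hand, `table()` applied to that vector would give the full frequency distribution that the graph displays.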
Does the peak age distribution change across eras? Here I have divided the midcareer variable into four intervals. The message is that the peak age distribution hasn’t changed much over the years: the mean peak age was 28.5 (1870-1930), 27.7 (1930-1970), 27.9 (1970-1990), and 28.9 (1990-2010). There is some evidence that the peak age has increased slightly in recent years.
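One way to compute era means like these is to bin midcareer with `cut()` and average peak age within each bin. The sketch below uses made-up numbers, and the function name `era_means` is hypothetical. Which interval a boundary year such as 1930 falls into is my assumption (here, the earlier one).

```r
# Mean peak age within four midcareer intervals.
era_means <- function(peaks) {
  era <- cut(peaks$midcareer,
             breaks = c(1870, 1930, 1970, 1990, 2010),
             include.lowest = TRUE,   # keep a midcareer of exactly 1870
             dig.lab = 4)             # print years as 1930, not 1.93e+03
  tapply(peaks$peak_age, era, mean)
}

# Made-up example data, not the study's results.
toy <- data.frame(midcareer = c(1900, 1920, 1950, 1980, 2000, 2005),
                  peak_age  = c(28, 29, 27, 28, 29, 30))
era_means(toy)
```

Comparing the full distributions within each era (for example with side-by-side boxplots) would show whether a difference in means is large relative to the spread.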
Comments on this Study
- This analysis is a bit simplistic since I am basing these peak age estimates on the smooths of the individual trajectories. There is no effort to combine data across players. Since the individual players’ wOBA values are generally noisy, there is a need for some pooling of the data.
- But one should pool the data in a reasonable way. For example, one can use multilevel models where each player has unique parameters describing the career trajectories and then a distribution is used to model the distribution of these individual trajectory parameters. I’ve done this using quadratic fitting functions.
- Some people have made some (in my view) unreasonable assumptions to learn about aging. For example, it doesn’t make sense to assume that each player peaks at age 28. People have different aging patterns: players have different peak ages and also different paths in maturing and in declining towards retirement. So one needs flexibility in any model to allow for these differences. If one makes restrictive assumptions, then one will get answers that are inconsistent with the data.
- One obvious bias in this study is that I only considered players with long careers and I’d suspect that players with long careers tend to peak later than players with shorter careers.
- Also it would be interesting to focus on players at different positions and to use a variety of metrics in this exploration. For example, people generally get slower with age — what measures of performance are useful in detecting slowness in baseball? Anyway, there is definitely more to say on this general problem.
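The quadratic fitting function mentioned in the comments above can be illustrated for a single player. If wOBA is modeled as a + b * age + c * age^2 with c < 0, the curve peaks at age -b / (2c). The sketch below is an ordinary least-squares fit for one player, not the multilevel model, which would additionally place a distribution on (a, b, c) across players. The function name `peak_age_quadratic` and the data are my own assumptions.

```r
# Hypothetical helper: peak age from a single-player quadratic fit.
peak_age_quadratic <- function(age, woba) {
  fit <- lm(woba ~ age + I(age^2))
  b <- coef(fit)
  # A curve that is not concave has no interior maximum.
  if (is.na(b[3]) || b[3] >= 0) return(NA_real_)
  unname(-b[2] / (2 * b[3]))      # vertex of the fitted parabola
}

# Made-up career that peaks at age 28, plus noise.
set.seed(2)
age <- 20:38
woba <- 0.380 - 0.0008 * (age - 28)^2 + rnorm(length(age), sd = 0.015)
peak_age_quadratic(age, woba)
```

Unlike the loess estimate, the quadratic forces a symmetric rise and decline around the peak, which is one of the restrictive assumptions a more flexible model would relax.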
Just ran across your website while looking for baseball analysis using R. Nice! I’ll have to check out your book when I have more time.
Another way you might want to view your data is to separate it by age of first MLB appearance. Per your note about the bias due to looking at players with longer careers, generally the younger the player is when they first appear, the better the talent he has. That would help to match apples with apples, it appears to me. Even better is perhaps changing the start to first season over, say, half a season (not sure of exact standards, but, say, over 300 PA for hitters, over 40 IP or appearances for relievers, over 90 IP or over 15 starts for starting pitching).
I also had the thought of perhaps looking at the peak relative to the player’s start, and then comparing that to career length, to get an idea of roughly when a peak occurs, and then filter by age, but wasn’t sure about the value of that, so I thought I would throw it out and see what you thought of it.
But this is definitely an interesting question that has been around for a long, long time, and I enjoyed reading this.
Lastly, since you work with R, perhaps you know of a solution: a problem I’m having now is that my main personal computer is a Chromebook, but I much prefer using an IDE like RStudio, which is not available in the cloud nor as an extension on Chrome. R-Fiddle is the only way I’m aware of to work with R on such a machine. Are you aware of another R IDE Chromebook resource, or should I start looking for a standard computer that I can download RStudio onto and use?
And I’m not that techy anymore, so a solution as I’ve seen elsewhere of setting up Linux on my Chromebook in order to run RStudio is not a solution I prefer, though if there is a good hand holding article that guides me through this, I might change my mind. Thanks.
Thanks for your comments about ways of looking at aging — I’ll have to explore those when I have time. Unfortunately, I don’t have much knowledge of working with R on a Chromebook. I think you would be very pleased with RStudio. It does have a web interface through RStudio Server, but I don’t have any experience with that.
Thanks for answering my question on Chromebook. It appears that there are some tutorials online for accessing RStudio on AWS, some sort of AMI, easily findable with a search. I’m not that experienced with using cloud, so I decided to just get a cheap PC to play with.