In the last post, I showed how to use the broom package to collect results of individual player regressions. Specifically, I collected quadratic fits to home run rates of players who have hit 500 career home runs, and estimated the ages where the players achieved peak performance.
There was one issue that I ignored — these estimates of peak performance using individual player data can be pretty poor. I hid this problem in the earlier post by presenting only the “reasonable” estimates that fell between 20 and 40 years.
Here I demonstrate the problem and show how one can correct it with a simple multilevel model.
Individual Estimates of Peak Age and Standard Errors
When one uses the linear model lm function to fit a quadratic model, one gets estimates of the regression coefficients, and from these one can get an estimate of the peak age. But this function does not automatically give you a standard error of this estimate. One can get this standard error by simulation:
 Simulate from the (posterior) distribution of the regression vector.

For each simulated value of (beta0, beta1, beta2), compute the peak age PEAK = -beta1 / (2 * beta2).

Compute the mean and standard deviation of the simulated peak ages. The mean of the simulated values is an estimate of PEAK, and the standard deviation of the simulated values provides the standard error of the estimate.
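The three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not necessarily the code used for the post: the career data (`age`, `hr_rate`) are made up, and the simulation assumes the usual noninformative prior for a normal linear model, so that the error variance is drawn from a scaled inverse chi-square distribution and the coefficients are then drawn from a multivariate normal.

```r
set.seed(1)

# Hypothetical career: home run rate by age, peaking near age 29 (made-up data)
age <- 21:40
hr_rate <- 0.06 - 0.0004 * (age - 29)^2 + rnorm(length(age), sd = 0.005)

# Quadratic fit: hr_rate = beta0 + beta1 * age + beta2 * age^2
fit <- lm(hr_rate ~ age + I(age^2))
beta_hat <- coef(fit)
p <- length(beta_hat)

n_sim <- 5000

# Step 1a: draw sigma^2 as (residual sum of squares) / chi-square(residual df)
sigma2 <- sum(residuals(fit)^2) / rchisq(n_sim, df = df.residual(fit))

# Step 1b: draw beta | sigma^2 ~ N(beta_hat, sigma^2 * (X'X)^{-1}).
# With X = QR, solving R u = z for standard normal z gives u with
# covariance (X'X)^{-1}; each column of u is then scaled by sigma.
R <- qr.R(fit$qr)
z <- matrix(rnorm(p * n_sim), nrow = p)
u <- backsolve(R, z)
beta <- beta_hat + u * rep(sqrt(sigma2), each = p)   # p x n_sim matrix

# Step 2: peak age for each simulated coefficient vector
peak <- -beta[2, ] / (2 * beta[3, ])

# Step 3: mean (M) and standard deviation (S) of the simulated peak ages
peak_summary <- c(M = mean(peak), S = sd(peak))
print(peak_summary)
```

When the simulated values of beta2 come close to zero, the ratio in step 2 becomes unstable, which is one way very large standard errors can arise.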
When I did this for our 27 sluggers, here are some of the peak age estimates and standard errors:
            Name        M           S Estimate
1     Hank Aaron 32.92271  20.2008382 29.27845
2    Ernie Banks 30.66459 118.1309213 29.26111
3    Barry Bonds 46.79100 278.8562239 29.26135
4    Jimmie Foxx 27.14421   0.4030601 27.30595
5    Ken Griffey 28.73868   0.5773422 28.81446
6 Reggie Jackson 30.94798  13.2598651 29.27955
For some players like Foxx and Griffey, we have relatively precise estimates of peak age: the associated standard errors are small. In contrast, the estimates of Banks’ peak age of 30.7 and Bonds’ peak age of 46.8 are poor since the standard errors are so large.
Multilevel Modeling
Here is a basic multilevel model for combining estimates. We have the peak age estimates $\{y_j\}$ with associated standard errors $\{\sigma_j\}$. We assume that $y_j$ is normally distributed with mean $\theta_j$ (we call this the true peak age) and standard deviation $\sigma_j$. We assume the true peak ages $\theta_1, \ldots, \theta_N$ are normal with mean $\mu$ and standard deviation $\tau$, and we assign $(\mu, \tau)$ a flat prior. This model can be fit easily using the LearnBayes package. From this model fit, we obtain estimates of the peak ages. These estimates adjust the relatively poor individual estimates by moving them towards an overall estimate.
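To show what such a fit involves, here is a minimal base R sketch of the normal/normal model with a flat prior on the overall mean and the between-player standard deviation. It does not use LearnBayes; instead it approximates the posterior on a grid of values for the between-player standard deviation (`tau_grid`, whose range is an assumption). It uses only the six players from the table above, so its estimates will not match the Estimate column, which presumably comes from the fit to all 27 players.

```r
# Peak age estimates (M) and standard errors (S) for the six players in the table
players <- c("Hank Aaron", "Ernie Banks", "Barry Bonds",
             "Jimmie Foxx", "Ken Griffey", "Reggie Jackson")
y     <- c(32.92271, 30.66459, 46.79100, 27.14421, 28.73868, 30.94798)
sigma <- c(20.2008382, 118.1309213, 278.8562239, 0.4030601, 0.5773422, 13.2598651)

# Grid for the between-player standard deviation (range chosen for illustration)
tau_grid <- seq(0.01, 50, length.out = 2000)

# For a fixed tau: log marginal posterior density of tau (up to a constant)
# and the conditional posterior means of the true peak ages
fit_at_tau <- function(tau) {
  w <- 1 / (sigma^2 + tau^2)                 # precision of each y around mu
  mu_hat <- sum(w * y) / sum(w)              # precision-weighted overall mean
  log_post <- -0.5 * log(sum(w)) + 0.5 * sum(log(w)) -
    0.5 * sum(w * (y - mu_hat)^2)
  shrink <- sigma^2 / (sigma^2 + tau^2)      # weight on the overall mean
  c(log_post, shrink * mu_hat + (1 - shrink) * y)
}

res <- sapply(tau_grid, fit_at_tau)          # (1 + 6) x length(tau_grid) matrix

# Normalize the grid posterior for tau (subtract the max to avoid underflow)
post <- exp(res[1, ] - max(res[1, ]))
post <- post / sum(post)

# Average the conditional means over the posterior of tau
estimate <- as.vector(res[-1, ] %*% post)
names(estimate) <- players

print(data.frame(Name = players, M = y, S = sigma,
                 Multilevel = round(estimate, 2), row.names = NULL))
```

The players with large standard errors are pulled almost entirely to the overall mean, while the precisely estimated ones stay near their individual estimates.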
The Multilevel Estimates
Using ggplot2 graphics, I compare the individual estimates of peak age with the multilevel estimates below.
A couple of comments from looking at this graph.
 The individual estimates are shrunk strongly towards the average value. The individual estimates range between 15 and 63, while the multilevel estimates of peak age are between 27.3 and 31.3. This is reasonable since we were not confident in some of the individual estimates (recall my comment about the high standard errors).

The degree of shrinkage for a particular player depends on the associated standard error. The individual estimate of 63 is shrunk almost all the way towards the average of 29.3 since we had little confidence in the individual estimate.
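
The dependence on the standard error can be written out explicitly. As a sketch, write $y_j$ for a player's individual estimate and $\sigma_j$ for its standard error (notation assumed here), and suppose for the moment that the overall mean $\mu$ and the standard deviation $\tau$ of the true peak ages were known. Under the normal model, the estimate of the true peak age is then a weighted average of the individual estimate and the overall mean:

$$\hat\theta_j = (1 - B_j)\, y_j + B_j\, \mu, \qquad B_j = \frac{\sigma_j^2}{\sigma_j^2 + \tau^2}.$$

For illustration only, take $\tau = 1$ (a hypothetical value, not a fitted one). Foxx's standard error of 0.40 gives $B_j \approx 0.14$, so his estimate stays close to his own data, while Bonds' standard error of 278.9 gives $B_j > 0.9999$, so his estimate is essentially the overall mean. In the actual fit $\mu$ and $\tau$ are unknown, and the estimates average over their posterior distribution.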

My conclusion is that peak ages of players vary, but generally they fall in a narrow band from 27 to 31.
If you want to read more about this type of multilevel modeling, look at my shrinking batting averages post.