# Platoon Splits — What is the Platoon Skill Variation?

In an earlier post, I examined pitcher and batting side changes in baseball, but I didn’t say anything about platoon advantages. I am working on a survey paper on situational effects in baseball, and I was rereading Chapter 6 of The Book by Tango, Lichtman, and Dolphin. I was focusing on the following table (reproduced from Table 66 on page 156):

This particular chapter of The Book considers the meaning of platoon effects. We know that players generally have a higher weighted on-base average (wOBA) when facing a pitcher who throws from the opposite side. But these platoon splits are hard to interpret due to the small sample sizes. The authors are primarily interested not in the spread of the observed platoon splits, but in the spread of the players’ platoon talents. In this post, I will give an overview of how one can measure the size of the platoon skill variation using R and the Retrosheet play-by-play data for the 2014 season.

### Collecting the Data

From Retrosheet, I collect the play-by-play data for the 2014 season — for each plate appearance, we have the batter id, the side of the pitcher, and the play result. The Master table in the Lahman database gives the batting side of each hitter. Here I focus on right-handed batters who have at least 150 PA against pitchers of each arm.
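The filtering step described above can be sketched in base R. This is an illustration on toy data, not the code behind the results in this post. The column names `BAT_ID` and `PIT_HAND_CD` (for a parsed Retrosheet event file) and `retroID` and `bats` (for the Lahman Master table) are assumptions about how the data are laid out, and the player ids are made up.

```r
# Toy stand-in for parsed play-by-play data: one row per plate appearance.
# Column names are assumed, and the batter ids are hypothetical.
pbp <- data.frame(
  BAT_ID = rep(c("bat01", "bat02", "bat03"), times = c(560, 500, 160)),
  PIT_HAND_CD = c(rep(c("L", "R"), times = c(160, 400)),
                  rep(c("L", "R"), times = c(200, 300)),
                  rep(c("L", "R"), times = c(40, 120))),
  stringsAsFactors = FALSE
)

# Toy stand-in for the Master table, giving each hitter's batting side.
master <- data.frame(
  retroID = c("bat01", "bat02", "bat03"),
  bats = c("R", "L", "R"),
  stringsAsFactors = FALSE
)

# Attach the batting side to every PA and keep right-handed batters only.
pa <- merge(pbp, master, by.x = "BAT_ID", by.y = "retroID")
pa <- pa[pa$bats == "R", ]

# Count PA against each pitching arm and keep batters with enough of both.
min_pa <- 150
counts <- table(pa$BAT_ID, pa$PIT_HAND_CD)
keep <- rownames(counts)[counts[, "L"] >= min_pa & counts[, "R"] >= min_pa]
pa <- pa[pa$BAT_ID %in% keep, ]

# OPP = 1 when the pitcher throws from the opposite side of the batter.
pa$OPP <- as.integer(pa$PIT_HAND_CD != pa$bats)
```

In this toy example only `bat01` survives: `bat02` bats left-handed and `bat03` has too few PA against either arm.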

### Platoon Effects for All Players

For each player, I fit a simple regression model (using the `lm` function) to estimate the platoon effect and get a corresponding standard error. The model has the form
$y = \alpha + \beta OPP$
where $y$ is the weighted batting measure for a PA, $OPP$ indicates the pitching side (1 if the pitcher throws from the opposite side and 0 if from the same side), and $\beta$ is the platoon talent of the player. From the output of `lm`, we get the platoon estimate $\hat \beta$ and a standard error $s$.
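To make the regression step concrete, here is a minimal sketch for a single hypothetical hitter. The plate appearances are simulated, and the outcome weights and probabilities are made-up illustrative values (not the actual 2014 wOBA weights), so only the mechanics of pulling $\hat \beta$ and $s$ out of `lm` carry over.

```r
set.seed(1)
n <- 600

# 1 = opposite-handed pitcher, 0 = same-handed pitcher (simulated).
opp <- rbinom(n, 1, 0.3)

# Illustrative per-PA weights: out, walk, single, double, triple, home run.
weights <- c(0, 0.7, 0.9, 1.25, 1.6, 2.0)
p_same <- c(0.69, 0.08, 0.15, 0.045, 0.005, 0.030)
p_opp  <- c(0.66, 0.09, 0.16, 0.050, 0.005, 0.035)

# Draw an outcome weight for every PA from the relevant distribution.
y <- ifelse(opp == 1,
            sample(weights, n, replace = TRUE, prob = p_opp),
            sample(weights, n, replace = TRUE, prob = p_same))
pa <- data.frame(y = y, opp = opp)

# The slope on opp is the observed platoon split for this hitter.
fit <- lm(y ~ opp, data = pa)
est <- coef(summary(fit))["opp", c("Estimate", "Std. Error")]
beta_hat <- unname(est["Estimate"])
s <- unname(est["Std. Error"])
c(beta_hat = beta_hat, s = s)
```

With about 600 PA the standard error comes out at a few hundredths on the wOBA scale, which is the same order of magnitude as a typical platoon split. That is why a single-season split says so little about one player.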

### A Random Effects Model

After fitting the regression model for all players, we have regression estimates $\hat \beta_1, ..., \hat \beta_N$ with corresponding standard errors $s_1, ..., s_N$. We assume $\hat \beta_j$ is normally distributed with mean $\beta_j$ and standard deviation $s_j$.

Now we assume that the platoon talents $\beta_1, ..., \beta_N$ come from a normal “talent curve” with mean $\mu$ and standard deviation $\tau$. Here $\mu$ represents the average platoon split and $\tau$ measures the platoon skill variation. (For the Bayesian readers, I place a uniform prior on $(\mu, \tau)$ to complete the model.) I call this a normal-normal model, since we’re assuming the observed platoon effects are normally distributed, and also the platoon skills come from a normal curve.
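Written out, the two stages of this model are as follows, where $N(m, v)$ denotes a normal distribution with mean $m$ and variance $v$:

$\hat \beta_j \mid \beta_j \sim N(\beta_j, s_j^2), \quad j = 1, \ldots, N$

$\beta_j \mid \mu, \tau \sim N(\mu, \tau^2)$

Integrating out the unknown talent $\beta_j$ gives the distribution of an observed split in terms of $\mu$ and $\tau$ alone:

$\hat \beta_j \mid \mu, \tau \sim N(\mu, s_j^2 + \tau^2)$

So the variance of the observed splits is the sum of the sampling variance and the talent variance, and estimating $\tau$ amounts to asking how much spread is left over once the sampling noise is accounted for.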

### Fitting the Model for Right-Handed Hitters

This normal-normal random effects model is easily fit with my `LearnBayes` package, using the `laplace` and `normnormexch` functions. When I fit this model to the 2014 right-handed batters, I get the following estimates: $\hat\mu = 0.032, \hat\tau = 0.015$. So, on average the wOBA platoon effect for right-handed hitters is 32 points and the spread of the platoon talents is 15 points. (Tango et al. call these the “average wOBA platoon split” and the “platoon skill variation”.) Note that my estimates differ from those in the table in The Book, but they are using different data.
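For readers who want to see what such a fit involves, here is a base-R stand-in that finds the posterior mode of $(\mu, \log \tau)$ directly. It is not the `LearnBayes` call used for the numbers above. The data are simulated, and the true values of $\mu$ and $\tau$ and the range of standard errors are chosen only so that the toy example is stable.

```r
# Simulated stand-in for the per-player output of lm(): NOT the 2014 data.
set.seed(42)
N <- 500
mu_true <- 0.03
tau_true <- 0.03
s <- runif(N, 0.02, 0.03)             # sampling standard errors s_j
beta <- rnorm(N, mu_true, tau_true)   # simulated platoon talents
beta_hat <- rnorm(N, beta, s)         # simulated observed splits

# For a given tau, the posterior is maximized over mu by the
# precision-weighted mean of the observed splits.
mu_given_tau <- function(tau, y, s) {
  w <- 1 / (s^2 + tau^2)
  sum(w * y) / sum(w)
}

# Joint log posterior of (mu, log tau), evaluated at the best mu for each tau,
# under a uniform prior on (mu, tau). The + log(tau) term is the Jacobian
# of the log transformation.
log_post <- function(log_tau, y, s) {
  tau <- exp(log_tau)
  mu <- mu_given_tau(tau, y, s)
  sum(dnorm(y, mean = mu, sd = sqrt(s^2 + tau^2), log = TRUE)) + log(tau)
}

# One-dimensional search over log(tau).
opt <- optimize(log_post, interval = log(c(1e-4, 1)), maximum = TRUE,
                y = beta_hat, s = s)
tau_hat <- exp(opt$maximum)
mu_hat <- mu_given_tau(tau_hat, beta_hat, s)
round(c(mu = mu_hat, tau = tau_hat), 4)
```

Working on the $\log \tau$ scale keeps the search away from the boundary at $\tau = 0$. With real single-season data, where $\tau$ is small relative to the $s_j$, the estimate of $\tau$ will be much less precise than in this toy example.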

### Plotting the Observed Platoon Splits and the Estimated Platoon Talents

Once we fit this random effects model, we can get improved estimates of each player’s platoon ability. Essentially, this estimate is a weighted average of his actual platoon split and the average split (here 0.032), where the weights are proportional to the inverse of the sampling variance ($1 / s_j^2$) and the inverse of the talent variance ($1/\tau^2$). Here the talent spread is much smaller than the sampling standard errors, so the platoon estimates are shrunk strongly towards the average split. (By the way, this formula for the estimate is discussed in the “Regression to the Mean” section of the appendix of The Book.)
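In symbols, if we treat $\mu$ and $\tau$ as known (a plug-in simplification of the full Bayesian calculation), the estimate for player $j$ is

$E(\beta_j \mid \hat \beta_j, \mu, \tau) = \dfrac{\hat \beta_j / s_j^2 + \mu / \tau^2}{1 / s_j^2 + 1 / \tau^2} = (1 - B_j) \hat \beta_j + B_j \mu, \quad B_j = \dfrac{s_j^2}{s_j^2 + \tau^2}$

Here $B_j$ is the fraction of the way the observed split is pulled towards the average. With $\tau = 0.015$ and a typical standard error of $s_j = 0.048$, we get $B_j = 0.002304 / (0.002304 + 0.000225) \approx 0.91$, so roughly 90 percent of the distance between a player’s observed split and the average split is removed.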

Here is an illustration of some of the results. The table gives the platoon effect, the corresponding standard error, and the improved estimate for the six right-handed hitters with the most extreme positive splits during the 2014 season. We see Rajai Davis hit 130 points better (on the wOBA scale) against lefties, but our estimate of the platoon effect is only about 41 points.

   nameLast     Effect         SE   Estimate
1     Davis 0.13034376 0.04822778 0.04064569
2     Jones 0.12474634 0.05027777 0.03950820
3 Donaldson 0.10474850 0.04867146 0.03818826
4    Altuve 0.10042365 0.04519534 0.03868753
5  Holliday 0.09939045 0.04769119 0.03793527
6    Butler 0.07688487 0.04569260 0.03617009
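As a check on the shrinkage arithmetic, the sketch below applies the precision-weighted average to the six rows of the table, plugging in the rounded values $\hat\mu = 0.032$ and $\hat\tau = 0.015$. This is a plug-in approximation rather than the calculation that produced the Estimate column, so small differences are expected.

```r
# The six rows of the table above.
splits <- data.frame(
  nameLast = c("Davis", "Jones", "Donaldson", "Altuve", "Holliday", "Butler"),
  Effect = c(0.13034376, 0.12474634, 0.10474850,
             0.10042365, 0.09939045, 0.07688487),
  SE = c(0.04822778, 0.05027777, 0.04867146,
         0.04519534, 0.04769119, 0.04569260),
  Estimate = c(0.04064569, 0.03950820, 0.03818826,
               0.03868753, 0.03793527, 0.03617009)
)

# Rounded estimates reported in the text.
mu_hat <- 0.032
tau_hat <- 0.015

# Weights: inverse sampling variance and inverse talent variance.
w_obs <- 1 / splits$SE^2
w_talent <- 1 / tau_hat^2

# Precision-weighted average of the observed split and the average split.
splits$PlugIn <- (w_obs * splits$Effect + w_talent * mu_hat) / (w_obs + w_talent)
print(splits, digits = 4)
```

The plug-in values should land close to the Estimate column, with the remaining gap coming from the rounding of $\hat\mu$ and $\hat\tau$ and from the plug-in simplification.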


Here is a graph of the platoon effects for all 2014 hitters: the black dots are the observed splits and the red dots are the improved estimates of the platoon talents. Note that some of the observed splits are negative, but all of the platoon talent estimates are positive; we don’t believe that any player really has a negative platoon ability.

Page 156 of The Book says that “platoon skills vary from player to player”. Yeah, but the platoon skill variation is very small compared with the variability of the observed platoon splits for a single season. Actually, assuming that the players don’t have any differences in platoon skills would not be a bad model for these data.

Although I have not discussed the R code, all of my code for this example can be found on my gist site.