There is an interesting new paper by Franks et al. on devising useful sports metrics. One good property of a sports metric mentioned in the paper is discrimination: does the metric reliably differentiate between players of different abilities?
As in Franks et al., one can define discrimination by means of a random effects model. We'll describe this model in terms of pitching, although the concept generalizes to other baseball measures and to sports besides baseball.
Suppose we look at all pitchers in the 2016 season who have pitched at least 100 innings. For each pitcher and each inning, we record the number of walks and hits allowed. (The mean number of walks and hits allowed per inning is the popular WHIP measure.)
Let y_ij denote the number of walks and hits allowed by the ith pitcher in the jth inning. We assume that y_ij is normally distributed with mean M_i and standard deviation sigma. (We can view M_i as the talent of the ith pitcher.) If we have N pitchers, we assume that the talents M_1, …, M_N come from a normal talent distribution with mean mu and standard deviation tau.
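To make the two levels of this model concrete, here is a minimal simulation sketch. The function name `simulate_pitchers` and the values chosen for mu, tau, and sigma are hypothetical illustrations, not estimates from the 2016 data.

```python
import random

def simulate_pitchers(n_pitchers, n_innings, mu, tau, sigma, seed=0):
    """Simulate the normal-normal random effects model.

    Talents:  M_i  ~ Normal(mu, tau)
    Outcomes: y_ij ~ Normal(M_i, sigma), one value per inning.
    """
    rng = random.Random(seed)
    # One talent per pitcher, drawn from the talent distribution.
    talents = [rng.gauss(mu, tau) for _ in range(n_pitchers)]
    # For each pitcher, one simulated inning outcome per inning pitched.
    data = [[rng.gauss(m, sigma) for _ in range(n_innings)] for m in talents]
    return talents, data

# Hypothetical parameter values, chosen only for illustration.
talents, data = simulate_pitchers(
    n_pitchers=50, n_innings=150, mu=1.3, tau=0.15, sigma=1.2
)
```

Because the outcomes are drawn from a normal distribution, the simulated values are not whole counts and can even be negative. That is a limitation of the normal approximation, not of the sketch.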
We estimate this random effects model from the data, which gives us estimates of sigma and tau. If we have a pitcher with n innings, then the discrimination D is defined to be
D = tau^2 / (tau^2 + sigma^2 / n)
where we plug in estimates for tau and sigma. D represents the proportion of the variance in observed hits and walks per inning (averaged over n innings) across these N pitchers that is explained by the variation between the talents of the pitchers.
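The post does not say how the model is fit. One simple option, sketched here, is a method-of-moments (one-way ANOVA) estimate. It assumes every pitcher has the same number of innings n. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance

def estimate_variances(data):
    """Method-of-moments estimates of sigma^2 and tau^2.

    data: list of per-pitcher lists of inning outcomes, all the same length.
    """
    n = len(data[0])
    if any(len(row) != n for row in data):
        raise ValueError("this sketch assumes equal innings for every pitcher")
    pitcher_means = [mean(row) for row in data]
    # Within-pitcher variance, averaged over pitchers, estimates sigma^2.
    sigma2_hat = mean(variance(row) for row in data)
    # The variance of the pitcher means estimates tau^2 + sigma^2 / n,
    # so subtract the noise term (and truncate at zero).
    tau2_hat = max(variance(pitcher_means) - sigma2_hat / n, 0.0)
    return sigma2_hat, tau2_hat

def estimate_discrimination(data):
    """Plug the estimates into D = tau^2 / (tau^2 + sigma^2 / n)."""
    n = len(data[0])
    sigma2_hat, tau2_hat = estimate_variances(data)
    return tau2_hat / (tau2_hat + sigma2_hat / n)

# Made-up toy data: two "pitchers" with three "innings" each.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
D = estimate_discrimination(toy)
```

A likelihood-based or Bayesian fit would handle unequal innings and small samples more gracefully. The moment estimator is used here only because it makes the variance decomposition easy to see.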
A few comments about this measure of discrimination.
- As we let n, the number of observed innings for a pitcher, get really large, D will approach 1, which means that the observed variability in WHIP between the pitchers is entirely explained by the differences in the pitchers' talents.
- But in a season, n won't be that large, and D will typically take on values between .3 and .8.
- We described this measure for hits and walks in an inning, but we can count other events, and each event will have an associated value of D. Values of D close to 1 correspond to measures that really distinguish pitcher abilities. In contrast, measures with values of D close to 0 provide little information about pitcher abilities.
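The dependence of D on n is easy to check numerically. In this small sketch the values of tau and sigma are hypothetical, picked only to show the shape of the relationship.

```python
def discrimination(tau, sigma, n):
    """D = tau^2 / (tau^2 + sigma^2 / n)."""
    return tau**2 / (tau**2 + sigma**2 / n)

# Hypothetical talent and within-pitcher standard deviations.
tau, sigma = 0.1, 1.2

# D rises toward 1 as the number of innings n grows.
for n in (50, 100, 150, 200, 1000, 10000):
    print(n, round(discrimination(tau, sigma, n), 3))
```

For a fixed n, a measure with a larger ratio of tau to sigma has a larger D, which is why different events can have very different discrimination values over the same number of innings.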
I assumed that n = 150 (that is, pitchers with 150 innings pitched) and counted different things: (1) the number of hits and walks, (2) the number of strikeouts, (3) the number of hits, (4) the number of walks, (5) the number of groundouts, (6) the number of flyouts, (7) the number of forceouts, (8) the number of extra-base hits, (9) the number of sac flies, and (10) the number of double plays. For each measure, I computed the value of D, and the values are displayed below.
Some comments on this graph.
- Some of these high discrimination values make sense, since well-known FIP (fielding independent pitching) measures like strikeouts and walks are known to have high discrimination. Also, one would expect measures like sac flies and double plays to have low discrimination, since these events depend more on the runners on base than on the quality of the pitcher.
- WHIP (walks and hits) has a moderate value of discrimination among all of these outcomes.
- I am a bit surprised that the number of groundouts and the number of flyouts have high values of D. I am also surprised that the number of home runs per inning has a low discrimination value. This means that much of the variability in home runs per inning is not attributable to differences in pitchers' abilities to prevent home runs.
Anyway, these calculations are based on a convenient normal-normal random effects model which could be improved. But I think the notion of discrimination is helpful, especially when one is trying to sort out a variety of measures that can be useful predictors of future performance. Certainly, a front-office person would be more confident in the talent of a pitcher with a high strikeout rate than in the talent of a pitcher with a high double-play rate.