In last week’s post, I was interested in measuring a hitter’s ability to produce a hit from a batted ball. Specifically, I constructed a Z score that measures one’s ability to hit beyond what would be predicted based on knowing the launch angle and exit velocity. One issue that I didn’t discuss much was the basic prediction problem — what is a reasonable prediction of hit or out based on knowing the launch angle and exit velocity? Here I will talk about different types of predictions, measuring rates of correct predictions, and looking at the correct prediction rate at a player level.
When one uses the generalized additive model, one can use the launch angle and exit velocity to obtain a predicted probability that the batted ball is a base hit. If the probability is high enough, we predict a hit. In the previous post I suggested (with no justification) using a .5 cutoff, so if the probability exceeds .5, we predict “hit”. But other cutoff values are possible, say .331, .2, or any value between 0 and 1. As an extreme case, consider a rule that ignores exit velocity and launch angle entirely and simply predicts “out” for every batted ball. This rule is always right when the batted ball is an out and always wrong when it is a hit. Since 33.1 percent of all batted balls are hits and 66.9 percent are outs, the overall rate of correct predictions using this silly rule would be
P(silly rule is correct) = P(out) P(correct | out) + P(hit) P(correct | hit) = 0.669 x 1 + 0.331 x 0 = 0.669
So any reasonable prediction rule should be correct more than 66.9% of the time.
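The calculation above is just the law of total probability, and it can be reused for any prediction rule once the two conditional rates are known. Here is a minimal Python sketch; the function name `overall_correct_rate` and its arguments are illustrative assumptions, not code from the original analysis.

```python
def overall_correct_rate(p_hit, correct_given_out, correct_given_hit):
    """Overall P(correct) = P(out) P(correct | out) + P(hit) P(correct | hit)."""
    p_out = 1.0 - p_hit
    return p_out * correct_given_out + p_hit * correct_given_hit

# Silly rule: always predict "out", so it is always right on outs
# and always wrong on hits. The 0.331 hit rate is the figure quoted in the text.
silly = overall_correct_rate(p_hit=0.331, correct_given_out=1.0, correct_given_hit=0.0)
print(round(silly, 3))  # 0.669
```

Any rule built from launch angle and exit velocity can be plugged into the same function and compared against this 0.669 baseline.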
Evaluating a Prediction Rule by Error Rates
Let y denote the outcome (1 for a hit, 0 for an out) and yhat the prediction, coded the same way. In general, we predict “hit” (yhat = 1) if the predicted probability of a hit exceeds some constant k, and predict “out” (yhat = 0) otherwise. We can evaluate this rule by the two error rates
P(yhat = 1 | y = 0) (predicting a hit when really it is an out)
P(yhat = 0 | y = 1) (predicting an out when really it is a hit)
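To make the two error rates concrete, here is a small Python sketch that computes them for a given cutoff k. The function name `error_rates` and the toy data are made up for illustration; they are not Statcast values or output from the fitted model.

```python
def error_rates(y, phat, k):
    """Return (P(yhat = 1 | y = 0), P(yhat = 0 | y = 1)) for the rule
    'predict hit if the fitted probability exceeds k'."""
    yhat = [1 if p > k else 0 for p in phat]
    n_out = sum(1 for yi in y if yi == 0)
    n_hit = sum(1 for yi in y if yi == 1)
    # Predicted a hit when it was really an out.
    false_hit = sum(1 for yi, yh in zip(y, yhat) if yi == 0 and yh == 1)
    # Predicted an out when it was really a hit.
    false_out = sum(1 for yi, yh in zip(y, yhat) if yi == 1 and yh == 0)
    return false_hit / n_out, false_out / n_hit

# Toy data (hypothetical): outcomes and fitted hit probabilities.
y = [0, 0, 0, 0, 1, 1, 1, 0]
phat = [0.10, 0.40, 0.20, 0.60, 0.70, 0.30, 0.90, 0.05]

rate_false_hit, rate_false_out = error_rates(y, phat, k=0.5)
print(rate_false_hit, rate_false_out)
```

Lowering k makes the rule predict “hit” more often, which trades the second kind of error for the first.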
One way to assess the goodness of a prediction model is to compute these error rates for all possible cutoff values k. A common way to do this is to graph P(yhat = 1 | y = 1) (one minus the second error rate) against P(yhat = 1 | y = 0) for all cutoffs k; the resulting graph is called the ROC curve.
Here is a graph of the ROC curve for the gam model using exit velocity and launch angle to predict hits. Each point on the curve represents the use of this model for a specific cutoff value k. The area under this ROC Curve can be used to measure the goodness of this prediction model.
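For readers who want to see how such a curve and its area are built, here is a rough Python sketch that sweeps the cutoff k over the fitted probabilities and approximates the area with the trapezoid rule. The function names `roc_points` and `auc` and the toy data are assumptions for illustration only; in practice one would likely use a statistics package for this.

```python
def roc_points(y, phat):
    """Return (P(yhat=1 | y=0), P(yhat=1 | y=1)) pairs, one per cutoff k,
    using the rule 'predict hit if phat exceeds k'."""
    n_hit = sum(y)
    n_out = len(y) - n_hit
    # Sweep k from the largest fitted value (nothing predicted as a hit)
    # down to minus infinity (everything predicted as a hit).
    cutoffs = sorted(set(phat), reverse=True) + [float("-inf")]
    points = []
    for k in cutoffs:
        tp = sum(1 for yi, p in zip(y, phat) if p > k and yi == 1)
        fp = sum(1 for yi, p in zip(y, phat) if p > k and yi == 0)
        points.append((fp / n_out, tp / n_hit))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy data (hypothetical): outcomes and fitted hit probabilities.
y = [0, 0, 0, 0, 1, 1, 1, 0]
phat = [0.10, 0.40, 0.20, 0.60, 0.70, 0.30, 0.90, 0.05]

points = roc_points(y, phat)
area = auc(points)
print(round(area, 3))
```

An area of 0.5 corresponds to a model no better than guessing, and an area of 1 to a model that separates hits from outs perfectly.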
To show that both exit velocity and launch angle are useful for predicting hits, we consider two simpler models — one gam model that only uses exit velocity to predict hits, and another gam model that only uses launch angle. Below I show the ROC curves for all three models. Also I label the points corresponding to two common prediction rules — one that predicts hit if the fitted probability exceeds .331 and another if the probability of hit exceeds .5. A couple of things are clear from the graph:
- If we were to use only one variable, launch angle is the more important one for predicting hits (larger area under the ROC curve).
- It is better to use both variables, launch angle and exit velocity, than either one alone.
By the way, the overall rate of correct predictions using the .331 rule would be
P(out) P(correct | out) + P(hit) P(correct | hit) = 0.669 x 0.764 + 0.331 x 0.791 = 0.773
This correct prediction rate of 77.3% is about 10 percentage points higher than the 66.9% we found using the silly prediction rule.
Correct Predictions at the Player Level
Suppose we predict hit if the fitted probability of hit exceeds .331. The bottom graph gives the proportion of correct predictions for each player in the 2017 season. For most players with a reasonable number of batted balls, the proportion of correct predictions falls between 0.70 and 0.85. I’ve labeled a few interesting players: Reynolds and Stanton have correct prediction rates close to 0.85 (perhaps due to the large number of home runs), and Freese and Hernandez have correct prediction rates closer to 0.70. Hernandez is pretty fast and perhaps this reflects infield hits, but I am a little puzzled about the poor prediction rate for Freese.
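The player-level proportions can be obtained by grouping batted balls by hitter and scoring each prediction. Here is a minimal Python sketch of that bookkeeping; the function name `correct_rate_by_player`, the record layout, and the players "A" and "B" are hypothetical stand-ins, not the 2017 data.

```python
from collections import defaultdict

def correct_rate_by_player(records, k=0.331):
    """records: iterable of (player, y, phat) tuples, one per batted ball.
    Returns {player: (number of batted balls, proportion predicted correctly)}."""
    counts = defaultdict(lambda: [0, 0])  # player -> [n batted balls, n correct]
    for player, y, phat in records:
        yhat = 1 if phat > k else 0  # predict hit if the probability exceeds k
        counts[player][0] += 1
        counts[player][1] += int(yhat == y)
    return {player: (n, correct / n) for player, (n, correct) in counts.items()}

# Toy records (hypothetical players and fitted probabilities).
records = [
    ("A", 1, 0.80), ("A", 0, 0.20), ("A", 0, 0.50), ("A", 1, 0.40),
    ("B", 1, 0.10), ("B", 0, 0.05), ("B", 1, 0.90),
]
rates = correct_rate_by_player(records)
print(rates)
```

Returning the number of batted balls alongside each proportion makes it easy to set aside players with too few batted balls before comparing rates.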
- This post was motivated partly by my generalized linear models course that I’m teaching this semester. One learns best about statistical methods by trying them out on datasets of interest.
- Any model should be evaluated by its ability to make successful predictions. It is interesting that the rate of correct predictions using the Statcast variables is only 77.3%, although I suspect one could improve this rate by including other variables such as the spray angle. This relatively low correct prediction rate indicates that luck or other sources of variability are a big part of hitting.
- Whenever one gets interested in individual differences such as the high and low prediction rates for Stanton, Reynolds, Hernandez, and Freese, the natural follow-up questions are: “Why? Do these hitters have special talents that cause these differences?” So as usual there is more to explore to address these questions.
- I have ignored the third key variable — spray angle — in this work. We’ll explore the influence of spray angle on the chance of a hit in next week’s post.
I know ROC curves are a standard statistical tool. But the arbitrary nature of the cutoff point has always troubled me. I’m interested in your thoughts on an alternative evaluation measure of your models. Since you already have predicted probabilities, phat, it seems like you could compare your models by considering the expected value of each estimator given the outcome: E(phat_1 | Y = 1), E(phat_2 | Y = 1), E(phat_3 | Y = 1), where greater values imply greater preference. I’d imagine the model preference order would be preserved with this measure. Assuming that the phats are approximately unbiased, is there any issue with using this measure to assess model quality? Also, is there any reason why a measure based on a ROC curve prediction rule would be preferable to that? I’m really interested in your thoughts on this as I’ve been struggling with this question in my own research.
Jesse, I haven’t really thought about those other ways of comparing predictions. One thing that came to mind was to incorporate the losses in the two types of errors, but I haven’t played with those recently. Your idea may have promise.