Daily Predictions at Final AVG

An interested reader asks:

“If you have a point estimate on posey, turner, zimmerman’s eoy batting average, you have an algorithm that predicts each player’s eoy batting average on any day. is it proprietary, or are you willing to provide it? …  I’m certainly NOT the only person interested in the answers to those questions!”

Okay, here is some code that will predict the final season batting averages using my component method for any day of interest.  I recently described my component method.  Basically, one breaks down the BA into three rates:  the K rate (SO / AB), the HR rate (HR / (AB - SO)), and the BA in-play rate ((H - HR) / (AB - SO - HR)).  One simultaneously estimates each group of component rates using a basic multilevel model, and then combines the component predictions for each player to get a prediction of the final season AVG.  In my paper I show that this method is generally superior to the basic method of shrinking batting averages towards the mean.
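To see how the three rates fit back together, here is a minimal sketch in R.  It is not the code from my gist site; the helper name combine_rates is made up for this illustration, and the counts are Miguel Sano's line that appears later in this post.  With the observed (unshrunk) rates, the recombination should return the ordinary batting average H / AB exactly.

```r
# Counts for one hitter (Miguel Sano's line from later in this post).
AB <- 174; SO <- 77; HR <- 13; H <- 52

# The three observed component rates.
k_rate   <- SO / AB                      # strikeouts per at-bat
hr_rate  <- HR / (AB - SO)               # home runs per ball not struck out
bip_rate <- (H - HR) / (AB - SO - HR)    # hits per ball in play

# Hypothetical helper: rebuild a batting average from the three rates.
combine_rates <- function(k, hr, bip) {
  (1 - k) * (hr + (1 - hr) * bip)
}

avg <- combine_rates(k_rate, hr_rate, bip_rate)
all.equal(avg, H / AB)   # the identity holds for observed rates
```

The component method uses the same formula, but plugs in the multilevel estimates of the three rates instead of the observed ones.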

Here are the details for getting daily predictions (all of the R code is provided on my gist site):

  1.  I needed to find a website that provided the standard batting data for the current day in the MLB season and could be easily imported into R.  Looking around, I was successful in reading the batting data (for qualifying hitters) from the Sports Illustrated page using the htmltab package.  So I wrote a short function collect_data that does the scraping of the current day’s data.
  2. Once I have collected the data, I just apply the fit_comp_half function (also on my gist site) to implement this method.

I am writing this on Sunday morning, June 4.  The following graph shows the current AVG and the final season predictions.  I have identified some interesting points from the graph.  Miguel Sano is an example of a player who is predicted to drop in AVG.  Murphy, Zimmerman, and Turner are all predicted to finish above .300, but in a different order than their current AVGs.


Here are the computations for Miguel Sano, who currently has a BA of .299 but who I predict will finish with a final season AVG under .250.  The fit_comp_half function outputs a data frame of computations for all players.

library(dplyr)  # provides filter()
filter(d2$S, H / AB1 > .29, Comp.Est < .25)
     playerID SO  AB   SO.Rate HR AB.SO    HR.Rate H.HR AB.SO.HR
1 12 Sano, M. 77 174 0.3973213 13    97 0.09818932   39       84
    H.Rate  Comp.Est  H AB1 Shrinkage.Est
1 0.338869 0.2433526 52 174     0.2753458

Sano’s observed component rates are SO Rate = 77 / 174 = 0.442, HR Rate = 13 / 97 = 0.134, and BABIP rate = 39 / 84 = 0.464. All of these rates are relatively large. The final season predictions of these rates are respectively 0.397, 0.098, and 0.339. All of these predictions move the observed rates towards the corresponding averages, but the movement towards the average is most severe for the BABIP rate. The prediction of Sano’s final AVG is

Predicted AVG = (1 - 0.397) * (0.098 + (1 - 0.098) * 0.339) = 0.243

which is substantially lower than his current AVG of 52 / 174 = 0.299.
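The Sano arithmetic can be checked in a few lines of R.  This is only a sketch: combine_rates is a hypothetical helper written for this check, not a function from my gist code, and the estimated rates are copied from the SO.Rate, HR.Rate, and H.Rate columns of the output above.

```r
# Hypothetical helper: rebuild a batting average from the three component rates.
combine_rates <- function(k, hr, bip) {
  (1 - k) * (hr + (1 - hr) * bip)
}

# Observed rates from Sano's counts, and the estimated rates from the output.
observed  <- combine_rates(77 / 174, 13 / 97, 39 / 84)
predicted <- combine_rates(0.3973213, 0.09818932, 0.338869)

round(c(observed = observed, predicted = predicted), 3)
```

The observed value rounds to 0.299 and the predicted value to 0.243, which agrees with the Comp.Est column.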

Anyway, I encourage the reader to try out this code for any day this season.  The results are most interesting when the current season averages have a lot of variability.

Added Later in the Day

To make these functions easier to use, I put together an R package BApredict containing three functions: collect_hitting_data() collects the data from the SI site, component_predict() computes the predictions using my method, and graph_predictions() constructs a scatterplot of the current and predicted BAs. Once the package is installed and loaded, the code below runs the first two functions using today’s data.

d <- collect_hitting_data()
out <- component_predict(d)

R packages are easy to construct nowadays — maybe that will be the subject of next week’s post.

