One thing that just doesn’t make sense to me is this: every time I see an opposite-field HR on an inside-out swing, I check the exit velocity, and most of the time it is around 100 mph.

Under your estimation, those HR are not particularly aided by the ball, as the EV is pretty high. However, those velocities don’t seem to match the visuals of the swings and contact.

Do you think there’s a possibility that the new balls are also resulting in higher Exit Velocities? If that’s the case, your model would miss that contribution by the balls, giving all the credit for the higher velocities to the batter.

Again, really enjoyed the post. Good stuff.

Nice job!

That default brms model provides much more flexibility than just assuming a negative binomial to handle overdispersion. You can easily add random effects like ballparks as you suggest and also fixed effects like temperature and humidity.
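A minimal sketch of how such a model could be written in brms formula syntax. Everything here is an assumption for illustration: the data frame `games`, its columns (`hr`, `temperature`, `humidity`, `ballpark`), the simulated values, and the Poisson family are mine, not the post's. The fit is skipped unless brms is installed and `run_fit` is switched on, since compiling the model takes a while.

```r
# Toy data, simulated only so the example runs; not real Statcast data.
set.seed(1)
n_games <- 200
games <- data.frame(
  ballpark    = factor(sample(paste0("park_", 1:10), n_games, replace = TRUE)),
  temperature = rnorm(n_games, mean = 75, sd = 10),  # degrees F (assumed unit)
  humidity    = runif(n_games, min = 20, max = 90)   # percent (assumed unit)
)
games$hr <- rpois(n_games, lambda = 2.5)  # home runs per game, arbitrary rate

# Fixed effects for weather, varying (random) intercept for ballpark.
hr_formula <- hr ~ temperature + humidity + (1 | ballpark)

run_fit <- FALSE  # set to TRUE to actually fit (slow: compiles a Stan model)
if (run_fit && requireNamespace("brms", quietly = TRUE)) {
  fit <- brms::brm(hr_formula, data = games, family = poisson())
  print(summary(fit))
}
```

The `(1 | ballpark)` term is what gives each park its own intercept; swapping the family is a one-argument change.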

P.S. I didn’t initially see you on the program for the sabermetrics conference—I had to click on the Sunday button to get the link for Sunday talks.

o19 %

arrange(desc(N))

sc_2019ip %>% anti_join(o19)

M <- 1e4
y_sim <- rpois(M, rnorm(M, 3311, sqrt(3311)))
quantile(y_sim, c(0.025, 0.975))

Stan declaration needs

real<lower = 0> lambda;

If I really cared about this prediction, I’d want to model the uncertainty due to the first half of the season only being a sample (despite it being a large-ish sample). A simple-minded approach might be to say OK, the real rate is itself probably something like normal(3311, sqrt(3311)) so we might model estimation uncertainty as Z ~ normal(3311, sqrt(3311)), and set Y = 3311 + Poisson(Z). If I simulate that I get central 95% posterior intervals of (6466, 6781).

> M <- 1e4
> y_sim <- 3311 + rpois(M, rnorm(M, 3311, sqrt(3311)))
> quantile(y_sim, c(0.025, 0.975))
 2.5% 97.5%
 6466  6781
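As a rough cross-check on the simulated interval, the same numbers can be approximated without simulation. This is a sketch in base R with variable names of my own, and it assumes a normal approximation for the mixture. With a fixed rate, the second half is Poisson(3311); with Z ~ normal(3311, sqrt(3311)), the law of total variance gives Var(Y) = E[Z] + Var(Z) = 2 * 3311.

```r
first_half <- 3311
probs <- c(0.025, 0.975)

# Fixed-rate Poisson: second half ~ Poisson(3311), exact quantiles.
fixed_rate <- first_half + qpois(probs, lambda = first_half)

# Normal-Poisson mixture: mean 3311, variance 3311 (Poisson) + 3311 (rate).
mix_sd <- sqrt(first_half + first_half)
normal_poisson <- first_half + qnorm(probs, mean = first_half, sd = mix_sd)

print(round(fixed_rate))
print(round(normal_poisson))
```

Both intervals should land within a few home runs of the fixed-rate and normal-Poisson intervals quoted in this comment.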

Then if I really cared, I’d fit a proper Bayesian model and do full Bayesian posterior predictive inference. I can write a simple Stan model that hard-codes the data

transformed data {
  int first_half_hr = 3311;
}
parameters {
  real<lower = 0> lambda;
}
model {
  first_half_hr ~ poisson(lambda);
}
generated quantities {
  real second_half_hr_sim = poisson_rng(lambda);
  real total_hr_sim = first_half_hr + second_half_hr_sim;
}

and fit it in R:

> library(rstan)
> fit <- ...
> print(fit, pars=c("total_hr_sim"), probs=c(0.025, 0.975), digits = 0)
             mean se_mean sd 2.5% 97.5% n_eff Rhat
total_hr_sim 6625       2 83 6464  6790  1974    1
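For this particular model the posterior is available in closed form, so the Stan result can be sanity-checked by direct simulation. A sketch, assuming the flat (improper) prior on lambda > 0 that Stan applies when no prior is stated: the posterior for lambda given a single Poisson count y is then Gamma(y + 1, 1). The variable names mirror the Stan program, and the seed and number of draws are arbitrary choices of mine.

```r
set.seed(2019)
M <- 1e5
first_half_hr <- 3311

# Posterior draws of the rate: Gamma(shape = y + 1, rate = 1) under a flat prior.
lambda <- rgamma(M, shape = first_half_hr + 1, rate = 1)

# Posterior predictive draws for the second half, then the season total.
second_half_hr_sim <- rpois(M, lambda)
total_hr_sim <- first_half_hr + second_half_hr_sim

total_interval <- quantile(total_hr_sim, c(0.025, 0.975))
print(total_interval)
```

The interval should agree with the Stan output up to Monte Carlo error, which is a quick way to confirm the model is doing what was intended.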

Nobody expects those last digits to accurately describe the 95% intervals, so we might as well round them. That gives: the bootstrap from the article, (6520, 6770); fixed-rate Poisson, (6510, 6740); normal-Poisson, (6470, 6780); and finally the Bayesian Poisson predictive model, (6460, 6790).

https://stackoverflow.com/questions/55938608/cant-install-baseballr-package

Although for me the error is “cannot remove prior installation of package ‘rlang’”

Apparently, the Maris asterisk is an urban legend.

https://www.salon.com/2001/10/03/asterisk/