2018 Home run hitting
In last week’s post, I explored home run hitting in the first part of the 2018 season. The rate of hitting home runs on balls in play has dropped this season, and I made several comments about this drop. First, average launch angles and average launch speeds have increased this season, so one might expect more, not fewer, home runs. Second, I considered a model, fit to 2017 data, that predicts the probability of a home run given launch angle and launch speed. If you use this “2017 model” to predict the number of home runs from the 2018 launch angles and exit velocities, the observed number of 2018 home runs is smaller than the predicted number by a large amount.
One criticism of this approach is that I ignored ballpark temperature, which is known to have a significant impact on home runs hit. One reason I did not consider temperature is that I did not have the temperature data readily available. That motivated me to try some things:
- Can I scrape game day temperatures and merge this data with my Statcast data?
- Can I fit a new model for hitting home runs that includes temperature?
- Can I adjust the expected number of home runs hit for the 2018 season by temperature? Does temperature explain the drop in 2018 home runs?
Ballpark temperature and home run rates
Maybe there is a simple source of temperature data, but I wasn’t able to find one quickly. I did notice that Baseball Reference provides the temperature at the start of the game on its box score page for each game. I needed some practice for my data science course in the fall, so I scraped the temperature for each of the 2430 games in the 2017 regular season. (Essentially, you read in the raw text of the Baseball Reference page, look for the line that includes “Time Weather”, and extract the two-digit temperature value on that line.)
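The extraction step in the parenthetical above can be sketched in a few lines of Python. This is only a minimal sketch, not the code used for this post: the helper name `extract_temperature` and the sample page text are assumptions made for illustration, and the real box score markup may look different.

```python
import re

def extract_temperature(page_text):
    """Return the temperature found on the 'Time Weather' line, or None if absent."""
    for line in page_text.splitlines():
        if "Time Weather" in line:
            # Take the first standalone run of two or three digits on that line
            # (three digits are allowed in case of a 100-degree game).
            match = re.search(r"\b(\d{2,3})\b", line)
            if match:
                return int(match.group(1))
    return None

# Hypothetical text standing in for the raw contents of a box score page.
sample_page = """Venue: Example Park
Start Time Weather: 72° F, Wind 8mph out to Centerfield, Cloudy.
Attendance: 30,000"""

print(extract_temperature(sample_page))
```

Returning `None` when no weather line is found makes it easy to spot games that need a manual check before merging.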
For each possible game day temperature, I found (1) the number of home runs hit, and (2) the number of balls put in play. Below I display a scatterplot of the temperature and the in-play home run rate. The size of the plotting point is proportional to the number of games with that temperature. (As you see, most games are played at temperatures between 60 and 80 degrees.) The home run rate is approximately a linear function of the temperature. At 60 degrees, the predicted home run rate is about 5%, and it increases to about 6% at 80 degrees.
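The tabulation behind this plot can be sketched as follows. The records here are made up purely to show the shape of the calculation, and `hr_rate_by_temperature` is a hypothetical helper name. The last lines compute the slope implied by the two fitted values quoted above.

```python
from collections import defaultdict

def hr_rate_by_temperature(batted_balls):
    """Map each temperature to (home runs, balls in play, in-play HR rate)."""
    counts = defaultdict(lambda: [0, 0])  # temperature -> [home runs, balls in play]
    for temp, is_home_run in batted_balls:
        counts[temp][0] += int(is_home_run)
        counts[temp][1] += 1
    return {t: (hr, n, hr / n) for t, (hr, n) in sorted(counts.items())}

# Made-up records of (game temperature, was the ball a home run?).
toy = [(60, False), (60, True), (60, False), (60, False),
       (80, True), (80, False), (80, True), (80, False)]
rates = hr_rate_by_temperature(toy)
print(rates)

# Slope implied by the fitted values in the text: 5% at 60 degrees, 6% at 80 degrees.
slope_per_degree = (0.06 - 0.05) / (80 - 60)
print(slope_per_degree)
```

Under these two quoted values, each additional degree is worth roughly 0.05 percentage points of in-play home run rate.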
Modeling home runs using temperature
When I merge the 2017 Statcast hitting data with the temperatures, I can now construct a new model for home run hitting that includes launch angle, launch speed and also temperature. My generalized additive model can be written as
logit(Prob(Home Run)) = s(launch_angle, launch_speed) + temp
In the formula, s() is a smooth function of launch angle and launch speed, and temp is included as a linear effect. To give you a flavor of the predictions from the model, below I fix the launch angle at 30 degrees and show the estimated probability of a home run as a function of launch speed. There are three curves: one for games at 40 degrees, one for games at 60 degrees, and one for games at 80 degrees. To help interpret this graph, I show a line corresponding to a batted ball hit at 100 mph. At this speed (and a launch angle of 30 degrees), the probability of a home run is .375 at 40 degrees, .500 at 60 degrees, and .625 at 80 degrees. This is a sizable change, so temperature is clearly a relevant variable for home run rates.
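Since temp enters the model linearly on the logit scale, the three probabilities quoted above should be equally spaced once transformed by the logit. The short check below works through that arithmetic. It uses only the rounded values from the text, so the implied coefficient is approximate and is not the fitted coefficient from the model.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Probabilities quoted in the text for a 100 mph ball at a 30-degree launch angle.
probs = {40: 0.375, 60: 0.500, 80: 0.625}
logits = {t: logit(p) for t, p in probs.items()}

# Implied change in the logit per degree of temperature over each 20-degree step.
beta_40_60 = (logits[60] - logits[40]) / 20
beta_60_80 = (logits[80] - logits[60]) / 20
print(round(beta_40_60, 4), round(beta_60_80, 4))  # 0.0255 0.0255

# Interpolated probability at 70 degrees under the same linear-in-logit assumption.
p_70 = inv_logit(logits[60] + 10 * beta_60_80)
print(round(p_70, 3))
```

The two steps give the same value, about 0.0255 per degree, which is what a linear temperature term on the logit scale predicts.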
Expected 2018 home runs using model
Can temperature explain the decrease in home run hitting in 2018? Here’s what I tried:
- Using launch speed, launch angle, and temperature for all balls in play for the 2017 season, I found a reasonable model for predicting the probability of a home run (as described above).
- I collected the temperatures for all 2018 games. Using the 2017 model, I found the probability of a home run for each 2018 batted ball using the launch angle, launch speed, and temperature. By summing these probabilities over all batted balls, I found the expected number of home runs.
- In last week’s post, I found that the 2018 home run count was about 500 lower than one would expect from the 2017 model based on launch angle and launch speed. The criticism of this finding is that I did not adjust for the cold temperatures during the 2018 season. Using the new model with three variables (including temperature), I find that the 2018 home run count is still about 300 behind what one would expect based on the 2017 history. So temperature explains some, but not all, of the drop in 2018 home runs.
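The "sum the probabilities" step can be sketched as below. Everything specific here is an assumption for illustration: `hr_probability` is a toy logistic surface with invented coefficients standing in for the fitted 2017 model, and the batted balls and observed count are made up. Only the final summation mirrors the calculation described above.

```python
import math

def hr_probability(launch_angle, launch_speed, temp):
    """Hypothetical stand-in for the fitted 2017 model (NOT the model from this post)."""
    # Toy surface: peaks near a 28-degree angle, rises with launch speed and temperature.
    score = (-4.0
             + 0.25 * (launch_speed - 95)
             - 0.02 * (launch_angle - 28) ** 2
             + 0.025 * (temp - 60))
    return 1 / (1 + math.exp(-score))

def expected_home_runs(batted_balls):
    """Expected home run count: the sum of per-ball home run probabilities."""
    return sum(hr_probability(angle, speed, temp) for angle, speed, temp in batted_balls)

# Made-up batted balls: (launch angle in degrees, launch speed in mph, game temperature in F).
balls_2018 = [(28, 105, 55), (10, 98, 70), (35, 101, 48), (27, 110, 82)]
observed = 2  # made-up observed home run count for these four balls

expected = expected_home_runs(balls_2018)
print(f"expected = {expected:.2f}, observed - expected = {observed - expected:.2f}")
```

Comparing the observed count with this sum is what produces figures like "about 300 behind"; a negative difference means fewer home runs were hit than the model expects.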
Week by Week Home Run Rates
By the way, I thought it would be interesting to plot the HR rate for the 2017 and 2018 seasons by week number. If there were really a cold-weather effect, one would expect the 2018 HR rate to be similar to the 2017 rate in recent weeks. Here’s the graph (using data through the games of June 3, 2018). It is interesting that the 2017 HR rate is much higher than the 2018 rate for the last three weeks (week numbers 20 through 22). I don’t think this difference is due to a temperature effect.
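A weekly tabulation like this can be sketched as follows. The post does not say how week numbers were computed, so this sketch assumes they are weeks of the calendar year and uses Python's ISO week number. Under that assumption, June 3, 2018 falls in week 22, which matches the range mentioned above. The batted-ball records are made up.

```python
from collections import defaultdict
from datetime import date

def weekly_hr_rate(batted_balls):
    """Map ISO week number to the in-play home run rate for that week."""
    totals = defaultdict(lambda: [0, 0])  # week -> [home runs, balls in play]
    for game_date, is_home_run in batted_balls:
        week = game_date.isocalendar()[1]  # ISO week of the year (assumed definition)
        totals[week][0] += int(is_home_run)
        totals[week][1] += 1
    return {week: hr / n for week, (hr, n) in sorted(totals.items())}

# Made-up records of (game date, was the ball a home run?).
toy = [(date(2018, 5, 21), False), (date(2018, 5, 22), False),
       (date(2018, 5, 28), True), (date(2018, 6, 3), False)]
weekly = weekly_hr_rate(toy)
print(weekly)
```

Other week conventions (for example, counting seven-day blocks from January 1) can shift the numbering by one in some years, so the two seasons should be labeled with the same rule before they are compared.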
- This was good practice on merging data from different sources, where one source comes from scraping temperatures from the Internet. There are some technical issues — for example, the team abbreviations on Statcast are inconsistent with the team abbreviations used by Retrosheet and Baseball Reference.
- We have more exploration to do with home run hitting since game temperature does not appear to totally explain the decrease in 2018 home run hitting. Our game temperature is not an exact measurement; ideally one would want the temperature at each ball put in play. Other weather factors such as wind likely play a role, so really several weather measurements should be available for each plate appearance.
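The abbreviation issue mentioned above can be handled with a small lookup table applied before the merge. The sketch below is illustrative only: the mapping entries are my assumptions about how the two sources spell a few teams and should be checked against the actual data, and the rows and temperatures are made up.

```python
# Assumed mapping from Statcast-style to Baseball Reference-style abbreviations.
# Verify every entry against the real data before relying on it.
STATCAST_TO_BREF = {
    "CWS": "CHW", "KC": "KCR", "SD": "SDP",
    "SF": "SFG", "TB": "TBR", "WSH": "WSN",
}

def to_bref(team):
    """Translate a Statcast-style abbreviation; unchanged if no mapping is listed."""
    return STATCAST_TO_BREF.get(team, team)

# Made-up scraped temperatures keyed by (game date, home team in Baseball Reference style).
temperatures = {("2017-04-10", "KCR"): 58, ("2017-04-10", "BOS"): 45}

# Made-up Statcast-style batted-ball rows.
statcast_rows = [
    {"game_date": "2017-04-10", "home_team": "KC", "launch_speed": 101.2, "launch_angle": 27},
    {"game_date": "2017-04-10", "home_team": "BOS", "launch_speed": 95.4, "launch_angle": 12},
    {"game_date": "2017-04-11", "home_team": "SD", "launch_speed": 99.0, "launch_angle": 31},
]

# Left join: every batted ball is kept, and temp is None when no temperature was scraped.
merged = [
    dict(row, temp=temperatures.get((row["game_date"], to_bref(row["home_team"]))))
    for row in statcast_rows
]
for row in merged:
    print(row["game_date"], row["home_team"], row["temp"])
```

Keying on date and home team is not enough for doubleheaders, where two games share both values, so a game number or game identifier would be needed to separate them.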