Over/Under Outcomes Home and Away
In my academic work, I’ve most recently been dealing with betting line data under different scenarios. In particular, I have a fun little data set of Over/Unders from Covers.com that lets me take a quick look at how often teams go over and under the betting market’s total-score expectation at home and on the road. This will be a pretty simple exercise, but one that could go in a lot of directions from here. I’ll note that I don’t bet on sports, or really anything for that matter, but it’s a fascinating market to look at.
Let’s begin by grabbing some data from my website here. The data cover the 2012 to 2014 seasons and include the actual total score, the Over/Under line, the game date, and the home and away teams. We’ll load the data and have a look.
```r
###get some data
setwd("c:/...")
ou <- read.csv(file = "OverUnderMLB.csv", header = TRUE)
head(ou)
nrow(ou)

###take a look by year
tapply(ou$OULine, ou$year, mean)
#  2012  2013  2014
# 8.188 8.017 7.779
tapply(ou$TotalScore, ou$year, mean)
#  2012  2013  2014
# 8.649 8.332 8.132

###take a look by month
tapply(ou$OULine, ou$month, mean)
#     3     4     5     6     7     8     9    10
# 7.289 7.859 7.948 8.173 8.173 7.985 7.845 7.911
tapply(ou$TotalScore, ou$month, mean)
#     3     4     5     6     7     8     9    10
# 8.158 8.433 8.507 8.433 8.355 8.359 8.193 6.933
```
A newbie might look at this data and think, “Aha! Just bet the over. The data are biased.” But we need to be careful. It could very well be that games that hit the over do so by a large margin, while those hitting the under do so by a small margin. In fact, since scores are bounded below at zero but not above, the distribution of totals is skewed to the right, so this is exactly what we should expect. In that case, the average total gets pulled above the line even though the probability that any given game goes over can still be below 50%. And, after all, it’s the probability of winning that we care about.
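To make the skewness point concrete, here is a toy example with ten made-up game totals (hypothetical numbers, not from the Covers.com data) against an assumed line of 8.5. The mean total sits above the line, yet the median sits below it and only four of the ten games go Over.

```r
# Ten hypothetical game totals (invented for illustration, not real games).
# Totals can't go below zero but can run high, so a few blowouts drag the mean up.
totals <- c(4, 5, 6, 7, 7, 8, 10, 11, 14, 18)
line <- 8.5  # hypothetical Over/Under line

mean(totals)         # 9: the average total is above the line
median(totals)       # 7.5: the typical game finishes below it
mean(totals > line)  # 0.4: only 4 of the 10 games go Over
```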
So, instead, let’s make indicator variables showing whether a bet on the Over won, a bet on the Under won, or the game was a Push (the total landed exactly on the line, so bets are refunded).
```r
###add indicator of over, under, and push
ou$over  <- ifelse(ou$TotalScore > ou$OULine, 1, 0)
ou$under <- ifelse(ou$TotalScore < ou$OULine, 1, 0)
ou$push  <- ifelse(ou$TotalScore == ou$OULine, 1, 0)
mean(ou$over)
# [1] 0.4622137
mean(ou$under)
# [1] 0.4912906
mean(ou$push)
# [1] 0.04649568

###now take a look by year and month
tapply(ou$over, ou$year, mean)
#  2012  2013  2014
# 0.463 0.460 0.463
tapply(ou$under, ou$year, mean)
#  2012  2013  2014
# 0.491 0.494 0.488
tapply(ou$over, ou$month, mean)
#     3     4     5     6     7     8     9    10
# 0.474 0.490 0.490 0.452 0.435 0.448 0.466 0.267
tapply(ou$under, ou$month, mean)
#     3     4     5     6     7     8     9    10
# 0.474 0.468 0.460 0.511 0.523 0.498 0.481 0.667
```
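Because a push refunds the stake, the Over’s win rate among decided games (pushes excluded) is arguably the more relevant number. The small helper below is hypothetical, not part of the original code; plugging in the overall means printed above gives roughly 0.485, still below 50%.

```r
# Share of decided (non-push) games that went Over.
# `over` and `under` are 0/1 indicator vectors like ou$over and ou$under.
decided_over_rate <- function(over, under) {
  sum(over) / (sum(over) + sum(under))
}

# With the full data this would be decided_over_rate(ou$over, ou$under).
# The same ratio from the overall means printed above (the game count cancels):
0.4622137 / (0.4622137 + 0.4912906)  # about 0.485
```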
OK. Now we see that blindly betting the Over wasn’t such a good idea after all: it won about 46% of the time, versus about 49% for the Under. But we could also look at which teams hit the Over or Under more often, and how they fare at home versus away. The code below gives a small glimpse. First, I’ll paste together the team and year columns to create team-year keys (one for the home team, one for the away team), then aggregate the variables of interest over those keys using tapply.
```r
###make key variables for aggregation with tapply
ou$homeYear <- paste(ou$Hteam, "_", ou$year, sep = "")
ou$awayYear <- paste(ou$Ateam, "_", ou$year, sep = "")
head(ou)

###aggregate by team-year
OU_Home <- data.frame(round(tapply(ou$OULine, ou$homeYear, mean), 3))
OU_Home$team <- row.names(OU_Home)
colnames(OU_Home) <- c("homeOU", "team")
OU_Away <- data.frame(round(tapply(ou$OULine, ou$awayYear, mean), 3))
OU_Away$team <- row.names(OU_Away)
colnames(OU_Away) <- c("awayOU", "team")
Total_Home <- data.frame(round(tapply(ou$TotalScore, ou$homeYear, mean), 3))
Total_Home$team <- row.names(Total_Home)
colnames(Total_Home) <- c("homeTotal", "team")
Total_Away <- data.frame(round(tapply(ou$TotalScore, ou$awayYear, mean), 3))
Total_Away$team <- row.names(Total_Away)
colnames(Total_Away) <- c("awayTotal", "team")
OU_Hover <- data.frame(round(tapply(ou$over, ou$homeYear, mean), 3))
OU_Hover$team <- row.names(OU_Hover)
colnames(OU_Hover) <- c("homeOver", "team")
OU_Aover <- data.frame(round(tapply(ou$over, ou$awayYear, mean), 3))
OU_Aover$team <- row.names(OU_Aover)
colnames(OU_Aover) <- c("awayOver", "team")
OU_Hunder <- data.frame(round(tapply(ou$under, ou$homeYear, mean), 3))
OU_Hunder$team <- row.names(OU_Hunder)
colnames(OU_Hunder) <- c("homeUnder", "team")
# The away Under rate must be grouped by awayYear. The output printed below was
# produced with homeYear here, which is why its awayUnder column repeats homeUnder.
OU_Aunder <- data.frame(round(tapply(ou$under, ou$awayYear, mean), 3))
OU_Aunder$team <- row.names(OU_Aunder)
colnames(OU_Aunder) <- c("awayUnder", "team")

###merge together
homeAway <- merge(Total_Home, Total_Away, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Home, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Away, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Hover, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Aover, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Hunder, by = "team", all = TRUE)
homeAway <- merge(homeAway, OU_Aunder, by = "team", all = TRUE)
head(homeAway)
#       team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
# 1 ARI_2012     9.469     8.086  8.957  7.889    0.481    0.469     0.469     0.469
# 2 ARI_2013     8.407     8.630  8.506  7.747    0.420    0.457     0.556     0.556
# 3 ARI_2014     8.975     7.778  8.198  7.735    0.469    0.444     0.481     0.481
# 4 ATL_2012     8.173     7.877  7.741  7.914    0.407    0.420     0.556     0.556
# 5 ATL_2013     7.457     7.802  7.401  7.790    0.407    0.494     0.531     0.531
# 6 ATL_2014     7.000     7.444  7.111  7.438    0.395    0.383     0.506     0.506
dim(homeAway)
# [1] 90 10
```
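For comparison, the same team-year table can be built more compactly in base R with aggregate() and a single merge. This is a sketch on a tiny made-up data frame (hypothetical games) that reuses the column names of ou; summarise_side() is an illustrative helper, not part of the original code. Building the home and away summaries from one function also rules out mixing up homeYear and awayYear. One difference to keep in mind: the formula interface of aggregate() drops rows with missing values by default, whereas tapply() with mean would return NA.

```r
# Hypothetical games using the same column names as ou (values invented).
toy <- data.frame(
  Hteam      = c("COL", "COL", "ATL", "ATL"),
  Ateam      = c("ATL", "ATL", "COL", "COL"),
  year       = c(2012, 2012, 2012, 2012),
  OULine     = c(10.5, 11, 7.5, 8),
  TotalScore = c(12, 9, 7, 10)
)
toy$over  <- as.numeric(toy$TotalScore > toy$OULine)
toy$under <- as.numeric(toy$TotalScore < toy$OULine)

# Mean total, line, Over rate, and Under rate per team-year for one side
# (team_col = "Hteam" for home games, "Ateam" for away games).
summarise_side <- function(df, team_col, prefix) {
  df$team <- paste(df[[team_col]], df$year, sep = "_")
  agg <- aggregate(cbind(TotalScore, OULine, over, under) ~ team,
                   data = df, FUN = mean)
  names(agg)[-1] <- paste0(prefix, c("Total", "OU", "Over", "Under"))
  agg
}

homeAway2 <- merge(summarise_side(toy, "Hteam", "home"),
                   summarise_side(toy, "Ateam", "away"),
                   by = "team", all = TRUE)
homeAway2
```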
I probably could have used dplyr or some other package to do that more efficiently, but it works. First, let’s order the data by the largest home Over/Under line. I’m sure you can guess which team this will be (hint: they’re a mile high).
```r
homeAway <- homeAway[order(homeAway$homeOU, decreasing = TRUE), ]
head(homeAway)
#        team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
# 25 COL_2012    12.457     7.889 10.198  8.204    0.605    0.432     0.346     0.346
# 27 COL_2014    11.654     7.765 10.117  7.568    0.531    0.444     0.407     0.407
# 26 COL_2013    10.136     7.963  9.802  7.901    0.506    0.432     0.469     0.469
# 82 TEX_2012    10.136     8.568  9.728  8.315    0.469    0.420     0.494     0.494
# 10 BOS_2012    10.395     8.617  9.475  8.407    0.519    0.444     0.420     0.420
# 55 NYY_2012     9.049     9.123  9.272  8.599    0.420    0.457     0.568     0.568
```
No surprise: the Rockies top the list in all 3 years. However, there’s something a bit more intriguing: they consistently hit the Over at home despite those high lines (home Over rates of 0.605, 0.506, and 0.531 in 2012, 2013, and 2014). Let’s reorder the data by home Over rate and see whether this happens for other teams.
```r
###order data by Home Over Win Rate
homeAway <- homeAway[order(homeAway$homeOver, decreasing = TRUE), ]
row.names(homeAway) <- 1:nrow(homeAway)
homeAway[1:25, ]
#        team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
# 1  COL_2012    12.457     7.889 10.198  8.204    0.605    0.432     0.346     0.346
# 2  LAD_2014     7.840     8.642  6.901  7.747    0.593    0.432     0.346     0.346
# 3  MIL_2012    10.037     8.593  8.265  8.025    0.580    0.519     0.407     0.407
# 4  MIN_2014     9.716     8.704  8.222  8.099    0.580    0.494     0.370     0.370
# 5  DET_2013     9.333     8.198  8.309  8.105    0.556    0.444     0.383     0.383
# 6  CHW_2012     9.827     7.753  8.654  8.377    0.543    0.346     0.420     0.420
# 7  LAA_2013     8.926     9.222  8.105  8.389    0.543    0.543     0.444     0.444
# 8  COL_2014    11.654     7.765 10.117  7.568    0.531    0.444     0.407     0.407
# 9  CHW_2014     8.975     8.531  8.210  8.154    0.531    0.457     0.457     0.457
# 10 DET_2014     9.037     9.012  8.210  7.975    0.531    0.506     0.420     0.420
# 11 PHI_2013     8.815     7.963  7.858  7.716    0.531    0.494     0.420     0.420
# 12 SEA_2013     8.469     8.543  7.420  8.056    0.531    0.481     0.432     0.432
# 13 BOS_2012    10.395     8.617  9.475  8.407    0.519    0.444     0.420     0.420
# 14 MIL_2013     8.617     7.765  8.340  7.889    0.519    0.420     0.469     0.469
# 15 OAK_2014     8.123     7.938  7.407  7.988    0.519    0.432     0.469     0.469
# 16 SDP_2012     7.741     9.062  6.778  8.296    0.519    0.531     0.469     0.469
# 17 COL_2013    10.136     7.963  9.802  7.901    0.506    0.432     0.469     0.469
# 18 STL_2012     8.654     8.790  8.160  8.198    0.506    0.457     0.457     0.457
# 19 MIA_2012     8.247     8.210  7.722  8.006    0.506    0.395     0.420     0.420
# 20 PHI_2012     8.284     8.556  7.691  7.667    0.506    0.519     0.432     0.432
# 21 WSN_2014     7.914     7.407  7.148  7.333    0.506    0.444     0.407     0.407
# 22 TOR_2012     9.296     9.222  8.852  8.574    0.494    0.457     0.457     0.457
# 23 TOR_2013     9.568     8.556  8.809  8.475    0.494    0.481     0.481     0.481
# 24 BAL_2012     9.444     8.049  8.772  8.327    0.494    0.383     0.469     0.469
# 25 HOU_2013     9.321     8.679  8.475  8.364    0.494    0.494     0.457     0.457
```
So, 21 times we see a team finish the season with a home Over win rate of 50% or higher. However, the Rockies are the only team to appear above 50% in all 3 years (in fact, they’ve done it every year from 2010 to 2014, though not in 2008 or 2009), and the only team with a home Over rate of 60% or more in any season from 2012 to 2014. Interesting.
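The “all 3 years” check can be automated by stripping the year from each team-year key and counting, per franchise, the seasons at or above 50%. The helper below is a hypothetical sketch; it assumes a data frame shaped like homeAway, with a team column of the form ABC_YYYY and a numeric homeOver column. The example rows are copied from the tables above.

```r
# For each franchise, count the seasons with a home Over rate at or above `threshold`.
# Assumes team keys like "COL_2012" and a numeric homeOver column (as in homeAway).
seasons_over <- function(df, threshold = 0.5) {
  franchise <- sub("_[0-9]{4}$", "", df$team)
  counts <- tapply(df$homeOver >= threshold, franchise, sum)
  sort(counts, decreasing = TRUE)
}

# A few rows copied from the tables above.
example <- data.frame(
  team     = c("COL_2012", "COL_2013", "COL_2014", "DET_2013", "DET_2014", "ATL_2012"),
  homeOver = c(0.605, 0.506, 0.531, 0.556, 0.531, 0.407)
)
seasons_over(example)  # COL 3, DET 2, ATL 0
# On the full table this would be seasons_over(homeAway).
```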
It’s at this point I must admit that this article discussing a betting strategy for late-season Coors Field games is what piqued my interest to begin with. As a last look, let’s see whether anything further is going on in July, August, and September with respect to these rates, as the linked article proposes. I’m always skeptical of supposed inefficiencies in these markets, but so far there seems to be something rather straightforward to take advantage of, given Colorado’s five-year home Over streak. If the effect is especially strong late in the season, it might be large enough to cover the juice that goes to Vegas.
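How big is that hurdle? Totals are commonly priced at -110 (risk $110 to win $100); treat that price as an assumption, since actual prices vary by book and game. Because pushes return the stake, the break-even rate applies to decided bets. The function below is a hypothetical helper that converts American odds into that break-even win probability.

```r
# Break-even win probability (over decided bets) for a price in American odds.
# Negative odds: risk |odds| to win 100. Positive odds: risk 100 to win odds.
breakeven <- function(american_odds) {
  if (american_odds < 0) {
    risk <- -american_odds
    win  <- 100
  } else {
    risk <- 100
    win  <- american_odds
  }
  risk / (risk + win)
}

breakeven(-110)  # 110 / 210, about 0.524
```

At -110, then, a strategy has to win roughly 52.4% of decided bets just to break even, well above the roughly 46% league-wide Over rate shown earlier.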
```r
###Colorado by Month
rockies <- subset(ou, Hteam == "COL")
nrow(rockies)
round(tapply(rockies$over, rockies$month, mean), 3)
#     4     5     6     7     8     9
# 0.514 0.571 0.511 0.375 0.590 0.737
tapply(rockies$over, rockies$month, length)
#  4  5  6  7  8  9
# 37 42 47 40 39 38
```
So, strangely enough, the Over win rate in July is extremely low (0.375), but above 50% in every other month over this period. It’s not clear what’s happening in July, but August (0.590) and September (0.737) do look like prime months for the Over. So there is only partial evidence for the claim. Overall, I find this pretty interesting. I’m curious what others think about park factors with respect to these betting lines. Are the lines just bad? Or is something being missed in all this?
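The monthly samples are small (37 to 47 home games each), so it is worth asking how surprising these rates would be if every game were a coin flip. The sketch below back-calculates approximate Over counts from the rounded rates and game counts printed above (for example, 0.737 * 38 is about 28 Overs in September) and runs an exact binomial test against 50% for each month with binom.test(). Pushes are counted as non-Overs, which is a simplification. With six months tested, and late-season months singled out after reading the article, a single small p-value should be read cautiously.

```r
# Approximate counts reconstructed from the rounded rates and game counts above.
games <- c(Apr = 37, May = 42, Jun = 47, Jul = 40, Aug = 39, Sep = 38)
rates <- c(0.514, 0.571, 0.511, 0.375, 0.590, 0.737)
overs <- round(rates * games)

# Exact binomial test of one month's Over count against a 50% rate,
# returning the p-value and a 95% confidence interval for the rate.
test_month <- function(k, n) {
  bt <- binom.test(k, n, p = 0.5)
  c(p_value = bt$p.value, ci_low = bt$conf.int[1], ci_high = bt$conf.int[2])
}

monthly <- data.frame(month = names(games), games = games, overs = overs,
                      t(mapply(test_month, overs, games)), row.names = NULL)
print(monthly, digits = 3)
```

Under these approximations, September’s rate is hard to attribute to chance alone, while July’s low rate is not far outside what a fair coin produces over 40 games.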
I didn’t do much rigorous analysis here, just an exploratory look. In any case, it was fun to examine a claim I’m skeptical of and find some possible supporting evidence in the data. Keep an eye out for some academic work on betting market efficiency that I’ll (hopefully) be presenting this summer.
NOTE: There is an academic paper in the International Journal of Sport Finance on atmospheric conditions and the baseball totals market that goes into detail on the effects discussed here. The citation is as follows: Paul, R. J., Weinbach, A. P., & Weinbach, C. (2014). The Impact of Atmospheric Conditions on the Baseball Totals Market. International Journal of Sport Finance, 9, 249-260. It’s a really neat paper by a group of authors who have done a lot of interesting work in the area.