Author Archive: bmmills

Over/Under Outcomes Home and Away

In my academic work, I’ve recently been working with betting line data under different scenarios. In particular, I have a fun little data set on Over/Unders from Covers.com that lets me take a quick look at how often teams go over and under the market’s total score expectation when they are at home and when they are away. This will be a pretty simple exercise, but one that can go in a lot of directions from here. I’ll note that I don’t bet on sports, or really anything for that matter, but it’s a fascinating market to look at.

Let’s begin by grabbing some data from my website here. The data cover MLB games from 2012 to 2014 and include the actual Total Score, the Over/Under line, the Game Date, and the Home and Away teams. Then we’ll load in the data and have a look.

###get some data
setwd("c:/...")
ou <- read.csv(file="OverUnderMLB.csv", header=TRUE)
head(ou)
nrow(ou)

###take a look by year
tapply(ou$OULine, ou$year, mean)
 2012  2013  2014
8.188 8.017 7.779

tapply(ou$TotalScore, ou$year, mean)
 2012  2013  2014
8.649 8.332 8.132 

###take a look by month
tapply(ou$OULine, ou$month, mean)
    3     4     5     6     7     8     9    10 
7.289 7.859 7.948 8.173 8.173 7.985 7.845 7.911 

tapply(ou$TotalScore, ou$month, mean)
    3     4     5     6     7     8     9    10 
8.158 8.433 8.507 8.433 8.355 8.359 8.193 6.933

A newbie might look at these averages and think, “Aha! Just bet the Over. The lines are biased.” But we need to be careful. It could well be that games that go over the line do so by a large margin, while games that go under do so by a small margin. In fact, with total scores bounded below at zero and unbounded above, this is exactly what we should expect. In that case, the average total is pulled upward while the probability of finishing above the line is unchanged. And, after all, it’s the probability of winning that we care about.
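A minimal sketch of that point, using ten made-up game totals (not the Covers.com data) against a hypothetical line of 8: the mean total sits a full run above the line even though the Over and the Under each win exactly half the time.

```r
# Toy illustration with invented totals; ou_line and totals are hypothetical
ou_line <- 8
totals  <- c(6, 7, 7, 7, 7, 9, 9, 10, 13, 15)

mean_total <- mean(totals)            # average total score across the ten games
over_rate  <- mean(totals > ou_line)  # share of games finishing above the line
under_rate <- mean(totals < ou_line)  # share of games finishing below the line

# Average margin on each side: overs miss the line by more than unders do
over_margin  <- mean(totals[totals > ou_line] - ou_line)
under_margin <- mean(ou_line - totals[totals < ou_line])

print(c(mean_total = mean_total, over_rate = over_rate, under_rate = under_rate))
print(c(over_margin = over_margin, under_margin = under_margin))
```

Here the overs clear the line by 3.2 runs on average and the unders fall short by only 1.2, which is what drags the mean up without changing either side's win rate.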

So, instead, let’s make a variable that shows whether betting the Over was a winner, betting the Under was a winner, or when there was a Push.


###add indicator of over, under, and push
ou$over <- ifelse(ou$TotalScore > ou$OULine, 1, 0)
ou$under <- ifelse(ou$TotalScore < ou$OULine, 1, 0)
ou$push <- ifelse(ou$TotalScore == ou$OULine, 1, 0)

mean(ou$over)
[1] 0.4622137

mean(ou$under)
[1] 0.4912906

mean(ou$push)
[1] 0.04649568

###now take a look by year and month
tapply(ou$over, ou$year, mean)
 2012  2013  2014 
0.463 0.460 0.463 

tapply(ou$under, ou$year, mean)
 2012  2013  2014 
0.491 0.494 0.488 

tapply(ou$over, ou$month, mean)
    3     4     5     6     7     8     9    10 
0.474 0.490 0.490 0.452 0.435 0.448 0.466 0.267

tapply(ou$under, ou$month, mean)
    3     4     5     6     7     8     9    10 
0.474 0.468 0.460 0.511 0.523 0.498 0.481 0.667 
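To put the overall rates in betting terms, here is a rough worked calculation. It assumes standard -110 pricing (risk 110 to win 100) and that pushes are refunded; the post does not state the odds actually offered, so both are assumptions. The over and under rates are the overall means reported above.

```r
# Overall rates taken from the mean(ou$over) and mean(ou$under) output above
over_rate  <- 0.4622137
under_rate <- 0.4912906

# Win rates among decided bets, assuming pushes are refunded
over_win_rate  <- over_rate  / (over_rate + under_rate)
under_win_rate <- under_rate / (over_rate + under_rate)

# Break-even win rate at assumed -110 odds: risk 110 to win 100
risk <- 110
win  <- 100
break_even <- risk / (risk + win)

print(round(c(over = over_win_rate, under = under_win_rate,
              break_even = break_even), 3))
```

Under these assumptions the Over wins about 48.5% of decided bets and the Under about 51.5%, and both fall short of the roughly 52.4% needed to break even, so neither blanket strategy pays for itself.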

Ok. Now we see that just betting the Over wasn’t a good idea after all: the Under actually wins slightly more often. But we could also look at which teams hit the Over or Under more often, or how they fare Home versus Away. The code below will give us a small glimpse. First, I’ll paste together the team and year columns to make team-year keys (homeYear and awayYear), then aggregate some variables of interest across these keys using tapply.

###make key variables for aggregation with tapply
ou$homeYear <- paste(ou$Hteam,"_",ou$year,sep="")
head(ou)

ou$awayYear <- paste(ou$Ateam,"_",ou$year,sep="")
head(ou)

###aggregate by team-year
OU_Home <- data.frame(round(tapply(ou$OULine, ou$homeYear, mean), 3))
OU_Home$team <- row.names(OU_Home)
colnames(OU_Home) <- c("homeOU", "team")

OU_Away <- data.frame(round(tapply(ou$OULine, ou$awayYear, mean), 3))
OU_Away$team <- row.names(OU_Away)
colnames(OU_Away) <- c("awayOU", "team")

Total_Home <- data.frame(round(tapply(ou$TotalScore, ou$homeYear, mean), 3))
Total_Home$team <- row.names(Total_Home)
colnames(Total_Home) <- c("homeTotal", "team")

Total_Away <- data.frame(round(tapply(ou$TotalScore, ou$awayYear, mean), 3))
Total_Away$team <- row.names(Total_Away)
colnames(Total_Away) <- c("awayTotal", "team")

OU_Hover <- data.frame(round(tapply(ou$over, ou$homeYear, mean), 3))
OU_Hover$team <- row.names(OU_Hover)
colnames(OU_Hover) <- c("homeOver", "team")

OU_Aover <- data.frame(round(tapply(ou$over, ou$awayYear, mean), 3))
OU_Aover$team <- row.names(OU_Aover)
colnames(OU_Aover) <- c("awayOver", "team")

OU_Hunder <- data.frame(round(tapply(ou$under, ou$homeYear, mean), 3))
OU_Hunder$team <- row.names(OU_Hunder)
colnames(OU_Hunder) <- c("homeUnder", "team")

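###away Under rate is keyed on awayYear (the tables printed below were produced with homeYear here, which is why their awayUnder column repeats homeUnder)
OU_Aunder <- data.frame(round(tapply(ou$under, ou$awayYear, mean), 3))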
OU_Aunder$team <- row.names(OU_Aunder)
colnames(OU_Aunder) <- c("awayUnder", "team")


###merge together
homeAway <- merge(Total_Home, Total_Away, by="team", all=T)
homeAway <- merge(homeAway, OU_Home, by="team", all=T)
homeAway <- merge(homeAway, OU_Away, by="team", all=T)
homeAway <- merge(homeAway, OU_Hover, by="team", all=T)
homeAway <- merge(homeAway, OU_Aover, by="team", all=T)
homeAway <- merge(homeAway, OU_Hunder, by="team", all=T)
homeAway <- merge(homeAway, OU_Aunder, by="team", all=T)

head(homeAway)
      team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
1 ARI_2012     9.469     8.086  8.957  7.889    0.481    0.469     0.469     0.469
2 ARI_2013     8.407     8.630  8.506  7.747    0.420    0.457     0.556     0.556
3 ARI_2014     8.975     7.778  8.198  7.735    0.469    0.444     0.481     0.481
4 ATL_2012     8.173     7.877  7.741  7.914    0.407    0.420     0.556     0.556
5 ATL_2013     7.457     7.802  7.401  7.790    0.407    0.494     0.531     0.531
6 ATL_2014     7.000     7.444  7.111  7.438    0.395    0.383     0.506     0.506
dim(homeAway)
[1] 90 10
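The aggregate-and-merge steps above can be condensed with base R's aggregate(). The following is only a sketch of one alternative, not the code used for this post: ou_toy is a four-row made-up stand-in for ou, and summarise_side is a hypothetical helper name.

```r
# Made-up stand-in for `ou` (values are invented, not the Covers.com data)
ou_toy <- data.frame(
  Hteam      = c("COL", "COL", "ATL", "ATL"),
  Ateam      = c("ATL", "ATL", "COL", "COL"),
  year       = c(2012, 2012, 2012, 2012),
  TotalScore = c(12, 9, 7, 8),
  OULine     = c(10, 10, 7.5, 8)
)
ou_toy$over     <- as.numeric(ou_toy$TotalScore > ou_toy$OULine)
ou_toy$under    <- as.numeric(ou_toy$TotalScore < ou_toy$OULine)
ou_toy$homeYear <- paste(ou_toy$Hteam, ou_toy$year, sep = "_")
ou_toy$awayYear <- paste(ou_toy$Ateam, ou_toy$year, sep = "_")

# Hypothetical helper: average all four measures at once for one key column
summarise_side <- function(dat, key, prefix) {
  out <- aggregate(dat[, c("TotalScore", "OULine", "over", "under")],
                   by = list(team = dat[[key]]), FUN = mean)
  names(out)[-1] <- paste0(prefix, c("Total", "OU", "Over", "Under"))
  out
}

home <- summarise_side(ou_toy, "homeYear", "home")
away <- summarise_side(ou_toy, "awayYear", "away")
homeAway_toy <- merge(home, away, by = "team", all = TRUE)
print(homeAway_toy)
```

Passing the key column name as an argument means the home and away summaries go through the same code path, which avoids the copy-and-paste slips that eight near-identical blocks invite.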

I probably could have used dplyr or some other package to do that more efficiently, but it works. First, let’s order the data by the largest average Home Over/Under line. I’m sure you can guess which team this will be (Hint: they’re a mile high).

homeAway <- homeAway[order(homeAway$homeOU, decreasing = TRUE), ]
head(homeAway)
       team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
25 COL_2012    12.457     7.889 10.198  8.204    0.605    0.432     0.346     0.346
27 COL_2014    11.654     7.765 10.117  7.568    0.531    0.444     0.407     0.407
26 COL_2013    10.136     7.963  9.802  7.901    0.506    0.432     0.469     0.469
82 TEX_2012    10.136     8.568  9.728  8.315    0.469    0.420     0.494     0.494
10 BOS_2012    10.395     8.617  9.475  8.407    0.519    0.444     0.420     0.420
55 NYY_2012     9.049     9.123  9.272  8.599    0.420    0.457     0.568     0.568

No surprise that the Rockies top the list in all 3 years. However, there’s something a bit more intriguing: they’re rather consistently hitting the Over at home, despite the large O/U lines. Let’s reorder our data and see if this happens for other teams.

###order data by Home Over Win Rate
homeAway <- homeAway[order(homeAway$homeOver, decreasing = TRUE), ]
row.names(homeAway) <- 1:nrow(homeAway)

homeAway[1:25,]
       team homeTotal awayTotal homeOU awayOU homeOver awayOver homeUnder awayUnder
1  COL_2012    12.457     7.889 10.198  8.204    0.605    0.432     0.346     0.346
2  LAD_2014     7.840     8.642  6.901  7.747    0.593    0.432     0.346     0.346
3  MIL_2012    10.037     8.593  8.265  8.025    0.580    0.519     0.407     0.407
4  MIN_2014     9.716     8.704  8.222  8.099    0.580    0.494     0.370     0.370
5  DET_2013     9.333     8.198  8.309  8.105    0.556    0.444     0.383     0.383
6  CHW_2012     9.827     7.753  8.654  8.377    0.543    0.346     0.420     0.420
7  LAA_2013     8.926     9.222  8.105  8.389    0.543    0.543     0.444     0.444
8  COL_2014    11.654     7.765 10.117  7.568    0.531    0.444     0.407     0.407
9  CHW_2014     8.975     8.531  8.210  8.154    0.531    0.457     0.457     0.457
10 DET_2014     9.037     9.012  8.210  7.975    0.531    0.506     0.420     0.420
11 PHI_2013     8.815     7.963  7.858  7.716    0.531    0.494     0.420     0.420
12 SEA_2013     8.469     8.543  7.420  8.056    0.531    0.481     0.432     0.432
13 BOS_2012    10.395     8.617  9.475  8.407    0.519    0.444     0.420     0.420
14 MIL_2013     8.617     7.765  8.340  7.889    0.519    0.420     0.469     0.469
15 OAK_2014     8.123     7.938  7.407  7.988    0.519    0.432     0.469     0.469
16 SDP_2012     7.741     9.062  6.778  8.296    0.519    0.531     0.469     0.469
17 COL_2013    10.136     7.963  9.802  7.901    0.506    0.432     0.469     0.469
18 STL_2012     8.654     8.790  8.160  8.198    0.506    0.457     0.457     0.457
19 MIA_2012     8.247     8.210  7.722  8.006    0.506    0.395     0.420     0.420
20 PHI_2012     8.284     8.556  7.691  7.667    0.506    0.519     0.432     0.432
21 WSN_2014     7.914     7.407  7.148  7.333    0.506    0.444     0.407     0.407
22 TOR_2012     9.296     9.222  8.852  8.574    0.494    0.457     0.457     0.457
23 TOR_2013     9.568     8.556  8.809  8.475    0.494    0.481     0.481     0.481
24 BAL_2012     9.444     8.049  8.772  8.327    0.494    0.383     0.469     0.469
25 HOU_2013     9.321     8.679  8.475  8.364    0.494    0.494     0.457     0.457

So, 21 times we see a team finish the season with a home Over win rate above 50%. However, the Rockies are the only team to show up above 50% in all 3 years (in fact, they’ve done it every year from 2010 to 2014, but not in 2008 or 2009), and the only one to post an Over rate of 60% or more in any season from 2012 to 2014. Interesting.

It’s at this point I must admit that this article discussing a betting strategy for late-season Coors Field games is what piqued my interest to begin with. As a last look, let’s see if there is more going on in July, August, and September with respect to these rates, as proposed by the linked article. I’m always skeptical of supposed inefficiencies in these markets, but so far there seem to be some rather straightforward things to take advantage of, given the five-year Colorado Over streak. If this is especially true late in the season, the edge might be large enough to cover the juice going to Vegas.

###Colorado by Month
rockies <- subset(ou, ou$Hteam=="COL")
nrow(rockies)

round(tapply(rockies$over, rockies$month, mean), 3)
    4     5     6     7     8     9 
0.514 0.571 0.511 0.375 0.590 0.737 

round(tapply(rockies$over, rockies$month, length), 3)
 4  5  6  7  8  9 
37 42 47 40 39 38 

So, strangely enough, the Rockies’ home Over win rate in July is quite low (37.5%), but above 50% in every other month over this period. It’s not clear what’s happening in July, but August and September look like prime months for the Over. So there is only partial evidence for the claim, and with roughly 40 games per month these monthly rates are noisy. Overall, I find this pretty interesting. I’m curious what others think about the park factors here with respect to the betting lines. Are they just bad? Or is there something being missed in all this?
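As a rough check on how much weight the September figure can bear, one option is an exact binomial test. This is a sketch, not part of the original analysis: the count of 28 Overs is backed out from the reported 0.737 rate over 38 games, pushes are simply counted as non-Overs, and testing one month picked after looking at six overstates the evidence.

```r
# Counts backed out from the September output above (0.737 over 38 home games)
n_games <- 38
n_over  <- round(0.737 * n_games)

# Exact binomial test of the Over rate against a 50/50 coin flip
fit <- binom.test(n_over, n_games, p = 0.5)

print(fit$estimate)   # observed Over rate
print(fit$conf.int)   # 95% confidence interval for the true rate
print(fit$p.value)    # two-sided p-value against p = 0.5
```

The width of the confidence interval is the thing to look at: a single month of home games leaves a lot of room around the point estimate.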

I didn’t do much rigorous analysis here, just an exploratory look. In any case, it was fun to take a look at claims I’m skeptical of and find some possible supporting evidence in the data. Keep an eye out for some academic work on betting market efficiency that I’ll (hopefully) be presenting this summer.

 

NOTE: There is an academic paper in the International Journal of Sport Finance on atmospheric conditions in baseball betting markets that goes into detail on the effects here. The citation is as follows: Paul, R. J., Weinbach, A. P., & Weinbach, C. (2014). The Impact of Atmospheric Conditions on the Baseball Totals Market. International Journal of Sport Finance, 9, 249-260. It’s a really neat paper by a group of authors who have done a lot of interesting work in the area.
