# Calculation of In-Game Win Probabilities

When you visit Baseball Reference’s boxes pages, such as the extended box score of the final game of the 2014 season, you will see a display of win probabilities. One particular graph shows the probability of the Giants winning the game after each play. These probabilities are used to measure player performances. For example, Madison Bumgarner was the most valuable player of this game since his total win probability added (WPA) over all of his plays was 0.603. (On a side note, Jay Bennett and I presented these types of calculations in Curve Ball.)
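
As a small sketch of what "win probability added" means, using notation introduced here only for illustration: if $p_i^{\rm before}$ and $p_i^{\rm after}$ are the win probabilities of the player's team immediately before and after play $i$, then the player's WPA for the game is the sum of the changes over the plays he is involved in.

$$\mathrm{WPA} = \sum_{i \,\in\, \text{player's plays}} \left( p_i^{\rm after} - p_i^{\rm before} \right)$$

Bumgarner's 0.603 is a total of this kind.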

It is straightforward to compute these win probabilities. A first step in this calculation is to compute, at the end of each inning, the probability the home team wins the game. We illustrate one method of performing this computation using Retrosheet game logs for a particular season.

A function plot.prob.home is available on this GitHub gist page. Let me outline the main steps of this function.

• The game logs for the particular season are downloaded from Retrosheet. A short file is also downloaded from my site that gives the header (variable names) for the game log file.
• The game log file contains the line scores for the home and visiting teams. The function parses these line scores and creates numeric variables giving the runs scored in each inning. (The string function strsplit is helpful for this parsing.)
• It wasn’t obvious how to parse line scores for games with innings in which 10 or more runs were scored, so I have omitted those games. I don’t think this affects the results very much.
• A logistic regression model is used to develop a smooth curve for predicting the probability $p$ of a home team victory given the home team's lead. This model has the form $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \, {\rm Home.Team.Lead}$.
This model is fit separately after the bottom of each of the first eight innings. The regression intercept $\beta_0$ is the log odds of a home team win when the score is tied, so it represents the home team advantage, and $\beta_1$ represents the additional advantage for the home team (on the logit scale) for each additional run of lead.
• I use the ggplot2 package to plot the probability that the home team wins as a function of the inning number (1 through 8) for each of the home team leads -4, -3, -2, -1, 0, 1, 2, 3, 4.
• Here are a couple of illustrations of the use of this function for two seasons, 1980 and 2013. Assuming you have the packages devtools, arm, and ggplot2 installed, you can type the following R code into the Console window to see these graphs and have the probabilities displayed in a data frame.

library(devtools)
source_gist("70a166149f71622fed97")
plot.prob.home(1980)
plot.prob.home(2013)
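
To make the parsing and modeling steps outlined above more concrete, here is a minimal sketch in base R. It is not the code in the gist: the helper names parse_line_score and home_lead are hypothetical, base glm stands in for whatever fitting function the gist uses, and the data are simulated rather than taken from Retrosheet. It also assumes that an inning with 10 or more runs is written in parentheses in the line score and that an unplayed half-inning is marked with an "x".

```r
# Hypothetical helpers for illustration; these are not taken from the gist.

# Convert a line score such as "01002030x" into runs per inning.
# Returns NULL when an inning is written in parentheses (assumed to mean
# 10 or more runs), mirroring the choice to drop those games.
parse_line_score <- function(line_score) {
  if (grepl("(", line_score, fixed = TRUE)) return(NULL)
  chars <- strsplit(line_score, "")[[1]]
  chars[chars %in% c("x", "X")] <- "0"  # half-inning not played
  as.numeric(chars)
}

# Home team lead at the end of a given inning (NA if the game is unusable).
home_lead <- function(visitor_score, home_score, inning) {
  v <- parse_line_score(visitor_score)
  h <- parse_line_score(home_score)
  if (is.null(v) || is.null(h) || min(length(v), length(h)) < inning) {
    return(NA_real_)
  }
  sum(h[seq_len(inning)]) - sum(v[seq_len(inning)])
}

home_lead("000100000", "01002030x", 5)  # 2: home leads 3 to 1 after five innings

# Simulated toy data (NOT real game results): home team leads and wins
# generated from arbitrary coefficients 0.1 and 0.6.
set.seed(1)
toy <- data.frame(lead = sample(-4:4, 2000, replace = TRUE))
toy$home_win <- rbinom(nrow(toy), 1, plogis(0.1 + 0.6 * toy$lead))

# Logistic regression of home win on home team lead.
fit <- glm(home_win ~ lead, family = binomial, data = toy)

# Fitted probabilities of a home win for leads -4, ..., 4.
grid <- data.frame(lead = -4:4)
grid$prob_home_win <- predict(fit, newdata = grid, type = "response")
grid
```

With real game logs, one would compute the home team lead for every game at the end of a chosen inning, fit the same model once per inning from 1 through 8, and plot the fitted probabilities against the inning number.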