openWAR in 2014
Over at Stats In the Wild, my collaborator Greg Matthews has been monitoring the results of openWAR for the current season. How is he doing this?
The first step, of course, is to load the openWAR package.
require(openWAR)
Getting the 2014 Data
Now we need some data. While data from the previous two years is bundled into the openWAR package, data from the current season is not – so we’ll have to download it. We have made this as painless as possible. All we have to do is tell the getData() function the time interval over which we want to download game data, and it will do the rest. In this case, we want all games from the season opener played by the Dodgers and Diamondbacks in Australia on March 22nd, through today’s games.
Warning: this will take a while to run – possibly an hour or so.
Since we invested so much time in downloading this data, let’s save it to disk so that we don’t have to download it again. Note that the resulting object is just a data.frame, so if we download more data tomorrow, we can just rbind() the new data to the old.
MLBAM2014 = getData(start = "2014-03-22", end = "2014-08-22")
save(MLBAM2014, file = "MLBAM2014.rda")
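The incremental update mentioned above could look something like the following minimal sketch. It uses two tiny stand-in data frames rather than real getData() output, and the gameId and event columns are hypothetical names chosen for illustration. Skipping games that are already present is an extra precaution, in case the two download windows overlap.

```r
# Stand-ins for yesterday's saved data and today's download.
# Real getData() output has many more columns; these names are illustrative.
old = data.frame(gameId = c("g1", "g1", "g2"),
                 event = c("Single", "Walk", "Flyout"),
                 stringsAsFactors = FALSE)
new = data.frame(gameId = c("g2", "g3"),
                 event = c("Flyout", "Home Run"),
                 stringsAsFactors = FALSE)

# Keep only games we have not stored yet, then append them
fresh = new[!(new$gameId %in% old$gameId), ]
combined = rbind(old, fresh)
nrow(combined)
```

With the real data, the same pattern would be wrapped between a load() of the saved file and a fresh save().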
Note the venue for the first game!
MLBAM2014[1, "stadium"]
## [1] Sydney Cricket Ground
## 58 Levels: Maryvale Baseball Park ... Safeco Field
This data.frame contains nearly 150,000 rows, and takes up a decent bit of memory, so you may experience some sluggishness depending on your machine.
dim(MLBAM2014)
## [1] 144739 62
print(object.size(MLBAM2014), units = "Mb")
## 99.1 Mb
Computing openWAR
The computation of openWAR involves fitting 26 different models to the data, so we’ll have to do this for our 2014 data. We have rolled these all into one function called makeWAR.
Again, this may take a fair amount of time and memory depending on your machine. We’re hoping to optimize this process in the future.
Before we do anything else, we should save these results to disk so we won’t lose them.
ds = makeWAR(MLBAM2014)
openWARPlays.2014 = ds$openWAR
save(openWARPlays.2014, file = "openWARPlays.2014.rda")
Now we have an object of class openWARPlays. It’s a data.frame that contains a whole bunch of other information that results from our openWAR calculations. Each row in this data.frame corresponds exactly to one row of play-by-play data in MLBAM2014.
dim(openWARPlays.2014)
## [1] 144739 37
We can look at individual plays if we like. For example, in which play from 2014 did Mike Trout earn his most RAA as a CF? First, we need to grab Mike Trout’s MLBAM ID. You can do this by looking at the URL in his player page on mlb.com, or you can just use grepl().
head(MLBAM2014[grepl("Trout", MLBAM2014$batterName), c("batterName", "batterId")])
##       batterName batterId
## 2601       Trout   545361
## 19119      Trout   545361
## 35119      Trout   545361
## 53122      Trout   545361
## 2701       Trout   545361
## 17132      Trout   545361
Now let’s rank the plays by RAA accrued to the CF when Trout was playing CF.
trout.cf = subset(cbind(description = MLBAM2014$description, openWARPlays.2014), playerId.CF == 545361)
trout.cf.idx = order(trout.cf$raa.CF, decreasing=TRUE)
head(trout.cf[trout.cf.idx, c("description", "raa.CF")])
##        description
## 20164  Nick Castellanos flies out to center fielder Mike Trout. Torii Hunter to 3rd.
## 98908  Jose Altuve lines out to center fielder Mike Trout.
## 95793  Paul Konerko flies out to center fielder Mike Trout. Jose Abreu to 3rd.
## 19100  Miguel Cabrera flies out to center fielder Mike Trout.
## 141319 Dustin Pedroia flies out to center fielder Mike Trout.
## 82261  Nick Swisher flies out to center fielder Mike Trout.
##        raa.CF
## 20164  0.3535
## 98908  0.3110
## 95793  0.3106
## 19100  0.3089
## 141319 0.3064
## 82261  0.2997
But Trout has also had some plays that cost him RAA.
head(trout.cf[order(trout.cf$raa.CF, decreasing=FALSE), c("description", "raa.CF")])
##        description
## 104191 Nolan Reimold doubles (3) on a line drive to center fielder Mike Trout. Steve Tolleson scores. Melky Cabrera scores.
## 12698  Anthony Recker singles on a line drive to center fielder Mike Trout. Lucas Duda scores. Juan Lagares scores.
## 124545 Brandon Guyer singles on a sharp line drive to center fielder Mike Trout. Logan Forsythe scores. Desmond Jennings scores. Ben Zobrist to 2nd.
## 20179  Nick Castellanos singles on a line drive to center fielder Mike Trout. Austin Jackson scores.
## 17883  Torii Hunter singles on a soft line drive to center fielder Mike Trout. Rajai Davis scores.
## 130307 Juan Uribe doubles (18) on a line drive to center fielder Mike Trout. Matt Kemp scores.
##         raa.CF
## 104191 -0.9162
## 12698  -0.7026
## 124545 -0.6668
## 20179  -0.6503
## 17883  -0.6266
## 130307 -0.6042
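To see how per-play values like these add up to a season total, here is a toy sketch that sums fielding RAA by center fielder. The player ids and RAA values are invented; only the column names playerId.CF and raa.CF are borrowed from the output above. This illustrates the idea of summing plays and is not a description of how the package does it internally.

```r
# Made-up plays: the ids and values are invented for illustration
plays = data.frame(
  playerId.CF = c(1, 1, 2, 2, 2),
  raa.CF = c(0.35, -0.92, 0.10, 0.05, -0.30)
)

# Sum fielding RAA for each center fielder
cf.totals = aggregate(raa.CF ~ playerId.CF, data = plays, FUN = sum)
cf.totals
```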
Tabulating openWAR
Finally, let’s tabulate openWAR by player. This is accomplished using the getWAR() function.
owar = getWAR(openWARPlays.2014)
## ...Tabulating RAA per player...
## ...identified 465 replacement-level players...
The resulting data.frame has just one row per player. We can quickly see the leaders using the generic summary() command.
summary(owar)
## Displaying information for 1215 players, of whom 642 have pitched
##              Name TPA   WAR   RAA   repl RAA.bat  RAA.br RAA.field
## 1021        Trout 558 7.506 47.25 -27.81 40.5225  2.9166    3.8100
##  603      Kershaw 628 5.829 39.91 -18.38  4.1875 -0.9249    0.3463
##  860      Stanton 550 5.704 34.54 -22.50 31.1416  5.3845   -1.9825
##  439    McCutchen 497 5.467 29.17 -25.50 20.9591  1.6567    6.5585
##  360   Tulowitzki 375 5.409 39.40 -14.69 41.6013 -5.9353    3.7317
##  193 Hernandez, F 712 5.386 33.58 -20.29 -0.6273  0.0000   -0.6298
##  743  Goldschmidt 484 5.347 32.44 -21.03 27.7898  3.3991    1.2491
##   61        Utley 534 5.193 26.84 -25.09 22.4860  3.9068    0.4461
##  283       Kluber 755 5.186 30.33 -21.53  0.2692 -0.0216    0.2474
##  499     Gomez, C 524 5.090 24.47 -26.43 17.8057  5.9617    0.7036
##  614     Brantley 528 5.070 31.65 -19.05 26.2959  0.6561    4.6981
## 1211         Puig 515 5.059 28.49 -22.11 21.2540  4.5689    2.6635
##  421        Cueto 826 5.040 26.30 -24.10  0.3697 -0.3154    1.5976
##  136         Cano 522 5.018 25.81 -24.36 24.3708  3.7764   -2.3322
##  853         Sale 531 4.977 34.63 -15.13 -0.3549  0.0000    0.6342
##   90   Cabrera, M 536 4.958 27.10 -22.48 21.0360 -1.5284    7.5960
##  231      Kinsler 559 4.786 22.15 -25.71  6.2239  8.1941    7.7363
##  827     Mesoraco 340 4.778 34.71 -13.07 36.0769 -2.8221    1.4540
##  378     Scherzer 717 4.742 26.99 -20.43 -0.1512  0.0000   -0.2457
##  492    Gordon, A 497 4.729 33.94 -13.34 17.0584  8.0530    8.8322
##  352       Lester 694 4.721 27.42 -19.79 -0.1780  0.0000   -0.4859
## 1032        Abreu 484 4.594 25.42 -20.52 27.4917 -3.9671    1.8981
##  351         Span 535 4.592 18.85 -27.07  5.6542  8.1080    5.0917
##  785    Bumgarner 779 4.581 23.06 -22.75  9.3158 -0.2722    0.2088
##  177    Jones, Ad 543 4.448 17.09 -27.39 12.2042  3.7178    1.1651
##      RAA.pitch
## 1021      0.00
##  603     36.30
##  860      0.00
##  439      0.00
##  360      0.00
##  193     34.84
##  743      0.00
##   61      0.00
##  283     29.83
##  499      0.00
##  614      0.00
## 1211      0.00
##  421     24.65
##  136      0.00
##  853     34.35
##   90      0.00
##  231      0.00
##  827      0.00
##  378     27.38
##  492      0.00
##  352     28.08
## 1032      0.00
##  351      0.00
##  785     13.81
##  177      0.00
And visualize openWAR across all players using the generic plot() function.
plot(owar)
This plot requires some explanation! Each blue or pink dot corresponds to a single player, who has been designated as either a replacement-level player or an MLB player based on his playing time. Our heuristic for determining who is a replacement-level player comes from the roster limits inherent in MLB. For most of the season, there are only
30 * 25
## [1] 750
roster spots, and most teams allocate those as 13 position players and 12 pitchers. Thus, we take the
30 * 13
## [1] 390
position players and 360 pitchers with the most playing time (plate appearances plus batters faced) and designate those players as MLB players. Everyone else is a replacement-level player. In the plot, the pink dots represent replacement-level players, while the blue dots represent MLB players.
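The playing-time cutoff described above can be sketched on a toy roster. The names and playing-time numbers below are made up, and n.mlb = 4 stands in for the real cutoffs of 390 position players and 360 pitchers. This is an illustration of the heuristic, not the package's own code.

```r
# Toy roster: names and playing-time totals are invented
players = data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  playing.time = c(600, 550, 40, 480, 15, 120),
  stringsAsFactors = FALSE
)
n.mlb = 4  # stand-in for 390 position players (or 360 pitchers)

# Rank by playing time, most first; the top n.mlb count as MLB players
players$rank = rank(-players$playing.time, ties.method = "first")
players$status = ifelse(players$rank <= n.mlb, "MLB", "replacement")
players[order(players$rank), ]
```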
Now, the RAA values of the replacement-level players are averaged in each facet of the game. This gives us a baseline for comparison. For each real player, we can now define a replacement-level shadow that provides an estimate of how many RAA a replacement-level player would have created in the same playing time instances (batting, baserunning, fielding, and pitching) as that real player. Since replacement-level players are much worse than average players, this value is negative. These are the grey dots in the plot – each pink or blue dot has a corresponding grey dot with the same horizontal coordinate. Thus, each player’s WAR is represented by the vertical distance between the player’s dot and his grey replacement-level shadow dot.
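As a quick check on this picture, we can take a few rows from the summary(owar) output above and compute the vertical distance by hand. In that table, the WAR column matches the gap between RAA and repl divided by 10. The 10 runs-per-win divisor is inferred from those printed numbers and is an assumption here, not something taken from the package's source.

```r
# Values copied from the summary(owar) output
leaders = data.frame(
  Name = c("Trout", "Kershaw", "Stanton"),
  WAR  = c(7.506, 5.829, 5.704),
  RAA  = c(47.25, 39.91, 34.54),
  repl = c(-27.81, -18.38, -22.50),
  stringsAsFactors = FALSE
)

# Distance between the player's dot and his grey shadow dot, in runs
leaders$runs.above.repl = leaders$RAA - leaders$repl

# Assumption: about 10 runs per win, inferred from the table
leaders$WAR.check = leaders$runs.above.repl / 10
leaders
```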
As you can see, the vertical distance between Mike Trout and his replacement-level shadow is very large, and as such he leads baseball in openWAR in 2014. Clayton Kershaw leads in openWAR among pitchers.
It’s interesting that Jose Molina ranks last in openWAR. This is maybe not so surprising, since openWAR does not track catcher framing, and we know this is a big part of Molina’s perceived value.