Monthly Archives: August, 2014

openWAR in 2014

Over at Stats In the Wild, my collaborator Greg Matthews has been monitoring the results of openWAR for the current season. How is he doing this?

The first step, of course, is to load the openWAR package.

require(openWAR)

Getting the 2014 Data

Now we need some data. While data from the previous two years is bundled into the openWAR package, data from the current season is not – so we’ll have to download it. We have made this as painless as possible. All we have to do is tell the getData() function the time interval over which we want to download game data, and it will do the rest. In this case, we want all games from the season opener played by the Dodgers and Diamondbacks in Australia on March 22nd, through today’s games.

Warning: this will take a while to run – possibly an hour or so.

Since we invested so much time in downloading this data, let’s save it to disk so that we don’t have to download it again. Note that the resulting object is just a data.frame, so if we download more data tomorrow, we can just rbind() the new data to the old.

MLBAM2014 = getData(start = "2014-03-22", end = "2014-08-22")
save(MLBAM2014, file = "MLBAM2014.rda")
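
The rbind() idea mentioned above might look something like the following sketch. It uses two tiny made-up data frames in place of real getData() output; the helper name append_games and the toy gameId and description values are assumptions for illustration only. In practice the second data frame would come from another getData() call covering just the new dates.

```r
# Toy stand-ins for two downloads; real data would come from getData().
old_games <- data.frame(
  gameId = c("g1", "g1", "g2"),
  description = c("play a", "play b", "play c"),
  stringsAsFactors = TRUE
)
new_games <- data.frame(
  gameId = c("g2", "g3"),
  description = c("play c", "play d"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: stack the new rows under the old ones and drop
# rows that were downloaded twice (e.g. because the date ranges overlap).
append_games <- function(old, new) {
  combined <- rbind(old, new)
  combined[!duplicated(combined), , drop = FALSE]
}

all_games <- append_games(old_games, new_games)
nrow(all_games)
```

Dropping exact duplicate rows is a cautious choice for the case where the two date ranges overlap; if the ranges are known to be disjoint, a plain rbind() is enough.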

Note the venue for the first game!

MLBAM2014[1, "stadium"]
## [1] Sydney Cricket Ground
## 58 Levels: Maryvale Baseball Park ... Safeco Field

This data.frame contains nearly 150,000 rows and takes up roughly 100 MB of memory, so you may experience some sluggishness depending on your machine.

dim(MLBAM2014)
## [1] 144739     62
print(object.size(MLBAM2014), units = "Mb")
## 99.1 Mb

Computing openWAR

Computing openWAR involves fitting 26 different models to the data, so we need to fit them to our 2014 data. We have rolled these all into a single function, makeWAR().

Again, this may take a fair amount of time and memory depending on your machine. We’re hoping to optimize this process in the future.

Before we do anything else, we should save these results to disk so we won’t lose them.

ds = makeWAR(MLBAM2014)
openWARPlays.2014 = ds$openWAR
save(openWARPlays.2014, file = "openWARPlays.2014.rda")

Now we have an object of class openWARPlays. It’s a data.frame that contains a whole bunch of other information that results from our openWAR calculations. Each row in this data.frame corresponds exactly to one row of play-by-play data in MLBAM2014.

dim(openWARPlays.2014)
## [1] 144739     37

We can look at individual plays if we like. For example, on which play in 2014 did Mike Trout earn the most RAA as a CF? First, we need Mike Trout’s MLBAM ID. You can find it in the URL of his player page on mlb.com, or you can just search the data with grepl().

head(MLBAM2014[grepl("Trout", MLBAM2014$batterName), c("batterName", "batterId")])
##       batterName batterId
## 2601       Trout   545361
## 19119      Trout   545361
## 35119      Trout   545361
## 53122      Trout   545361
## 2701       Trout   545361
## 17132      Trout   545361
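
Since every matching row repeats the same ID, a slightly tidier lookup is to wrap the subset in unique(). The sketch below runs on a small made-up data frame (the second player and his ID are invented); only the batterName and batterId column names and Trout's ID are taken from the output above.

```r
# Toy play-by-play rows; the non-Trout player and ID are made up.
plays <- data.frame(
  batterName = c("Trout", "Someone Else", "Trout", "Trout"),
  batterId = c(545361, 111111, 545361, 545361)
)

# One row per distinct (name, id) pair instead of one row per plate appearance.
trout_ids <- unique(plays[grepl("Trout", plays$batterName),
                          c("batterName", "batterId")])
trout_ids
```

If more than one row comes back, the pattern matched several players and needs to be narrowed.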

Now let’s rank the plays by RAA accrued to the CF when Trout was playing CF.

trout.cf = subset(cbind(description = MLBAM2014$description, openWARPlays.2014), playerId.CF == 545361)
trout.cf.idx = order(trout.cf$raa.CF, decreasing=TRUE)
head(trout.cf[trout.cf.idx, c("description", "raa.CF")])
##                                                                              description
## 20164  Nick Castellanos flies out to center fielder Mike Trout.   Torii Hunter to 3rd.
## 98908                              Jose Altuve lines out to center fielder Mike Trout.
## 95793        Paul Konerko flies out to center fielder Mike Trout.   Jose Abreu to 3rd.
## 19100                           Miguel Cabrera flies out to center fielder Mike Trout.
## 141319                          Dustin Pedroia flies out to center fielder Mike Trout.
## 82261                             Nick Swisher flies out to center fielder Mike Trout.
##        raa.CF
## 20164  0.3535
## 98908  0.3110
## 95793  0.3106
## 19100  0.3089
## 141319 0.3064
## 82261  0.2997

But Trout has also had some plays that cost him RAA.

head(trout.cf[order(trout.cf$raa.CF, decreasing=FALSE), c("description", "raa.CF")])
##                                                                                                                                                    description
## 104191                             Nolan Reimold doubles (3) on a line drive to center fielder Mike Trout.   Steve Tolleson scores.    Melky Cabrera scores.
## 12698                                      Anthony Recker singles on a line drive to center fielder Mike Trout.   Lucas Duda scores.    Juan Lagares scores.
## 124545 Brandon Guyer singles on a sharp line drive to center fielder Mike Trout.   Logan Forsythe scores.    Desmond Jennings scores.    Ben Zobrist to 2nd.
## 20179                                                        Nick Castellanos singles on a line drive to center fielder Mike Trout.   Austin Jackson scores.
## 17883                                                          Torii Hunter singles on a soft line drive to center fielder Mike Trout.   Rajai Davis scores.
## 130307                                                             Juan Uribe doubles (18) on a line drive to center fielder Mike Trout.   Matt Kemp scores.
##         raa.CF
## 104191 -0.9162
## 12698  -0.7026
## 124545 -0.6668
## 20179  -0.6503
## 17883  -0.6266
## 130307 -0.6042

Tabulating openWAR

Finally, let’s tabulate openWAR by player. This is accomplished using the getWAR() function.

owar = getWAR(openWARPlays.2014)
## ...Tabulating RAA per player...
## ...identified 465 replacement-level players...

The resulting data.frame has just one row per player. We can quickly see the leaders using the generic summary() command.

summary(owar)
## Displaying information for 1215 players, of whom 642 have pitched
##              Name TPA   WAR   RAA   repl RAA.bat  RAA.br RAA.field
## 1021        Trout 558 7.506 47.25 -27.81 40.5225  2.9166    3.8100
## 603       Kershaw 628 5.829 39.91 -18.38  4.1875 -0.9249    0.3463
## 860       Stanton 550 5.704 34.54 -22.50 31.1416  5.3845   -1.9825
## 439     McCutchen 497 5.467 29.17 -25.50 20.9591  1.6567    6.5585
## 360    Tulowitzki 375 5.409 39.40 -14.69 41.6013 -5.9353    3.7317
## 193  Hernandez, F 712 5.386 33.58 -20.29 -0.6273  0.0000   -0.6298
## 743   Goldschmidt 484 5.347 32.44 -21.03 27.7898  3.3991    1.2491
## 61          Utley 534 5.193 26.84 -25.09 22.4860  3.9068    0.4461
## 283        Kluber 755 5.186 30.33 -21.53  0.2692 -0.0216    0.2474
## 499      Gomez, C 524 5.090 24.47 -26.43 17.8057  5.9617    0.7036
## 614      Brantley 528 5.070 31.65 -19.05 26.2959  0.6561    4.6981
## 1211         Puig 515 5.059 28.49 -22.11 21.2540  4.5689    2.6635
## 421         Cueto 826 5.040 26.30 -24.10  0.3697 -0.3154    1.5976
## 136          Cano 522 5.018 25.81 -24.36 24.3708  3.7764   -2.3322
## 853          Sale 531 4.977 34.63 -15.13 -0.3549  0.0000    0.6342
## 90     Cabrera, M 536 4.958 27.10 -22.48 21.0360 -1.5284    7.5960
## 231       Kinsler 559 4.786 22.15 -25.71  6.2239  8.1941    7.7363
## 827      Mesoraco 340 4.778 34.71 -13.07 36.0769 -2.8221    1.4540
## 378      Scherzer 717 4.742 26.99 -20.43 -0.1512  0.0000   -0.2457
## 492     Gordon, A 497 4.729 33.94 -13.34 17.0584  8.0530    8.8322
## 352        Lester 694 4.721 27.42 -19.79 -0.1780  0.0000   -0.4859
## 1032        Abreu 484 4.594 25.42 -20.52 27.4917 -3.9671    1.8981
## 351          Span 535 4.592 18.85 -27.07  5.6542  8.1080    5.0917
## 785     Bumgarner 779 4.581 23.06 -22.75  9.3158 -0.2722    0.2088
## 177     Jones, Ad 543 4.448 17.09 -27.39 12.2042  3.7178    1.1651
##      RAA.pitch
## 1021      0.00
## 603      36.30
## 860       0.00
## 439       0.00
## 360       0.00
## 193      34.84
## 743       0.00
## 61        0.00
## 283      29.83
## 499       0.00
## 614       0.00
## 1211      0.00
## 421      24.65
## 136       0.00
## 853      34.35
## 90        0.00
## 231       0.00
## 827       0.00
## 378      27.38
## 492       0.00
## 352      28.08
## 1032      0.00
## 351       0.00
## 785      13.81
## 177       0.00
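
The column names suggest that RAA is the sum of the batting, baserunning, fielding, and pitching components. That is an inference from the printed output rather than something stated above, but it is easy to check by hand with a couple of rows copied from the table:

```r
# Values copied from the summary(owar) output above.
leaders <- data.frame(
  Name = c("Trout", "Kershaw"),
  RAA = c(47.25, 39.91),
  RAA.bat = c(40.5225, 4.1875),
  RAA.br = c(2.9166, -0.9249),
  RAA.field = c(3.8100, 0.3463),
  RAA.pitch = c(0.00, 36.30)
)

# Sum the four components and compare with the reported total.
leaders$RAA.sum <- with(leaders, RAA.bat + RAA.br + RAA.field + RAA.pitch)
leaders[, c("Name", "RAA", "RAA.sum")]
```

For both players the summed components agree with the reported RAA up to rounding in the printed table.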

We can also visualize openWAR across all players using the generic plot() function.

plot(owar)

[Figure: openWAR plot produced by plot(owar)]

This plot requires some explanation! Each blue or pink dot corresponds to a single player, who has been designated as either a replacement-level player or an MLB player based on his playing time. Our heuristic for determining who is a replacement-level player comes from the roster limits inherent in MLB. For most of the season, there are only

30 * 25
## [1] 750

roster spots, and most teams allocate those as 13 position players and 12 pitchers. Thus, we take the

30 * 13
## [1] 390

position players and 360 pitchers with the most playing time (plate appearances plus batters faced) and designate those players as MLB players. Everyone else is a replacement-level player. In the plot, the pink dots represent replacement-level players, while the blue dots represent MLB players.
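
To make the heuristic concrete, here is a minimal sketch on a toy table of six players. It is not the code getWAR() actually runs: the helper label_players and the columns is_pitcher and playing_time are made up for illustration, and the cutoffs are shrunk to 2 and 2 so the result is easy to read.

```r
# Toy player table; `is_pitcher` and `playing_time` are hypothetical columns.
players <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  is_pitcher = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  playing_time = c(600, 50, 400, 700, 20, 300)
)

# Hypothetical helper: within each group, the n players with the most
# playing time are "MLB" players; everyone else is replacement level.
label_players <- function(df, n_position, n_pitchers) {
  n_keep <- ifelse(df$is_pitcher, n_pitchers, n_position)
  # Rank 1 = most playing time within the player's own group.
  rank_in_group <- ave(-df$playing_time, df$is_pitcher,
                       FUN = function(x) rank(x, ties.method = "first"))
  df$level <- ifelse(rank_in_group <= n_keep, "MLB", "replacement")
  df
}

# With real data the cutoffs would be 30 * 13 and 30 * 12.
labeled <- label_players(players, n_position = 2, n_pitchers = 2)
labeled[, c("name", "level")]
```

Here players B and E, who have the least playing time in their groups, end up labeled as replacement level.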

Now, the RAA values of the replacement-level players are averaged in each facet of the game. This gives us a baseline for comparison. For each real player, we can now define a replacement-level shadow that provides an estimate of how many RAA a replacement-level player would have created in the same playing time instances (batting, baserunning, fielding, and pitching) as that real player. Since replacement-level players are much worse than average players, this value is negative. These are the grey dots in the plot – each pink or blue dot has a corresponding grey dot with the same horizontal coordinate. Thus, each player’s WAR is represented by the vertical distance between the player’s dot and his grey replacement-level shadow dot.
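
As a worked example, the figures in the summary(owar) table can be plugged into this picture directly. The conversion of 10 runs per win used below is an assumption: it is not stated anywhere above, but it reproduces the WAR column of the table for these two players.

```r
# Figures copied from the summary(owar) table: total RAA and the
# replacement-level shadow ("repl") for two players.
raa  <- c(Trout = 47.25, Kershaw = 39.91)
repl <- c(Trout = -27.81, Kershaw = -18.38)

# Runs above replacement is the vertical gap between the two dots.
runs_above_repl <- raa - repl

# Assumption: about 10 runs per win, which matches the WAR column.
war <- runs_above_repl / 10
round(war, 3)
```

This gives 7.506 for Trout and 5.829 for Kershaw, matching the table.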

As you can see, the vertical distance between Mike Trout and his replacement-level shadow is very large, and as such he leads baseball in openWAR in 2014. Clayton Kershaw leads in openWAR among pitchers.

It’s interesting that Jose Molina ranks last in openWAR. This is perhaps not so surprising, since openWAR does not track catcher framing, and we know this is a big part of Molina’s perceived value.
