The retrosheet Package

A while back, Jim Albert had a post on downloading Retrosheet data into R. However, R now has a package that does this all on its own, called retrosheet. Today, I’m going to show a few of the ways to grab some data using this package. Hopefully, Jim will build on this next week with some more sophisticated examples. Let’s begin by loading the package and opening the help file for the function getRetrosheet.

###load up package and get some help
library(retrosheet)
help(getRetrosheet)

The first thing we might want to do is load up the 2014 schedule (2015 is not up yet) and take a look at it. The first argument tells the function that we want a schedule, and the second gives the year. One thing worth doing is checking that the schedule is a data frame, which I show below. This way, we know how to treat it once it’s loaded into R. Not all packages return data as a data frame, so it’s good to check. Also, make sure to assign the download to a name, as it is a relatively large data frame. Without assigning it to a name like “sched14”, you’ll end up with a big mess on your screen, just as you would when printing the entirety of any large data frame.

###look at 2014 schedule
sched14 <- getRetrosheet("schedule", 2014)
is.data.frame(sched14)
[1] TRUE
head(sched14)
      Date GameNo Day VisTeam VisLg VisGmNo HmTeam HmLg HmGmNo TimeOfDay Postponed Makeup
1 20140322      0 Sat     LAN    NL       1    ARI   NL      1         n        NA     NA
2 20140323      0 Sun     LAN    NL       2    ARI   NL      2         d        NA     NA
3 20140330      0 Sun     LAN    NL       3    SDN   NL      1         n        NA     NA
4 20140331      0 Mon     SFN    NL       1    ARI   NL      3         n        NA     NA
5 20140331      0 Mon     BOS    AL       1    BAL   AL      1         d        NA     NA
6 20140331      0 Mon     MIN    AL       1    CHA   AL      1         d        NA     NA
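
One detail in the printout above that is easy to miss: Date comes back as a plain number like 20140322, not as an R Date. Here is a minimal sketch of converting it and counting day versus night games. To keep it runnable without a download, I've retyped the six rows shown above into a toy data frame called schedToy (my name, not something from the package), and I'm assuming the real column behaves the same way.

```r
# Toy copy of the six schedule rows printed above (typed by hand, not downloaded)
schedToy <- data.frame(
  Date      = c(20140322, 20140323, 20140330, 20140331, 20140331, 20140331),
  VisTeam   = c("LAN", "LAN", "LAN", "SFN", "BOS", "MIN"),
  HmTeam    = c("ARI", "ARI", "SDN", "ARI", "BAL", "CHA"),
  TimeOfDay = c("n", "d", "n", "n", "d", "d"),
  stringsAsFactors = FALSE
)

# Convert YYYYMMDD values to Date objects; as.character() makes this work
# whether the column was read as a number or as text
schedToy$Date <- as.Date(as.character(schedToy$Date), format = "%Y%m%d")

# Count day ("d") and night ("n") games
dayNight <- table(schedToy$TimeOfDay)
print(dayNight)

# Number of days between the first and last game in the toy data
span <- as.numeric(diff(range(schedToy$Date)))
print(span)
```

Once Date is a real Date, sorting, differencing, and plotting against time all behave the way you would expect.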

So we can see here that the data are pretty simple, but there is useful information throughout. It includes the day of week, game number for each of the teams, who the home and visitor are, which league they play in, and whether it was a day or night game. Also, we know if there was a postponement or makeup game. However, we don’t have game results here. Maybe that’s something we would want, so let’s instead download the entire game files for 2014.

###look at 2014 games
game14 <- getRetrosheet("game", 2014)
is.data.frame(game14)
[1] TRUE
head(game14)
      Date DblHdr Day VisTm VisTmLg VisTmGNum HmTm HmTmLg HmTmGNum VisRuns HmRuns NumOuts DayNight Completion Forfeit Protest ParkID Attendance
1 20140322      0 Sat   LAN      NL         1  ARI     NL        1       3      1      54        N                 NA          SYD01      38266
2 20140323      0 Sun   LAN      NL         2  ARI     NL        2       7      5      54        D                 NA          SYD01      38079
3 20140330      0 Sun   LAN      NL         3  SDN     NL        1       1      3      51        N                 NA          SAN02      45567
4 20140331      0 Mon   SEA      AL         1  ANA     AL        1      10      3      54        N                 NA          ANA01      44152
5 20140331      0 Mon   BOS      AL         1  BAL     AL        1       1      2      51        D                 NA          BAL12      46685
6 20140331      0 Mon   MIN      AL         1  CHA     AL        1       3      5      51        D                 NA          CHI12      37422
  Duration   VisLine    HmLine VisAB VisH VisD VisT VisHR VisRBI VisSH VisSF VisHBP VisBB VisIBB VisK VisSB VisCS VisGDP VisCI VisLOB VisPs VisER
1      169  10200000 000001000    33    5    2    0     1      3     0     0      1     3      0   11     0     0      0     0      7     4     1
2      241 102021100 000000014    34   13    3    0     0      6     1     2      2     8      0    7     1     0      1     0     13     8     5
3      169     10000 00000003x    31    4    0    0     0      1     0     0      0     3      0    9     0     0      0     0      6     4     2
4      197  10001206 201000000    36   11    4    2     1     10     0     1      0     8      1   11     1     1      0     0      8     5     2
5      173    100000 01000010x    36    9    2    0     1      1     0     0      1     3      0    6     0     0      0     0     12     2     2
6      155   2000010 02200100x    32    7    3    0     0      3     1     0      0     1      0   10     0     0      1     0      4     4     5
  VisTER VisWP VisBalks VisPO VisA VisE VisPassed VisDB VisTP HmAB HmH HmD HmT HmHR HmRBI HmSH HmSF HmHBP HmBB HmIBB HmK HmSB HmCS HmGDP HmCI HmLOB
1      1     1        0    27   13    1         0     0     0   33   5   1   0    0     1    0    0     0    2     0  10    0    0     0    0     7
2      5     0        0    27    4    1         0     2     0   35   8   0   0    1     5    0    0     0    8     0   8    0    0     2    0    11
3      2     0        0    24   12    2         0     2     0   27   5   0   0    1     3    2    0     0    4     0  10    1    0     2    0     6
4      2     2        0    27    5    1         0     0     0   34   6   1   0    1     3    0    0     1    1     0  13    1    0     0    0     6
5      2     0        0    24   11    0         0     2     0   28   6   0   0    1     1    0    0     0    1     0   9    0    0     2    0     3
6      5     1        0    24   10    0         0     2     0   31  11   2   0    2     5    0    1     0    2     0   6    0    0     2    0     5
  HmPs HmER HmTER HmWP HmBalks HmPO HmA HmE HmPass HmDB HmTP   UmpHID           UmpHNm  Ump1BID        Ump1BNm  Ump2BID      Ump2BNm  Ump3BID
1    5    3     3    1       0   27  10   1      0    0    0 welkt901        Tim Welke scotd901     Dale Scott diazl901     Laz Diaz carlm901
2    6    6     6    1       0   27  15   3      0    1    0 scotd901       Dale Scott diazl901       Laz Diaz carlm901 Mark Carlson welkt901
3    5    1     1    1       0   27  10   0      0    0    0 culbf901 Fieldin Culbreth gonzm901 Manny Gonzalez reynj901 Jim Reynolds barbs901
4    5    9     9    0       0   27   7   1      0    0    0 westj901         Joe West fostm901   Marty Foster drakr901    Rob Drake porta901
5    5    1     1    0       0   27  13   0      0    0    0 demud901      Dana DeMuth kulpr901      Ron Kulpa hicke901    Ed Hickox barrl901
6    4    3     3    0       0   27  11   0      0    1    0 scotd901       Dale Scott iassd901   Dan Iassogna buckc901   CB Bucknor gibsh902
        Ump3BNm UmpLFID UmpLFNm UmpRFID UmpRFNm VisMgrID        VisMgrNm  HmMgrID        HmMgrNm   WinPID          WinPNm      PID         PNAme
1  Mark Carlson      NA  (none)      NA  (none) mattd001   Don Mattingly gibsk001    Kirk Gibson kersc001 Clayton Kershaw milew001    Wade Miley
2     Tim Welke      NA  (none)      NA  (none) mattd001   Don Mattingly gibsk001    Kirk Gibson ryu-h001    Hyun-Jin Ryu cahit001 Trevor Cahill
3   Sean Barber      NA  (none)      NA  (none) mattd001   Don Mattingly blacb001    Buddy Black thayd001     Dale Thayer wilsb001  Brian Wilson
4   Alan Porter      NA  (none)      NA  (none) mccll001 Lloyd McClendon sciom001  Mike Scioscia hernf002 Felix Hernandez weavj003  Jered Weaver
5 Lance Barrett      NA  (none)      NA  (none) farrj001    John Farrell showb801 Buck Showalter britz001    Zach Britton lestj001    Jon Lester
6  Tripp Gibson      NA  (none)      NA  (none) gardr001  Ron Gardenhire ventr001  Robin Ventura salec001      Chris Sale nolar001 Ricky Nolasco
   SavePID        SavePNm GWinRBIID       GWinRBINm VisStPchID      VisStPchNm HmStPchID      HmStPchNm VisBat1ID       VisBat1Nm VisBat1Pos VisBat2ID
1 jansk001  Kenley Jansen  ethia001    Andre Ethier   kersc001 Clayton Kershaw  milew001     Wade Miley  puigy001     Yasiel Puig          9  turnj001
2                  (none)  ethia001    Andre Ethier   ryu-h001    Hyun-Jin Ryu  cahit001  Trevor Cahill  gordd002      Dee Gordon          4  puigy001
3 streh001  Huston Street  denoc001  Chris Denorfia   ryu-h001    Hyun-Jin Ryu  casha001 Andrew Cashner  crawc002   Carl Crawford          7  puigy001
4                  (none)  almoa001 Abraham Almonte   hernf002 Felix Hernandez  weavj003   Jered Weaver  almoa001 Abraham Almonte          8  millb002
5 huntt002   Tommy Hunter  cruzn002     Nelson Cruz   lestj001      Jon Lester  tillc001  Chris Tillman  navad002     Daniel Nava          9  pedrd001
6 lindm001 Matt Lindstrom  abrej003      Jose Abreu   nolar001   Ricky Nolasco  salec001     Chris Sale  dozib001    Brian Dozier          4  suzuk001
       VisBat2Nm VisBat2Pos VisBat3ID      VisBat3Nm VisBat3Pos VisBat4ID       VisBat4Nm VisBat4Pos VisBat5ID       VisBat5Nm VisBat5Pos VisBat6ID
1  Justin Turner          4  ramih003 Hanley Ramirez          6  gonza003 Adrian Gonzalez          3  vanss001 Scott Van Slyke          7  uribj002
2    Yasiel Puig          9  ramih003 Hanley Ramirez          6  gonza003 Adrian Gonzalez          3  ethia001    Andre Ethier          8  ellia001
3    Yasiel Puig          9  ramih003 Hanley Ramirez          6  gonza003 Adrian Gonzalez          3  ethia001    Andre Ethier          8  uribj002
4    Brad Miller          6  canor001  Robinson Cano          4  smoaj001    Justin Smoak          3  morrl001  Logan Morrison         10  seagk001
5 Dustin Pedroia          4  ortid001    David Ortiz         10  napom001     Mike Napoli          3  carpm001       Mike Carp          7  sizeg001
6    Kurt Suzuki          2  mauej001      Joe Mauer          3  willj004 Josh Willingham          7  colac001 Chris Colabello         10  plout001
       VisBat6Nm VisBat6Pos VisBat7ID        VisBat7Nm VisBat7Pos VisBat8ID       VisBat8Nm VisBat8Pos VisBat9ID         VisBat9Nm VisBat9Pos HmBat1ID
1     Juan Uribe          5  ethia001     Andre Ethier          8  ellia001      A.J. Ellis          2  kersc001   Clayton Kershaw          1 polla001
2     A.J. Ellis          2  baxtm001      Mike Baxter          7  uribj002      Juan Uribe          5  ryu-h001      Hyun-Jin Ryu          1 polla001
3     Juan Uribe          5  ellia001       A.J. Ellis          2  gordd002      Dee Gordon          4  ryu-h001      Hyun-Jin Ryu          1 cabre001
4    Kyle Seager          5  saunm001 Michael Saunders          9  ackld001   Dustin Ackley          7  zunim001       Mike Zunino          2 calhk001
5 Grady Sizemore          8  bogax001  Xander Bogaerts          6  piera001 A.J. Pierzynski          2  middw001 Will Middlebrooks          5 markn001
6 Trevor Plouffe          5  arcio001    Oswaldo Arcia          9  hicka001     Aaron Hicks          8  florp001    Pedro Florimon          6 eatoa002
        HmBat1Nm HmBat1Pos HmBat2ID       HmBat2Nm HmBat2Pos HmBat3ID         HmBat3Nm HmBat3Pos HmBat4ID      HmBat4Nm HmBat4Pos HmBat5ID
1   A.J. Pollock         8 hilla001     Aaron Hill         4 goldp001 Paul Goldschmidt         3 pradm001  Martin Prado         5 trumm001
2   A.J. Pollock         8 hilla001     Aaron Hill         4 goldp001 Paul Goldschmidt         3 pradm001  Martin Prado         5 montm001
3 Everth Cabrera         6 denoc001 Chris Denorfia         9 headc001    Chase Headley         5 gyorj001   Jedd Gyorko         4 alony001
4   Kole Calhoun         9 troum001     Mike Trout         8 pujoa001    Albert Pujols         3 hamij003 Josh Hamilton         7 freed001
5  Nick Markakis         9 hardj003     J.J. Hardy         6 jonea003       Adam Jones         8 davic003   Chris Davis         3 cruzn002
6     Adam Eaton         8 semim001  Marcus Semien         4 gillc001  Conor Gillaspie         5 abrej003    Jose Abreu         3 dunna001
        HmBat5Nm HmBat5Pos HmBat6ID       HmBat6Nm HmBat6Pos HmBat7ID         HmBat7Nm HmBat7Pos HmBat8ID       HmBat8Nm HmBat8Pos HmBat9ID
1    Mark Trumbo         7 montm001 Miguel Montero         2 owinc001     Chris Owings         6 parrg001  Gerardo Parra         9 milew001
2 Miguel Montero         2 trumm001    Mark Trumbo         7 parrg001    Gerardo Parra         9 gregd001 Didi Gregorius         6 cahit001
3  Yonder Alonso         3 medit001   Tommy Medica         7 venaw001     Will Venable         8 river003    Rene Rivera         2 casha001
4   David Freese         5 ibanr001    Raul Ibanez        10 kendh001   Howie Kendrick         4 iannc001 Chris Iannetta         2 aybae001
5    Nelson Cruz         7 wietm001   Matt Wieters         2 yound003     Delmon Young        10 flahr001  Ryan Flaherty         5 schoj001
6      Adam Dunn        10 garca003 Avisail Garcia         9 deaza001 Alejandro de Aza         7 ramia003 Alexei Ramirez         6 flowt001
         HmBat9Nm HmBat9Pos Additional Acquisition
1      Wade Miley         1                      Y
2   Trevor Cahill         1                      Y
3  Andrew Cashner         1                      Y
4     Erick Aybar         6                      Y
5 Jonathan Schoop         4                      Y
6   Tyler Flowers         2                      Y

Here, you can see that R is directly grabbing the data from the Retrosheet website and putting it in a usable format for you in R. Handy little package we have here. However, we might not want all of this information. Perhaps we just want the first block of columns printed above (Date through Attendance, which are the first 18 columns). Let’s go ahead and select only that portion of the data, and then also subset the data to only Kansas City Royals home games.

###subset the data
###keep the first 18 columns (Date through Attendance), then Royals home games only
game14 <- game14[,1:18]
game14 <- subset(game14, HmTm=="KCA")
head(game14)
        Date DblHdr Day VisTm VisTmLg VisTmGNum HmTm HmTmLg HmTmGNum VisRuns HmRuns NumOuts DayNight Completion Forfeit Protest ParkID Attendance
55  20140404      0 Fri   CHA      AL         4  KCA     AL        3       5      7      51        D                 NA          KAN06      40103
69  20140405      0 Sat   CHA      AL         5  KCA     AL        4       3      4      51        D                 NA          KAN06      21463
84  20140406      0 Sun   CHA      AL         6  KCA     AL        5       5      1      54        D                 NA          KAN06      29760
97  20140407      0 Mon   TBA      AL         8  KCA     AL        6       2      4      51        N                 NA          KAN06      12087
104 20140408      0 Tue   TBA      AL         9  KCA     AL        7       1      0      54        N                 NA          KAN06      13905
119 20140409      0 Wed   TBA      AL        10  KCA     AL        8       3      7      51        D                 NA          KAN06      13612
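
Picking columns by position (1:18) works, but it depends on the column order never changing. A hedged alternative is to pick them by name, and since the game logs carry both teams' runs, a win/loss flag is one comparison away. The sketch below uses a toy data frame retyped from four of the rows printed above; gameToy, keep, royalsToy, and HomeWin are names I made up for illustration.

```r
# Toy copy of four game-log rows printed above (typed by hand, not downloaded)
gameToy <- data.frame(
  Date       = c(20140331, 20140404, 20140405, 20140406),
  VisTm      = c("BOS", "CHA", "CHA", "CHA"),
  HmTm       = c("BAL", "KCA", "KCA", "KCA"),
  VisRuns    = c(1, 5, 3, 5),
  HmRuns     = c(2, 7, 4, 1),
  ParkID     = c("BAL12", "KAN06", "KAN06", "KAN06"),
  Attendance = c(46685, 40103, 21463, 29760),
  stringsAsFactors = FALSE
)

# Select columns by name instead of by position
keep <- c("Date", "VisTm", "HmTm", "VisRuns", "HmRuns", "Attendance")

# Keep only Royals home games and only the named columns
royalsToy <- gameToy[gameToy$HmTm == "KCA", keep]

# Flag home wins by comparing the two run columns
royalsToy$HomeWin <- royalsToy$HmRuns > royalsToy$VisRuns
print(royalsToy)

# Total home wins in the toy data
homeWins <- sum(royalsToy$HomeWin)
print(homeWins)
```

Selecting by name also documents which variables you actually care about, which helps when you come back to the script later.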

Ok. Now, let’s finish up by looking at attendance levels for each home game throughout the season and see if fans started catching on at some point that the Royals were for real. Note that we’ll have to rework the game number to be a “home game number” for our x-axis to work out correctly.

###plot Royals attendance over time
###sort by date first so the home game counter follows the calendar
game14 <- game14[order(game14$Date),]
game14$homeGame <- seq_len(nrow(game14))

png(file="Royals2014Att.png", height=550, width=650)
plot(game14$Attendance ~ game14$homeGame, type="n", main="2014 Royals Home Attendance", xlab="Home Game Number", ylab="Attendance")
rect(-1000, -1000, 100000, 100000, col="#00000010")
grid(lty="dashed")
lines(game14$Attendance ~ game14$homeGame, lwd=3)
dev.off()

[Figure: Royals2014Att.png, Royals 2014 attendance by home game number]

And here we can see that attendance slowly crept upward after the 15th or so home game. Interestingly, it began creeping downward again after about home game 50, with some major variation and possibly an upward spike as the season ended.
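
With this much game-to-game variation, a moving average can make a trend like this easier to judge. Below is a minimal sketch using stats::filter() from base R on the six Royals attendance figures printed earlier. The three-game window is only chosen to suit such a short toy series; on the full season, something wider (say ten games) might be a more sensible assumption to start from.

```r
# Attendance for the six Royals home games printed earlier (typed by hand)
att <- c(40103, 21463, 29760, 12087, 13905, 13612)

# Window width for the moving average (an arbitrary choice for this toy series)
k <- 3

# Centered moving average: each value is the mean of itself and its neighbors.
# The first and last entries are NA because the window runs off the ends.
smoothAtt <- as.numeric(stats::filter(att, rep(1 / k, k), sides = 2))
print(round(smoothAtt))
```

On the real data, the smoothed series could be drawn over the raw line with another call to lines(), using a different color so the two are easy to tell apart.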

This is, of course, only a small sample of what can be downloaded and done with the retrosheet package, and I encourage you to explore a bit more with it. I suspect I’ll be doing the same. The ease with which the data can be loaded into R in a neat and organized way is really nice, and something I wish I had years ago when I began my dissertation.
