I don’t know if you have noticed, but Baseball-Reference came out with a new web design. It seems to be a significant improvement: it has a clean look and works better on devices like smartphones and tablets. Anyway, I think the new design is a good occasion to explain how to easily import data from Baseball-Reference into R.
Suppose for example that we want to bring in the season-to-season pitching data for the HOF player Sandy Koufax. We locate the page by searching for Koufax.
In the Standard Pitching table, we select the “Share and more” menu.
If we choose the “Get table as CSV (for Excel)” option, we see the pitching data as text with commas separating fields:
We select the data, omitting the last two lines (which contain summary information), and copy it to the Clipboard.
In R, an attractive way to read in csv data is the read_csv() function in the readr package. This is not to be confused with the read.csv() function that comes with base R.
Essentially, after loading the package, you paste the data from the Clipboard inside the quotes.
library(readr)
Koufax <- read_csv(" ")
I do this below.
Koufax <- read_csv("Year,Age,Tm,Lg,W,L,W-L%,ERA,G,GS,GF,CG,SHO,SV,IP,H,R,ER,HR,BB,IBB,SO,HBP,BK,WP,BF,ERA+,FIP,WHIP,H9,HR9,BB9,SO9,SO/W,Awards
1955,19,BRO,NL,2,2,.500,3.02,12,5,4,2,2,0,41.2,33,15,14,2,28,1,30,1,1,2,183,136,3.64,1.464,7.1,0.4,6.0,6.5,1.07,
1956,20,BRO,NL,2,4,.333,4.91,16,10,1,0,0,0,58.2,66,37,32,10,29,0,30,0,2,1,261,82,5.05,1.619,10.1,1.5,4.4,4.6,1.03,
1957,21,BRO,NL,5,4,.556,3.88,34,13,12,2,0,0,104.1,83,49,45,14,51,1,122,2,0,5,444,106,3.39,1.284,7.2,1.2,4.4,10.5,2.39,
1958,22,LAD,NL,11,11,.500,4.48,40,26,7,5,0,1,158.2,132,89,79,19,105,6,131,1,0,17,714,93,4.38,1.494,7.5,1.1,6.0,7.4,1.25,
1959,23,LAD,NL,8,6,.571,4.05,35,23,6,6,1,2,153.1,136,74,69,23,92,4,173,0,1,5,679,105,4.04,1.487,8.0,1.4,5.4,10.2,1.88,
1960,24,LAD,NL,8,13,.381,3.91,37,26,7,7,2,1,175.0,133,83,76,20,100,6,197,1,0,9,753,101,3.49,1.331,6.8,1.0,5.1,10.1,1.97,
1961,25,LAD,NL,18,13,.581,3.52,42,35,2,15,2,1,255.2,212,117,100,27,96,6,269,3,2,12,1068,122,3.00,1.205,7.5,1.0,3.4,9.5,2.80,ASMVP-18
1962,26,LAD,NL,14,7,.667,2.54,28,26,2,11,2,1,184.1,134,61,52,13,57,4,216,2,0,3,744,143,2.15,1.036,6.5,0.6,2.8,10.5,3.79,ASMVP-24
1963,27,LAD,NL,25,5,.833,1.88,40,40,0,20,11,0,311.0,214,68,65,18,58,7,306,3,1,6,1210,159,1.85,0.875,6.2,0.5,1.7,8.9,5.28,ASCYA-1MVP-1
1964,28,LAD,NL,19,5,.792,1.74,29,28,1,15,7,1,223.0,154,49,43,13,53,5,223,0,0,9,870,186,2.08,0.928,6.2,0.5,2.1,9.0,4.21,ASCYA-3MVP-17
1965,29,LAD,NL,26,8,.765,2.04,43,41,2,27,8,2,335.2,216,90,76,26,71,4,382,5,0,11,1297,160,1.93,0.855,5.8,0.7,1.9,10.2,5.38,ASCYA-1MVP-2
1966,30,LAD,NL,27,9,.750,1.73,41,41,0,27,5,0,323.0,241,74,62,19,77,4,317,0,0,7,1274,190,2.07,0.985,6.7,0.5,2.1,8.8,4.12,ASCYA-1MVP-2")
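One detail about the pasted-string approach: read_csv() has to decide whether its first argument is a file path or literal data, and the readr documentation asks that literal data either contain a newline or be wrapped in I(). Here is a minimal sketch, assuming readr 2.0 or later, that uses I() on a three-column extract of two of the Koufax rows. The names csv_text and koufax_small are only for illustration.

```r
library(readr)

# Three columns taken from the 1955 and 1961 rows of the Koufax table.
csv_text <- "Year,ERA,Awards
1955,3.02,
1961,3.52,ASMVP-18"

# I() marks the string as literal data rather than a file path
# (assumption: readr >= 2.0 is installed).
koufax_small <- read_csv(I(csv_text))

koufax_small$ERA      # parsed as a numeric column
koufax_small$Awards   # the empty 1955 field is read as NA
```

Wrapping the text in I() is optional when the string spans several lines, but it makes the intent explicit.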
To check that the data look reasonable, I’ll graph Koufax’s ERA values against Age. Koufax has a unique trajectory: after struggling for a few years, he was a remarkable pitcher for five years and, due to injury, had to retire at the peak of his career.
library(ggplot2)
ggplot(Koufax, aes(Age, ERA)) +
  geom_point() +
  geom_smooth()
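A few numeric checks can complement the graph. The sketch below is only one way to do this. It rebuilds a small data frame from the Year, Age, and ERA values in the table, so that it runs on its own, and it assumes readr 2.0 or later for I(). The name koufax_era is made up for this example.

```r
library(readr)

# Year, Age, and ERA copied from the twelve season rows of the Koufax table.
koufax_era <- read_csv(I("Year,Age,ERA
1955,19,3.02
1956,20,4.91
1957,21,3.88
1958,22,4.48
1959,23,4.05
1960,24,3.91
1961,25,3.52
1962,26,2.54
1963,27,1.88
1964,28,1.74
1965,29,2.04
1966,30,1.73"))

nrow(koufax_era)                          # one row per season is expected
all(diff(koufax_era$Age) == 1)            # ages should increase by one each year
koufax_era[which.min(koufax_era$ERA), ]   # the season with the lowest ERA

# Unweighted mean ERA over the final five seasons (not weighted by innings).
mean(koufax_era$ERA[koufax_era$Year >= 1962])
```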
I think this Clipboard method is an attractive way to import data, especially for the introductory R user who wants to bring sports data into R quickly.
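The manual step of leaving out the two summary lines can also be done in code. The sketch below is one possible approach, not something Baseball-Reference provides: it assumes that every season row begins with a four-digit year and that the summary rows do not. The two summary lines in raw_text are invented placeholders standing in for whatever the export actually contains, the variable names are hypothetical, and I() assumes readr 2.0 or later.

```r
library(readr)

# A few columns from the first two Koufax rows, followed by two
# placeholder lines that stand in for the real summary rows.
raw_text <- "Year,Age,Tm,W,L,ERA
1955,19,BRO,2,2,3.02
1956,20,BRO,2,4,4.91
summary placeholder A,,,,,
summary placeholder B,,,,,"

# Split the pasted text into individual lines.
lines <- strsplit(raw_text, "\n", fixed = TRUE)[[1]]

# Keep lines that start with a four-digit year, plus the header line.
is_season <- grepl("^[0-9]{4},", lines)
is_season[1] <- TRUE

# Reassemble the kept lines and parse them as literal csv data.
season_text <- paste(lines[is_season], collapse = "\n")
seasons <- read_csv(I(season_text))
seasons
```

With this filter in place, the whole csv text can be copied without worrying about where the season rows end.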
I thought it was worth mentioning that there is a different way of importing Clipboard data on a Macintosh by use of the pipe() function with the “pbpaste” argument. Here I illustrate this with the Koufax data (assuming the Baseball-Reference csv data has been placed on the Clipboard).
Koufax <- read.table(file = pipe("pbpaste"), sep = ",", header=TRUE)
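One difference between the two import routes is worth knowing about. By default, read.table() passes the header through make.names(), so column names that are not valid R names, such as W-L%, ERA+, and SO/W, are altered, while read_csv() leaves them as they are. The sketch below reads a small extract of the table with read.table(text = ...) in place of the Clipboard, so that it runs on any system. The object names are only for illustration.

```r
# Four columns from the 1955 and 1956 rows of the Koufax table.
csv_text <- "Year,W-L%,ERA+,SO/W
1955,.500,136,1.07
1956,.333,82,1.03"

# Default behavior: invalid characters in names are replaced by dots.
by_default <- read.table(text = csv_text, sep = ",", header = TRUE)
names(by_default)   # "Year" "W.L." "ERA." "SO.W"

# check.names = FALSE keeps the header exactly as it appears in the csv.
as_is <- read.table(text = csv_text, sep = ",", header = TRUE,
                    check.names = FALSE)
names(as_is)        # "Year" "W-L%" "ERA+" "SO/W"

# Non-syntactic names need backticks when used with $.
as_is$`ERA+`
```

Either convention works, but code written for one import method may need its column names adjusted for the other.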