Running an R Script as a Program

I’m learning more about reproducible research, and I found a helpful post by Jon Zelner that describes how to use the docopt package to run an R script as a program from the command line. Let me illustrate this with a baseball example.

Graphing Batting Averages

Suppose I’m interested in graphing the distribution of batting averages of all players with at least a minimum number of at-bats in a particular season. I write the following R script, which uses 2014 data and a minimum of 300 at-bats.

 
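library(Lahman)
library(dplyr)
library(ggplot2)
y <- 2014   # season
m <- 300    # minimum number of at-bats
png("myplot.png")
# total hits and at-bats for each player in season y
d <- filter(Batting, yearID == y)
S <- summarize(group_by(d, playerID),
               H = sum(H), AB = sum(AB))
# histogram and density estimate of the batting averages H / AB
# (after_stat() needs ggplot2 >= 3.3.0 and linewidth needs >= 3.4.0;
# older versions use ..density.. and size instead)
p <- ggplot(filter(S, AB >= m), aes(x = H / AB)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_density(linewidth = 2, color = "red") +
  ggtitle(paste(y, "Batting Averages")) +
  theme(plot.title = element_text(size = rel(2), hjust = 0.5,
                                  color = "blue"))
print(p)   # explicit print so the plot is drawn when the file is sourced
dev.off()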

If I source this file within R, I’ll find a file “myplot.png” that contains a histogram and density plot of the batting averages.
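The script sums H and AB for each player before dividing because the Lahman Batting table has one row per stint, so a player who changed teams during the season appears more than once. Here is a small base-R sketch of that step. The toy data frame and its values are made up for illustration and are not taken from Lahman.

```r
# Hypothetical toy data: player "a" has two stints in the same season
toy <- data.frame(playerID = c("a", "a", "b"),
                  H  = c(50, 40, 100),
                  AB = c(200, 150, 400))

# Sum hits and at-bats over stints, one row per player
totals <- aggregate(cbind(H, AB) ~ playerID, data = toy, FUN = sum)

# Batting average from the season totals
totals$AVG <- totals$H / totals$AB

# Apply the minimum at-bats cutoff after summing
qualified <- totals[totals$AB >= 300, ]
print(qualified)
```

Neither of player "a"'s stints reaches 300 at-bats on its own, so applying the cutoff before summing would drop him even though his season total qualifies.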

Making the R Script a Program

Suppose I want to run this script as a program from Terminal (the command line interface for Macs) outside of R. There will be three inputs: y (season), m (minimum number of at-bats) and o (output file). I revise my R script as follows:

 
#!/usr/bin/env Rscript
library(docopt)
# docopt reads the default values from the Options section; each option
# and its description are separated by at least two spaces
'Usage:
  bavg_normal.R [-y <season>] [-m <min_ab>] [-o <output>]

Options:
  -y <season>  Season [default: 2014]
  -m <min_ab>  Minimum number of at-bats [default: 300]
  -o <output>  Output file [default: bavg.png]
' -> doc

opts <- docopt(doc)
library(Lahman)
library(dplyr)
library(ggplot2)
# the elements of opts are character strings, so convert the numeric inputs
y <- as.numeric(opts$y)
m <- as.numeric(opts$m)
png(opts$o)
d <- filter(Batting, yearID == y)
S <- summarize(group_by(d, playerID),
               H = sum(H), AB = sum(AB))
p <- ggplot(filter(S, AB >= m), aes(x = H / AB)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_density(linewidth = 2, color = "red") +
  ggtitle(paste(y, "Batting Averages")) +
  theme(plot.title = element_text(size = rel(2), hjust = 0.5,
                                  color = "blue"))
print(p)
dev.off()

What extra things did I add?

  1. I added a string variable doc that gives the syntax of the command line, including the options and their default values.
  2. The docopt function (from the package of the same name) puts the inputs into a list opts; each element is a character string holding the corresponding input.
  3. Last, I rewrote the rest of the script to use the inputs opts$y, opts$m, and opts$o, converting them to numeric values where needed.
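To see what docopt returns without leaving R, the arguments can be passed directly through the args parameter of docopt() instead of coming from the command line. The sketch below assumes the docopt package is installed and uses a usage string in the conventional docopt layout, with each option and its description separated by two spaces. The placeholder names such as <min_ab> are my own choice.

```r
library(docopt)

doc <- "Usage:
  bavg_normal.R [-y <season>] [-m <min_ab>] [-o <output>]

Options:
  -y <season>  Season [default: 2014]
  -m <min_ab>  Minimum number of at-bats [default: 300]
  -o <output>  Output file [default: bavg.png]
"

# Simulate typing: ./bavg_normal.R -y 1964
opts <- docopt(doc, args = c("-y", "1964"))

# Inspect the three inputs; -m and -o fall back to their defaults
str(opts[c("y", "m", "o")])

# The values arrive as character strings, so convert before comparing
season <- as.numeric(opts$y)
min_ab <- as.numeric(opts$m)
```

This is a convenient way to check the usage string interactively before making the script executable.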

Running the R “Program” from the Command Line

Now I can quit R, open up Terminal and navigate to the folder that contains this script called “bavg_normal.R”. Then I type

 
chmod +x bavg_normal.R
./bavg_normal.R -y 1964 -m 300 -o bavg_1964.png

The first line makes the file “bavg_normal.R” executable, and the second line runs it as a program with three inputs (year 1964, a minimum of 300 AB, and output file “bavg_1964.png”).

I look in my folder and see a new graphics file containing the following histogram of 1964 batting averages for players with at least 300 AB.

[Figure: bavg_1964.png]

Running R scripts as executable programs seems to be attractive, and I hope to be able to illustrate the usefulness of this feature in future posts.

NOTE: My colleague Maria commented that this is similar to batch mode (R CMD BATCH), where you run R scripts from the terminal. Each method may have advantages for particular applications.
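For comparison, here is a minimal sketch of reading the same three inputs without docopt, using base R’s commandArgs(trailingOnly = TRUE), which returns the arguments that follow the script name when it is run with Rscript. The helper parse_inputs and its positional order (season, minimum AB, output file) are assumptions of mine, not part of the script above.

```r
# Hypothetical helper: positional arguments in the order
# season, minimum at-bats, output file
parse_inputs <- function(args = commandArgs(trailingOnly = TRUE)) {
  defaults <- c("2014", "300", "bavg.png")
  # Fill in defaults for any trailing arguments that were not supplied
  args <- c(args, defaults[seq_along(defaults) > length(args)])
  list(y = as.numeric(args[1]),
       m = as.numeric(args[2]),
       o = args[3])
}

# Simulate: Rscript bavg_base.R 1964
inputs <- parse_inputs("1964")
str(inputs)
```

The trade-off is that positional arguments carry no flags or help text, which is what the docopt usage string provides.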
