In the last post, I mentioned writing an R package to implement my component method of predicting final season batting averages. Since building a package is currently pretty easy using the RStudio interface, I thought it was worthwhile to describe the process in this post. It is much simpler than the old command-line method I used to build packages. Hopefully this will encourage readers to write their own packages.
Why Build an R Package?
I will list several good reasons to write packages.
- Communicate your R work to others. For example, a package lets others try out the statistical methods you have developed.
- Help organize your own R work. For example, it forces you to write documentation for each function and each dataset you create. Personally, since I write so much code, I appreciate the use of packages to collect and organize my work.
- Allow others to use interesting datasets. For example, my package tsub contains all of the datasets that I use in my Teaching Statistics Using Baseball text.
The Process Using RStudio
- Collect Your Functions. I have two functions (one of them component_predict) that I want to put in my package. I have already checked these functions to make sure they work. (Actually I have more functions in this package, but I thought two functions could be used to quickly illustrate the process.)
- Open a New Project. In RStudio, I select “New Project” from the File menu and then indicate on the next screen that I want to start a project in a New Directory.
- In the New Project Window, I indicate I want to create a new R package.
- In the Create R Package window, I decide to call my new package “BApredict3” and in the “Create package based on source files” area, I add my two R functions. I also indicate where I want my package files to be located on my computer.
After I press the Create Package button, I navigate to the directory location specified above. I see that RStudio has created the package file structure in the folder BApredict3. In particular, I see that my two functions are placed in the R subfolder.
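For readers who want to see the same kind of file structure without the RStudio dialogs, here is a minimal sketch using the base R function utils::package.skeleton(). It is an illustration only, not a description of what RStudio does internally. The body of component_predict below is a hypothetical stand-in, since the real function is not shown in this post, and the package is written to a temporary directory.

```r
# Hypothetical stand-in: the real component_predict() is not shown in the post.
component_predict <- function(x) {
  mean(x)
}

# Work in a temporary directory so no existing files are touched.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

# Create the package skeleton from the function defined above.
utils::package.skeleton(
  name = "BApredict3",
  list = "component_predict",
  environment = environment(),
  path = pkg_parent
)

# List what was generated (DESCRIPTION, NAMESPACE, R/ and man/ files).
pkg_files <- list.files(file.path(pkg_parent, "BApredict3"), recursive = TRUE)
print(pkg_files)
```

The function's source ends up in the R subfolder and a template help file in the man subfolder, which mirrors the layout described above.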
- Add the documentation. In the man folder, I want to create two files (one of them component_predict.Rd) that provide the documentation for my functions. Usually I create these files manually, following some examples. But Hadley Wickham has a package, roxygen2, that will automatically create these files from specially formatted comments that you put in your source functions. Here I have done that, and I create the documentation files with the roxygenise function. (I don’t have much experience with roxygen2, but at least it gives you templates for these documentation files that you can edit.)
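To give a feel for the special comments that roxygen2 reads, here is a small sketch. The function batting_average and its arguments are hypothetical and are not part of BApredict3; they only show the comment format. Each comment line starts with #' and sits directly above the function it documents.

```r
#' Compute a batting average
#'
#' Hypothetical example function, used only to illustrate roxygen2 comments.
#'
#' @param h Number of hits.
#' @param ab Number of at-bats.
#' @return The batting average, h / ab.
#' @examples
#' batting_average(150, 500)
#' @export
batting_average <- function(h, ab) {
  h / ab
}

# With roxygen2 installed, running the following from the package's
# top-level directory would write the matching .Rd file into man/:
#   roxygen2::roxygenise()

avg <- batting_average(150, 500)
print(avg)
```

To plain R the #' lines are ordinary comments, so the function runs as usual whether or not roxygen2 is installed.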
- Edit the DESCRIPTION file. This file contains basic information about your package. I edit the template that is created. It is important to list, on the Depends line, the packages that my functions need.
- Build the package. Now I am ready to build the package. This is done by pressing the “Build & Reload” button under the Build tab in RStudio.
- Try out the package. I load the package and try out each of my new functions. Checking the values in the objects d and out, I see that the functions seem to work.
- Publish your package. When I say “publish”, I mean that you’d like to allow other people to use your package. There are several ways of doing this. I have submitted packages to CRAN, but I only do this for polished packages that I don’t plan on editing much in the near future. More recently I have been hosting packages on my GitHub site. This is especially helpful when you are developing your package, adding and editing functions. (Here is a recent post from RStudio about version control of R packages using GitHub.)
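Going back to the DESCRIPTION step above, the file is plain text with one "Field: value" entry per line. The sketch below writes and reads back a minimal version using base R's write.dcf() and read.dcf(). Only the package name comes from this post; the title, version number, and Depends entry are placeholder assumptions, and a real DESCRIPTION file has more fields (author, license, and so on).

```r
# Placeholder field values: only the package name is taken from the post.
desc_fields <- data.frame(
  Package = "BApredict3",
  Type    = "Package",
  Title   = "Predict Final Season Batting Averages",  # hypothetical title
  Version = "0.1.0",                                  # hypothetical version
  Depends = "R (>= 3.0.0)",                           # hypothetical dependency
  stringsAsFactors = FALSE
)

# Write the fields to a temporary file in DESCRIPTION (DCF) format.
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc_fields, file = desc_file)

# Show the file as it would appear in a text editor.
cat(readLines(desc_file), sep = "\n")

# Read it back to confirm the fields were stored as intended.
desc <- read.dcf(desc_file)
print(desc[1, "Depends"])
```

In practice I would simply edit the template in RStudio's editor; the code is only meant to show what the fields look like.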
Want to Learn More?
The best way to learn about building R packages is to try the RStudio interface to build packages for your own functions. I think we’ll be talking more about this package-building process in our beginning data science courses. For more information about building packages using RStudio, look at the RStudio documentation here.