In the last post, I mentioned writing an R package to implement my component method of predicting final season batting averages. Since building a package is currently pretty easy using the RStudio interface, I thought it was worthwhile to describe the process in this post. It is much simpler than the old command-line method I used to build packages. Hopefully this will encourage readers to write their own packages.
Why Build an R Package?
I will list several good reasons to write packages.
- Communicate your R work to others. For example, it lets others try out the statistical methods you have developed.
- Help organize your own R work. For example, it forces you to write documentation for each function and each dataset you create. Personally, since I write so much code, I appreciate the use of packages to collect and organize my work.
- Allow others to use interesting datasets. For example, my package tsub contains all of the datasets that I use in my Teaching Statistics Using Baseball text.
The Process Using RStudio
- Collect Your Functions. I have two functions, collect_hitting_data and component_predict, that I want to put in my package. Obviously, I have checked these functions to make sure they work. (Actually I have more functions in this package, but I thought two functions could be used to quickly illustrate the process.)
- Open a New Project. In RStudio, I select “New Project” from the File menu and then indicate on the next screen that I want to start a project in a New Directory.
- In the New Project Window, I indicate I want to create a new R package.
- In the Create R Package window, I decide to call my new package “BApredict3” and in the “Create package based on source files” area, I add my two R functions. I also indicate where I want my package files to be located on my computer.
After I press the Create Package button, I navigate to the directory location specified above. I see that RStudio has created the package file structure in the folder BApredict3. In particular, I see that my two functions are placed in the R subfolder.
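For readers who prefer the console, a roughly comparable file structure can be produced with base R's utils::package.skeleton(). This is only a minimal sketch, not a description of what RStudio does internally: the two function bodies are empty placeholders (the real functions are not shown in this post), and the package is written to a temporary directory rather than a real project location.

```r
# Placeholder bodies only: the real collect_hitting_data() and
# component_predict() are not shown in this post.
pkg_env <- new.env()
pkg_env$collect_hitting_data <- function(...) NULL
pkg_env$component_predict <- function(...) NULL

# Work in a temporary directory so nothing on disk is overwritten.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

# Every object found in pkg_env is dumped into the new package.
utils::package.skeleton(
  name = "BApredict3",
  environment = pkg_env,
  path = pkg_parent
)

# List what was generated: DESCRIPTION, NAMESPACE, and the R and man folders.
pkg_files <- list.files(file.path(pkg_parent, "BApredict3"), recursive = TRUE)
print(pkg_files)
```

As in the RStudio workflow, each function ends up in its own file in the R subfolder.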
- Add the documentation. In the man folder, I want to create two files, collect_hitting_data.Rd and component_predict.Rd, that provide the documentation for my functions. Usually I create these files manually, following some examples. But Hadley Wickham has a package, roxygen2, that will automatically create these files from comments (of a special form) that you put in your source functions. Here I have done that, and I create the documentation files with the roxygenise function. (I don’t have much experience with roxygen2, but at least it gives you templates for these documentation files that you can edit.)
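To give a feel for the special comment form, here is a minimal sketch of a roxygen2-style header. The function toy_predict and its arguments are hypothetical stand-ins invented for illustration; they are not the functions in my package. Each documentation line begins with #' and uses tags such as @param, @return, and @export.

```r
#' Compute a batting average (illustrative stub)
#'
#' A hypothetical function used only to show the comment format.
#'
#' @param hits Number of hits (hypothetical argument).
#' @param at_bats Number of at-bats (hypothetical argument).
#' @return The ratio of hits to at-bats.
#' @export
#' @examples
#' toy_predict(30, 100)
toy_predict <- function(hits, at_bats) {
  hits / at_bats
}

# With roxygen2 installed, running the following from the package root
# would write man/toy_predict.Rd from the comments above:
# roxygen2::roxygenise()
```

The roxygenise call is left commented out so the snippet runs even when roxygen2 is not installed.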
- Edit the DESCRIPTION file. This file contains basic information about your package. I edit the template that is created. It is important to indicate on the Depends line which packages my functions need.
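As a rough illustration of what the edited file might contain, the sketch below writes a small DESCRIPTION file with base R's write.dcf() and reads the Depends field back with read.dcf(). Everything except the package name is an assumption made for the example: the title, version, license, and especially the dplyr entry on the Depends line are stand-ins, since this post does not list the packages my functions actually need.

```r
# All field values except Package are assumed for illustration.
desc <- data.frame(
  Package = "BApredict3",
  Type = "Package",
  Title = "Predict Final Season Batting Averages",  # assumed wording
  Version = "0.1.0",                                # assumed
  Depends = "dplyr",                                # stand-in dependency
  License = "GPL-3",                                # assumed
  stringsAsFactors = FALSE
)

# Write to a temporary location rather than a real package folder.
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_file)

# Show the file, then read the Depends field back as a check.
cat(readLines(desc_file), sep = "\n")
deps <- read.dcf(desc_file, fields = "Depends")
print(deps)
```

The R packaging manual also describes an Imports field for packages that only need to be loaded rather than attached, which may be worth considering in place of Depends.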
- Build the package. Now I am ready to build the package. This is done by pressing the “Build & Reload” button under the Build tab in RStudio.
- Try out the package. I load the package and try out each of my new functions. Checking the values in the resulting objects d and out, I see that the functions seem to work.
- Publish your package. When I say “publish”, I mean making your package available for other people to use. There are several ways of doing this. I have submitted packages to CRAN, but I only do this for polished packages that I don’t plan on editing much in the near future. More recently I have been hosting packages on my GitHub site. This is especially helpful while you are still developing your package, adding and editing functions. (Here is a recent post from RStudio about version control of R packages using GitHub.)
Want to Learn More?
The best way to learn about building R packages is to use the RStudio interface to build a package from your own functions. I think we’ll be talking more about this package-building process in our beginning data science courses. For more information about building packages using RStudio, look at the RStudio documentation here.