I’ve been writing posts on this site for a while, and you know my style. I start with a baseball question and try to answer it using R, some modeling, and graphical displays. I usually provide some R code to help people do their own explorations. But I suspect that my posts don’t really help the interested new reader who wishes to learn R. So I thought it would be helpful to have a series of tutorial posts called “Welcome to R” that introduces R and RStudio to the person who wants to get started.
I am not using tidyverse
That raises the obvious question: how do you get started in R? I’ve taught R for many years, first in my graduate courses in statistics and more recently in undergraduate courses in data science. I started coding in S many years ago and have written or co-written three books that introduce R to different audiences, so I have done a bit of thinking about how best to introduce R from a pedagogical perspective.
One popular general approach for learning R is based on the tidyverse suite of R packages. As described by its main developer Hadley Wickham, tidyverse is an opinionated collection of R packages designed for data science. I have used tidyverse packages, especially dplyr, in my data science teaching in the past. In fact, I have used the R for Data Science text by Wickham and Grolemund to introduce R for the first data science course at BGSU. The tidyverse collection is embraced by RStudio and many other people as a great way of learning R.
But I don’t plan on using tidyverse in this introduction to R.
Why? I found at BGSU that my data science students who learned tidyverse had gaps in their knowledge of R. After taking my tidyverse course, they didn’t know how to write a loop or write simple functions and they got frustrated when their tidyverse code didn’t work. (They didn’t have a backup plan.) They really didn’t understand vectors and matrices, although those objects are building blocks in R.
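To make those gaps concrete, here is a minimal sketch in base R of the building blocks I have in mind: a vector, a simple function, a loop, and a matrix. The numbers and the names (hits, at_bats, batting_avg, stats) are made up for illustration and do not describe any real players.

```r
# Hypothetical toy data: hits and at-bats for four made-up players
hits <- c(150, 172, 98, 201)
at_bats <- c(520, 601, 410, 640)

# Vectors: arithmetic works element by element, one average per player
avg <- hits / at_bats

# A simple function: batting average rounded to three decimal places
batting_avg <- function(h, ab) {
  round(h / ab, 3)
}

# A loop: add up the hits one player at a time
total_hits <- 0
for (h in hits) {
  total_hits <- total_hits + h
}

# A matrix: bind the two vectors together as columns
stats <- cbind(H = hits, AB = at_bats)

print(batting_avg(hits, at_bats))
print(total_hits)
print(stats)
```

In practice you would get the total with sum(hits) rather than a loop. The loop is there only to show the syntax.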
One difficulty in learning any new programming language is the syntax, especially when there are many new function names to learn. Base R is hard enough, but tidyverse adds hundreds of new functions to learn. By the way, there are other tidyverse skeptics out there, for example Norm Matloff and Jasper McChesney, and Matloff is an experienced teacher and writer on the R language.
That raises the question: is tidyverse really a good way of learning R? Honestly, I have my doubts. It would be interesting to design an experiment to compare how well people learn R starting from tidyverse versus starting from base R functions. Statistics educators think about the best way to communicate statistical inference, so why shouldn’t data science educators think about the best ways of teaching coding for data science?
My Approach in Welcome to R
I plan on introducing R, focusing not on tidyverse, but rather on the exploration and graphing functions available in base R. Here are some things that I’ll discuss in this tutorial:
- Installing R and RStudio. Getting familiar with the RStudio environment.
- Data types and containers (such as vectors) for the data.
- Importing a csv file downloaded from FanGraphs.
- Working with data frames.
- Summarizing and graphing character and numerical data.
- Exploring relationships.
- Getting started with ggplot2 graphing (by the way, ggplot2 is one of the core tidyverse packages, but it can be installed and used on its own).
Some other comments:
- I’ll pose some questions for you to try on your own.
- All of the material will be posted as Markdown files.
- Last, this will be fun since I will be talking about all of the R topics in the context of baseball.
One approach to learning R is to just dive in. For example, you might get hold of our Analyzing Baseball Data with R book, jump to a particular chapter of interest like run expectancy, and try running the code. That is not a good approach for most people. In this Welcome to R tutorial, I am not going to dive into some sophisticated analysis. I will assume little previous knowledge of programming, and the goal is to gain familiarity with some basic R functions so you can start to do interesting baseball work.
I’ll start Welcome to R tomorrow and plan to post once or twice a week.