I have been in the same office at Bowling Green State University for 36+ years (hard to believe). Looking through one of my file drawers, I found slides of a talk I gave on December 11, 1985 on “An Introduction to S” at a meeting of the Northwest Ohio Chapter of the American Statistical Association. Since R was based on the S system developed at Bell Labs, I thought it would be interesting to look back at what I talked about, and how things have changed in 30 years.
(By the way, this post has nothing to do with baseball, although I was giving baseball statistics talks in the 1980s.)
My 1985 Talk
Here is a snapshot of one of my opening slides of my talk:
- In 1985, I was working on S on a terminal (connected to a UNIX computer) in a room down the hall — this was before everyone had a personal computer in their offices.
- Before S, I had primarily used “procedure-based” statistical packages, where you would write a command asking for a particular procedure, say regression, and you’d get a lot of output. These procedures were pre-packaged: someone had decided that these were the procedures you were interested in. I recall being thrilled with the interactive nature of S. One could really explore data quickly using a variety of descriptive and graphical methods.
- I was also excited about the ability to write S “macros” (the predecessor of R scripts) to carry out specific analyses.
In my 1985 talk, I said that S was modern in that it contained many new procedures such as robust regression, hat matrix regression diagnostics, two-way fits of medians (EDA stuff), star plots, and Chernoff faces. To learn more, I pointed people to the famous S book “S: An Interactive Environment for Data Analysis and Graphics” by Becker and Chambers, and to the online help system, which one accessed by typing commands like help("plot").
A Sample 1985 S Session
To give my audience a taste of S, I passed out a copy of a sample S session. Let’s look at the commands I used; they show that R is not that different from the S language of 30+ years ago.
Here’s a snapshot of the first page of my handout. (I recall that we printed on wide paper with white and green bands.)
- I had data on 7 measurements (horsepower, miles per gallon, engine size, curb weight, car length, rear legroom, and retail price) for 19 subcompact cars from the 1985 model year. I had a file “car85” containing a matrix of these numerical measurements. I imported the data into S using the S `read` function and used the `matrix` function to arrange it into 19 rows and 7 columns. I then added names to the rows and columns of this matrix. (NOTE: It appears that data frames had not been introduced yet.)
- I illustrated the use of the square-bracket `[ ]` notation to extract rows and columns from this matrix. (NOTE: This has not changed.)
- I illustrated some summary functions on vectors (`mean`, `sd`) and the use of the `hist` (histogram) function to graph a single variable. (NOTE: These functions are all available in R.)
- To relate two numerical variables, I constructed a scatterplot using `plot`, used the `identify` function to pick out an outlying point, and added a regression line with `abline(reg(esize, mpg))`. (NOTE: I still like the `identify` function in R, and `abline` works in the same way, although we now use the function `lm` to fit a line.)
- I used the `reg` function again to illustrate multiple regression. The result of `reg` had components such as `coef` and `resid`. I input the covariates by means of a matrix; I did not use the model notation response ~ covariate1 + covariate2, although it may have been available.
- I illustrated all-possible-subsets regression using the
- To illustrate categorical data, I used the `cut` function to put the car prices into two groups and the engine sizes into two groups. I used the `table` function to produce a 2 x 2 contingency table, and the `tapply` function to find the median price of the small-engine and large-engine cars.
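For comparison, here is a rough modern R version of the steps in the handout. It is a sketch under stated assumptions. The original “car85” file is not available, so it uses R’s built-in `mtcars` data as a stand-in, which has different cars and different variables. Miles per gallon plays the role of price in the categorical step, and `lm()` with a model formula replaces the S `reg` function. The all-possible-subsets step is left out.

```r
# Stand-in data: the built-in mtcars data frame (1974 Motor Trend cars),
# NOT the 1985 subcompact "car85" data described in the post.
car_mat <- as.matrix(mtcars[, c("hp", "mpg", "disp", "wt")])

# Square-bracket indexing of a matrix works as it did in S
car_mat[1:3, ]        # first three rows (cars)
car_mat[, "mpg"]      # one column, selected by name

# Summaries of a single variable
mean(car_mat[, "mpg"])
sd(car_mat[, "mpg"])

# Simple and multiple regression: lm() with a model formula
fit1 <- lm(mpg ~ disp, data = mtcars)
fit2 <- lm(mpg ~ disp + hp + wt, data = mtcars)
coef(fit2)            # estimated coefficients
head(resid(fit2))     # first few residuals

# Graphics are only drawn in an interactive session
if (interactive()) {
  hist(mtcars$mpg)
  plot(mtcars$disp, mtcars$mpg)
  abline(fit1)        # abline() still accepts a fitted line
  identify(mtcars$disp, mtcars$mpg, labels = rownames(mtcars))
}

# Categorical summaries with cut(), table(), and tapply()
mpg_group  <- cut(mtcars$mpg,  breaks = 2, labels = c("low", "high"))
disp_group <- cut(mtcars$disp, breaks = 2, labels = c("small", "large"))
table(mpg_group, disp_group)             # 2 x 2 contingency table
tapply(mtcars$mpg, disp_group, median)   # median mpg by engine-size group
```

Apart from data frames and the formula interface, most of these calls line up closely with the S commands listed above.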
It is clear that the R user and statistical computing communities need to give a lot of credit to the Bell Labs team that developed S many years ago. It provided a wonderful new interface for exploring data, gave us the ability to create scripts and document our work, and many of the R functions that we use in everyday work were developed in S.
To learn more about the beginnings of interactive computing with S, I’d recommend reading the AT&T article “From S to R: 35 Years of AT&T Leadership in Statistical Computing”.