I have been in the same office at Bowling Green State University for 36+ years (hard to believe). Looking through one of my file drawers, I found slides of a talk I gave on December 11, 1985 on “An Introduction to S” at a meeting of the Northwest Ohio Chapter of the American Statistical Association. Since R was based on the S system developed at Bell Labs, I thought it would be interesting to look back at what I talked about, and how things have changed in 30 years.

(By the way, this post has nothing to do with baseball, although I was giving baseball statistics talks in the 1980’s.)

### My 1985 Talk

Here is a snapshot of one of the opening slides of my talk:

- In 1985, I was working on S on a terminal (connected to a UNIX computer) in a room down the hall — this was before everyone had a personal computer in their offices.
- Before S, I had primarily used “procedure-based” or statistical packages where you would write some command asking for a particular procedure, say regression, and you’d get a lot of output. These were pre-packaged — someone had decided that these were the procedures you were interested in. I recall that I was thrilled with the interactive nature of S — one could really explore data quickly using a variety of descriptive and graphical methods.
- I also was excited about the ability to write S “macros” — this was the predecessor of R scripts — to do specific analyses.

In my 1985 talk, I said that S was modern in that it contained many new procedures such as robust regression, hat-matrix regression diagnostics, two-way fits of medians (EDA stuff), star plots, and Chernoff faces. To learn more, I pointed to the famous S book “S: An Interactive Environment for Data Analysis and Graphics” by Becker and Chambers, and to the online help system, accessed by typing commands like `help("plot")`.

### A Sample 1985 S Session

To give my audience a taste of S, I passed out a copy of a sample S session. Let’s look at the commands I used — this will show that R is not that different from the S language of 30+ years ago.

Here’s a snapshot of the first page of my handout. (I recall we printed on wide paper with alternating white and green bands.)

- I had some data on 7 measurements (horsepower, miles per gallon, engine size, curb weight, car length, rear legroom, retail price) on 19 subcompact cars from the 1985 model year. I had a file “car85” containing a matrix of these numerical measurements. I imported this data into S (using the S `read` function) and used the `matrix` function to organize the data into 19 rows and 7 columns. I added names to the rows and columns of this matrix. (NOTE: It appears that data frames had not been introduced yet.)
- I illustrated the use of the `[]` notation to extract rows and columns from this matrix. (NOTE: This has not changed.)
- I illustrated some summary functions on vectors (`mean`, `sd`), and illustrated the use of the `stem` (stem-and-leaf) and `hist` (histogram) functions to graph a single variable. (NOTE: These functions are all available in R.)
- For relating two numerical variables, I constructed a scatterplot using `plot`, used the `identify` function to pick out an outlying point, and added a regression line with `abline(reg(esize, mpg))`. (NOTE: I still like the `identify` function in R, and the `abline` function works in the same way, although we now use the `lm` function to fit a line.)
- I used the `reg` function again to illustrate multiple regression. When one ran `reg`, one saw the components `coef`, `resid`, etc. I input the covariates by means of a matrix — I did not use the model notation response ~ covariate1 + covariate2, although it may have been available.
- I illustrated all-possible-subsets regression using the `leaps` function.
- To illustrate categorical data, I used the `cut` function to put the car prices into two groups and the engine sizes into two groups. The `table` function was used to produce a 2 x 2 contingency table, and the `tapply` function was used to find the median price of the small-engine and large-engine cars.
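The 1985 session translates almost line-for-line into modern R. Here is a minimal sketch of the same steps; the data values below are made up for illustration (the original car85 file is not reproduced in the handout), and I use only three of the seven measurements to keep it short. The old S `reg` call is replaced by `lm`, as noted above.

```r
# Hypothetical stand-in for the 1985 car85 data (values are invented)
set.seed(1985)
car85 <- matrix(round(c(runif(19, 60, 110),    # horsepower
                        runif(19, 25, 45),     # miles per gallon
                        runif(19, 85, 120)),   # engine size
                      1),
                nrow = 19, ncol = 3,
                dimnames = list(paste("car", 1:19),
                                c("hp", "mpg", "esize")))

# Extracting rows and columns with the [] notation (unchanged since S)
car85[1:3, ]
mpg   <- car85[, "mpg"]
esize <- car85[, "esize"]

# Summary functions and one-variable graphs
mean(mpg); sd(mpg)
stem(mpg)
hist(mpg)

# Scatterplot with a fitted line -- the old reg() is now lm()
plot(esize, mpg)
abline(lm(mpg ~ esize))

# Categorical data: cut, table, and tapply, as in the handout
price       <- round(runif(19, 5000, 9000))   # invented retail prices
size.group  <- cut(esize, 2, labels = c("small", "large"))
price.group <- cut(price, 2, labels = c("low", "high"))
table(size.group, price.group)
tapply(price, size.group, median)
```

(In an interactive session one could also call `identify(esize, mpg)` after the `plot` call to label an outlying point with the mouse, just as in 1985.)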

### Looking Back

It is clear that the R user and statistical computing communities owe a great deal of credit to the Bell Labs team that developed S many years ago. S provided a wonderful new interface for exploring data, it gave us the ability to create scripts and document our work, and many of the R functions we use in everyday work were first developed in S.

To learn more about the beginnings of interactive computing with S, I’d recommend reading the AT&T article “From S to R: 35 Years of AT&T Leadership in Statistical Computing”.