Interactive Statistics Computing 30 Years Ago

I have been in the same office at Bowling Green State University for 36+ years (hard to believe). Looking through one of my file drawers, I found slides of a talk I gave on December 11, 1985 on “An Introduction to S” at a meeting of the Northwest Ohio Chapter of the American Statistical Association. Since R was based on the S system developed at Bell Labs, I thought it would be interesting to look back at what I talked about, and how things have changed in 30 years.

(By the way, this post has nothing to do with baseball, although I was giving baseball statistics talks in the 1980’s.)

My 1985 Talk

Here is a snapshot of one of the opening slides from my talk:

[Image: one of the opening slides from the 1985 talk]

  • In 1985, I was working on S on a terminal (connected to a UNIX computer) in a room down the hall — this was before everyone had a personal computer in their offices.
  • Before S, I had primarily used “procedure-based” statistical packages, where you would write a command asking for a particular procedure, say regression, and get a lot of output. These procedures were pre-packaged: someone had decided which procedures you would be interested in. I recall being thrilled with the interactive nature of S; one could really explore data quickly using a variety of descriptive and graphical methods.
  • I also was excited about the ability to write S “macros” — this was the predecessor of R scripts — to do specific analyses.

In my 1985 talk, I said that S was modern in that it contained many new procedures such as robust regression, hat matrix regression diagnostics, two-way fits of medians (EDA stuff), star plots, and Chernoff faces. To learn more, I pointed to the famous S book “S: An Interactive Environment for Data Analysis and Graphics” by Becker and Chambers, and to the online help system, reached by typing commands like help("plot").

A Sample 1985 S Session

To give my audience a taste of S, I passed out a copy of a sample S session. Let’s look at the commands I used; they show that R is not that different from the S language of 30+ years ago.

Here’s a snapshot of the first page of my handout (I recall we printed on wide paper with white and green bands.)

[Image: first page of the 1985 sample S session handout]

  • I had some data on 7 measurements (horsepower, miles per gallon, engine size, curb weight, car length, rear legroom, retail price) on 19 subcompact cars from the 1985 model year. I had a file “car85” containing a matrix of these numerical measurements. I imported this data into S (using the S read function) and used the matrix function to organize the data into 19 rows and 7 columns. I added names to the rows and columns of this matrix. (NOTE: It appears that data frames had not been introduced yet.)
  • I illustrated the use of the [] notation to extract rows and columns from this matrix (NOTE: This has not changed.)
  • I illustrated some summary functions on vectors (mean, sd), and illustrated the use of the stem (stem-and-leaf) and hist (histogram) functions to graph a single variable. (NOTE: These functions are all available in R.)
  • For relating two numerical variables, I constructed a scatterplot using plot, used the identify function to pick out an outlying point, and added a regression line with abline(reg(esize, mpg)). (NOTE: I still like the identify function in R, and the abline function works in the same way, although we now use the function lm to fit a line.)
  • I used the reg function again to illustrate multiple regression. When one ran reg, one saw the components coef, resid, etc. I input the covariates as a matrix; I did not use the model notation response ~ covariate1 + covariate2, although it may have been available.
  • I illustrated all-possible-subsets regression using the leaps function.
  • To illustrate categorical data, I used the cut function to put the car prices in two groups and the engine sizes in two groups. I used the table function to produce a 2 x 2 contingency table, and the tapply function to find the median price of the small-engine and large-engine cars.
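A rough sketch of how the same session steps might look in modern R follows. It is not a transcription of the 1985 handout. The car85 file is not available, so R's built-in mtcars data set stands in for it (32 cars, not the 19 subcompacts). The column names hpower, esize, and weight are hypothetical relabelings chosen to echo the S session. Because mtcars has no price column, mpg takes the place of price in the categorical step.

```r
# A hedged sketch of the 1985 S session steps in modern R.
# ASSUMPTION: the original car85 file is unavailable, so the built-in mtcars
# data set stands in for it; the column names below are hypothetical.
car_mat <- as.matrix(mtcars[, c("hp", "mpg", "disp", "wt")])
colnames(car_mat) <- c("hpower", "mpg", "esize", "weight")

# [] extraction of rows and columns works as it did in S
car_mat[1:3, ]                   # first three cars, all columns
mpg   <- car_mat[, "mpg"]
esize <- car_mat[, "esize"]

# Summaries and single-variable graphs
mean(mpg)
sd(mpg)
stem(mpg)
hist(mpg)

# Scatterplot with a least-squares line; lm() plays the role of S's reg()
plot(esize, mpg)
fit <- lm(mpg ~ esize)
abline(fit)
# identify(esize, mpg, labels = rownames(car_mat))  # interactive: click points

# Multiple regression, now written with model-formula notation
fit2 <- lm(mpg ~ hpower + esize + weight, data = as.data.frame(car_mat))
coef(fit2)
head(resid(fit2))

# All-subsets regression: leaps() is in the add-on 'leaps' package
# leaps::leaps(car_mat[, c("hpower", "esize", "weight")], mpg)

# Categorical summaries: two groups each, a 2 x 2 table, group medians
# (mpg stands in for price, which mtcars does not have)
size_grp <- cut(esize, breaks = 2, labels = c("small", "large"))
mpg_grp  <- cut(mpg,   breaks = 2, labels = c("low", "high"))
table(size_grp, mpg_grp)
tapply(mpg, size_grp, median)
```

Apart from lm taking over from reg and the formula notation replacing a covariate matrix, most of these calls (mean, sd, stem, hist, plot, identify, abline, cut, table, tapply) keep the names they had in the 1985 session.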

Looking Back

It is clear that the R user and statistical computing communities need to give a lot of credit to the Bell Labs team that developed S many years ago. It provided a wonderful new interface for exploring data, gave us the ability to create scripts and document our work, and many of the R functions that we use in everyday work were developed in S.

To learn more about the beginnings of interactive computing with S, I’d recommend reading the AT&T article “From S to R: 35 Years of AT&T Leadership in Statistical Computing”.
