Monthly Archives: May, 2014

Tribute to Wrigley Field

Wrigley Field is celebrating its 100th birthday in 2014. I’m visiting Chicago this week and it seemed appropriate to post something honoring the Cubs and their famous ballpark.

How has the Cubs' home attendance changed over the last 60 years? The Retrosheet game log data contains the official attendance for each game. I’ll use this data to graph the median attendance of Cubs home games and compare the Cubs' attendance with the attendance of their cross-town rivals, the White Sox.

The first step is to download the game log data from Retrosheet. The text file gl1901.txt contains data for all games played in 1901. For this exploration, I need the game logs for the seasons 1954 through 2013, so I downloaded the files gl1954.txt, …, gl2013.txt and put them in a folder “gamelogs”. (By the way, this downloading needs to be done only once, and you might want these files for future work.)
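Because the files follow a fixed naming pattern, a quick check can confirm that every season is in place before any analysis runs. Here is a minimal sketch, assuming the files sit in a folder named gamelogs under the working directory. The helper name missing_gamelogs is hypothetical and not part of the original code.

```r
# Sketch: list the expected game log files that are not yet in the folder.
# Assumes the naming pattern gl<year>.txt described above.
missing_gamelogs <- function(seasons, folder = "gamelogs") {
  files <- paste0("gl", seasons, ".txt")
  files[!file.exists(file.path(folder, files))]
}

# Example call: which of the 1954-2013 files still need to be downloaded?
# missing_gamelogs(1954:2013)
```

An empty result means all of the requested seasons are available locally.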

All of the R code to find the appropriate data and construct the graphs can be found here.

I wrote a short function that will compute the median attendance for an arbitrary team during a sequence of seasons. This function reads the appropriate game log files contained in the folder “gamelogs”. The output of this function is a data frame with two variables, Season and Median. Then I use the ggplot2 package to construct the graphs.
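The function itself is in the linked code. What follows is only a rough sketch of how such a function might look, not the original implementation. It assumes the Retrosheet game log layout (no header row, home team code in field 7, attendance in field 18) and the Retrosheet team codes CHN for the Cubs and CHA for the White Sox. Dropping games with a missing or zero attendance is my own assumption, and the name median_attendance is hypothetical.

```r
# Sketch: median home attendance per season from Retrosheet game logs.
# Assumes files named gl<year>.txt with no header row, the home team
# code in field 7 and the attendance in field 18.
median_attendance <- function(team, seasons, folder = "gamelogs") {
  one_season <- function(season) {
    path <- file.path(folder, paste0("gl", season, ".txt"))
    games <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)
    # Keep only the games where the team of interest was at home
    home <- games[which(games[[7]] == team), ]
    attendance <- suppressWarnings(as.numeric(home[[18]]))
    # Assumption: a missing or zero attendance means "not recorded"
    attendance <- attendance[!is.na(attendance) & attendance > 0]
    data.frame(Season = season, Median = median(attendance))
  }
  do.call(rbind, lapply(seasons, one_season))
}

# Example calls (require the downloaded files):
# cubs <- median_attendance("CHN", 1954:2013)
# sox  <- median_attendance("CHA", 1954:2013)
```

Reading one file at a time keeps memory use small, and the per-season data frames are stacked at the end with rbind.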

First I graph the median Cubs home attendance against season and overlay a smooth curve to help show the general pattern. We see a big initial increase in attendance until 1970, followed by a lull in the mid-1970s and early 1980s. From 1984 through 1997 attendance was pretty good, and in 1998 it took a substantial jump. Wrigley attendance has been running close to 40,000 in recent years. The yellow shading marks the 1988 season, when lights were installed at Wrigley. This was a controversial decision, but it certainly boosted the attendance.
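A graph of this kind can be built from a data frame with the Season and Median columns described earlier. This is a sketch rather than the original plotting code: the loess smoother, the labels and the shading width are my own choices, and plot_attendance is a hypothetical name. The ggplot2 functions are called with the ggplot2:: prefix, so no library() call is needed.

```r
# Sketch: scatterplot of median attendance by season with a smooth curve.
# `d` is assumed to have columns Season and Median; `highlight` is an
# optional season to shade in yellow (1988 for the Wrigley lights).
plot_attendance <- function(d, title, highlight = NULL) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x = Season, y = Median))
  if (!is.null(highlight)) {
    # Draw the shaded band first so the points sit on top of it
    p <- p + ggplot2::annotate("rect",
                               xmin = highlight - 0.5, xmax = highlight + 0.5,
                               ymin = -Inf, ymax = Inf,
                               fill = "yellow", alpha = 0.4)
  }
  p +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    ggplot2::labs(title = title, x = "Season", y = "Median home attendance")
}

# Example call, where `cubs` is a hypothetical data frame of Cubs medians:
# plot_attendance(cubs, "Cubs home attendance", highlight = 1988)
```

Leaving highlight at its default of NULL gives the same plot without the shaded band, which suits a team with no comparable event to mark.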


Next, we graph the median White Sox home attendance against season. The Sox attendance showed a steady increase from 1970 through 1985, and has displayed dramatic decreases and increases since 1985. I haven’t looked carefully at the White Sox win/loss records, but I suspect that these highs and lows correspond to seasons when the Sox were respectively successful and unsuccessful in winning games.


We can compare the popularity of the Cubs and White Sox by graphing the ratio of the Cubs' average attendance to the White Sox's average attendance across seasons. Since most of the points fall above the line y = 1, the Cubs clearly draw more fans on average than the White Sox. In fact, there were seasons (mid-1980s and late 1990s) when the average Cubs attendance was more than double the average Sox attendance. In recent seasons, the Cubs appear to average about 40 percent higher attendance.
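The ratio can be computed by matching the two teams' season summaries on Season. This sketch assumes each input is a data frame with columns Season and Median, so it takes the ratio of whatever per-season summary that column holds. The name attendance_ratio is hypothetical.

```r
# Sketch: per-season ratio of Cubs attendance to White Sox attendance.
# Both inputs are assumed to have columns Season and Median.
attendance_ratio <- function(cubs, sox) {
  both <- merge(cubs, sox, by = "Season", suffixes = c(".Cubs", ".Sox"))
  both$Ratio <- both$Median.Cubs / both$Median.Sox
  both[order(both$Season), c("Season", "Ratio")]
}

# Example calls, with `cubs` and `sox` as hypothetical season summaries:
# r <- attendance_ratio(cubs, sox)
# mean(r$Ratio > 1)   # share of seasons in which the Cubs outdrew the Sox
```

A ratio above 1 means the Cubs outdrew the White Sox in that season, which is why the reference line sits at y = 1.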


Certainly there is much one could do with attendance data. It might be interesting to explore the “new ballpark” effect on attendance and to get a better understanding of day-of-the-week effects. MLB teams certainly need a good understanding of attendance patterns, since this understanding is a first step toward using different marketing methods to increase attendance.