Max and I started this blog back in the fall of 2013 to help promote our book Analyzing Baseball Data with R. Unfortunately (at least for the blog), Max took a baseball job with the Indians in 2014, and I’ve been the main contributor to the blog for over six years, although I have had some guest contributions from Ben Baumer, Carson Sievert, Brian Mills, and Aaron Baggett. (I was very fortunate to convince Ben Baumer to help me with a 2nd edition of ABDR that came out in the fall of 2018.) I thought now would be a good time to review the topics that we’ve discussed in the blog over the years, describe some of the popular topics and posts, and look at the future of using R to explore baseball (and other sports?).
Frequency of the Posts?
I’ve graphed the frequencies of posts that I’ve authored over this 6+ year period. As you see, I was writing about one post each week in 2016 and 2017, and I’ve slowed down to about one post every two weeks in the last two years. Actually, if I have a post idea, it typically doesn’t take too long to do the work and write a post. But currently, a pace of about one post every two weeks seems reasonable.
How Long is a Blog Post?
When you blog long enough, you typically settle into a pattern of post length. Interestingly, WordPress tells me that I wrote a total of 25,992 words for the 26 posts that I wrote in 2019. So I guess I tend to write posts of about 1000 words.
Popularity of the Blog?
Obviously, you want people to read your blog posts, and WordPress records the number of visitors and page views each day. Below is a table of the average number of page views for each month that the blog has been available. The blog had a steady increase in page views from 2014 through 2017, and the page views have stabilized in the last two years. I currently get 100-200 page views a day, but I might get a higher count when a particular post is popular. My most popular day for visits was February 26, 2018, when I had almost 1000 page views.
What Topics Have I Covered?
I tend to write on baseball topics that I’m interested in and think will be of interest to readers who are learning R in the context of baseball. For each of my 200+ posts, I collected a few keywords, and I’ve graphed the number of posts of the most popular keywords in the history of the blog. You probably are not surprised that I have written a lot about home runs. I am also fascinated with subtle effects in hitting and pitching such as count-effects, streaky patterns, and career hitting trajectories. I have promoted Bayesian thinking in my posts, especially in multilevel modeling where, for example, observed batting averages are shrunk towards a common value. These improved estimates of batting ability are useful for predicting future performance.
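To make the shrinkage idea concrete, here is a minimal sketch in base R. It is not the model from any particular post: the hits and at-bats are made up, the common value is taken to be the pooled batting average, and the prior strength `K` (in "pseudo at-bats") is an assumed value chosen only for illustration.

```r
# Made-up hits and at-bats for five hypothetical hitters (not real data)
hits    <- c(12, 30, 45, 8, 60)
at_bats <- c(40, 100, 150, 20, 200)

observed_avg <- hits / at_bats

# Common value: the pooled batting average across all hitters
eta <- sum(hits) / sum(at_bats)

# Assumed prior strength in "pseudo at-bats"; chosen only for illustration
K <- 100

# Posterior mean under a Beta(K * eta, K * (1 - eta)) prior:
# a weighted average of the observed average and the common value
shrunk_avg <- (hits + K * eta) / (at_bats + K)

print(round(data.frame(at_bats, observed_avg, shrunk_avg), 3))
```

Each shrunken estimate moves the observed average toward `eta` by the factor `at_bats / (at_bats + K)`, so hitters with few at-bats are pulled most strongly toward the common value. In a full multilevel model, `eta` and `K` would be estimated from the data rather than fixed by hand.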
Some of my posts have been popular and I have listed the ten posts with the most page visits. (These page views don’t adjust for the length of time that the post has been available.)
Here are links for these popular posts with a brief summary of each.
- Downloading Retrosheet Data. Since Retrosheet is a rich source of play-by-play data, a natural question is how to read this data into R. We provide several functions for downloading this data, collecting data for a single season in a data frame, and computing run expectancies.
- What Age Do Baseball Players Peak? Teams are keenly interested in aging patterns of players. Specifically, at what age do players achieve peak performance? This post describes a smoothing method for determining this peak age of a player.
- Comparing the plyr and dplyr packages. Both the plyr and dplyr packages are helpful for data manipulation. This blog post demonstrates that dplyr is faster for performing the “splitting, applying and combining data” operation.
- Where’s the Data? When one gets started, one wants to know the main types of public baseball datasets, and this post gives a summary of what is available.
- Constructing Heat Maps for AVG. A popular graph shows how a player’s AVG varies for pitches in different locations of the zone. This post shows how to construct a heat map of AVG over the zone.
- Spray Charts for Statcast Data. Another popular baseball graph is a spray graph showing the locations of batted balls of a particular hitter. This post illustrates how to construct a spray graph from Statcast data.
- Chance of Hit as Function of Launch Conditions. The probability of a base hit depends on the launch angle, exit velocity and spray angle. This post illustrates using a generalized additive model to fit this model.
- Swing and Miss Rates. A batter has to be careful only to swing at “good” pitches in the zone. Also if he swings, he wants to make contact. This post explores swing and miss rates of many players.
- Graphing Pitch Count Effects. This post introduces a new graph that shows the advantage (from a runs perspective) of adding a strike or ball to the count.
- Graph of a Batting Average. This illustrates the use of a special graph to display the components of a batting average.
Honestly, I am not sure how this blog will continue, but here are some possibilities.
- One goal of this blog is to get more people exploring baseball data. So there probably is a need for some tutorial material, getting people started exploring baseball stats using R.
- Although my favorite spectator sport is baseball, other sports like basketball and football are becoming more analytical, and there are new R packages for scraping data from these other sports. I may give introductions to these non-baseball packages. Maybe this blog will morph into an “Exploring Sports Data Using R” blog?
- Of course, there will be new developments in baseball analytics such as new sources of data and new packages, and that will motivate new posts.
- Any comments or suggestions about future posts are always welcome, either by commenting on this blog or sending me an email at email@example.com.