Monthly Archives: December, 2022

Review of Nine Years of Exploring Baseball Data with R

Introduction

Remarkably, it has been over nine years since Max Marchi and I started this blog to promote the Analyzing Baseball with R book. Unfortunately (from the viewpoint of the blog), Max eventually took a position with the Indians (now Guardians), but fortunately Ben Baumer was a very able pinch-hitter for Max when we wrote the 2nd edition of the book in 2018. (Ben and I are thinking of updating the book for a 3rd edition in 2023.)

Looking back, I have written a total of 343 posts over these nine years. The blog itself has been visited by over 277,000 people, and there have been over 577,000 page views. I have tried to stay true to the main goal of encouraging readers to explore the wealth of publicly available baseball data using R. It is easy to find material for posts since new sources of data keep appearing and R itself is going through many changes.

In retirement, when I am not playing tennis, I continue to enjoy writing posts. But one challenge in this activity is that it is easy to forget what you wrote, and I sometimes write on specific questions that I have already addressed in earlier posts. So I thought it would be helpful, at least for my sanity, to collect blog posts on some of my more popular topics.

I just created a new “Current Baseball Research” page where the interested reader can see some of the topics that I have addressed over my many years of baseball research.

Research Topics

Currently, here are some of the topics that I have found interesting in my blog writing. For each topic, I have collected many of the relevant blog posts. Many of the posts include links to R code or descriptions of Shiny apps that illustrate the concepts.

  • Patterns of Hitting Home Runs in the Statcast Era. As the reader may know, there have been curious up-and-down patterns of hitting home runs in recent seasons. This paper provides a review of my research in home run hitting, including work that I did with the MLB Commission.
  • Called Strikes. There are many curious biases in calling balls and strikes that one can explore using Statcast data.
  • Career Trajectories. One of my popular research topics is exploring the season-to-season patterns of hitters and pitchers over their careers. Given all of the long-term contracts awarded to players during this off-season, I wonder how well informed the teams are by the historical patterns of aging of players at different fielding positions.
  • Count Effects. One of the most important aspects of baseball is the count — both pitchers and batters want the count to change to favorable values. This collection of posts looks at the count from different perspectives.
  • Multilevel Modeling in Baseball. As a Bayesian, I have always been interested in the use of baseball data to illustrate the advantages of multilevel modeling where one is essentially fitting several regressions over different groups. One of the earliest illustrations of the benefits of multilevel models was by Brad Efron and Carl Morris using hitting data from a Sunday newspaper.
  • Streaky Patterns in Baseball. Another topic that I have enjoyed is the exploration of streaky patterns of individual players and teams. It is one thing to identify streaky patterns, but it is more challenging to search for players who have so-called streaky ability.
  • Shiny Apps in Baseball. Over the past few years, I have written quite a few Shiny apps which allow for interactive exploration of different types of baseball data. This paper introduces and provides snapshots of many of these apps. Code for these apps is available in several of my R packages.

I will continue to update the Current Baseball Research page to add more common topics that I’ve explored over the years.