Tribute to Joe Mauer: Exploring Miss Rates


There was a big moment for the Minnesota Twins this week: Joe Mauer announced his retirement after playing 15 seasons for the Twins. Mauer had a great career: he was the only catcher to win three American League batting titles, and he was the AL Most Valuable Player in 2009. Scanning Mauer's batting statistics, one immediately notices his high batting averages and low strikeout totals, which suggest that Mauer had great bat control. (A recent USA Today article discusses Mauer's great plate discipline.) I thought I would look at this aspect of hitting more carefully by examining his miss rates, defined as the number of swinging strikes divided by the number of swings. Specifically:

  • How did Mauer’s miss rates compare to other full-time players during his career?
  • Strikeout rates and miss rates have increased over recent seasons. Did Mauer's miss rates also increase during this period?
  • Over his career, how did Mauer compare to the top hitters with respect to miss rates?
  • Against which pitchers did Mauer struggle, from the perspective of miss rates?


For this study, I used Retrosheet play-by-play data, since it was readily available on my computer. The key variable is PITCH_SEQ_TX, which records the result of every pitch during a plate appearance as a string of single-character codes. (Recall that each line in the play-by-play file is a plate appearance.)
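To make the swing and miss counts concrete, here is a minimal Python sketch of the string processing for a single PITCH_SEQ_TX value. Which codes count as swings is a modeling choice and an assumption here: the sketch treats S (swinging strike), M (missed bunt attempt) and Q (swinging on pitchout) as misses, and adds F, T, L, O, R, X and Y (fouls, foul tips, foul bunts and balls put in play) as contact swings. The ambiguous K (strike of unknown type) and all non-pitch markers are ignored. The function names are hypothetical.

```python
# Sketch: count swings and misses in one Retrosheet PITCH_SEQ_TX string.
# ASSUMPTION: the code sets below are one plausible definition of
# "swing" and "miss", not necessarily the one used for this post.

# Swinging misses: swinging strike, missed bunt attempt, swinging on pitchout
MISS_CODES = frozenset("SMQ")
# Contact swings: foul, foul tip, foul bunt, foul tip on bunt,
# foul on pitchout, ball in play, ball in play on pitchout
CONTACT_CODES = frozenset("FTLORXY")
SWING_CODES = MISS_CODES | CONTACT_CODES


def count_swings_misses(pitch_seq: str) -> tuple[int, int]:
    """Return (swings, misses) for one plate appearance.

    Non-pitch characters such as pickoff throws ('1', '2', '3'),
    '>' (runner going), '*' (blocked pitch), '+' and '.' are in
    neither set, so they are skipped automatically.
    """
    swings = sum(1 for code in pitch_seq if code in SWING_CODES)
    misses = sum(1 for code in pitch_seq if code in MISS_CODES)
    return swings, misses


def miss_rate(swings: int, misses: int) -> float:
    """Miss rate = swinging misses / swings (NaN when there are no swings)."""
    return misses / swings if swings else float("nan")


if __name__ == "__main__":
    # Hypothetical sequence: called strike, ball, swinging strike, foul, in play
    s, m = count_swings_misses("CBSFX")
    print(s, m, round(miss_rate(s, m), 3))  # 3 1 0.333
```

Summing these per-plate-appearance counts over a season (or a career) and dividing gives the aggregate miss rate, rather than averaging plate-appearance-level rates.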

A Career Look at Mauer’s Miss Rates

To start, I looked at the seasons from 2005 through 2017 in which Mauer had at least 300 plate appearances. For each of these seasons, I collected the miss rates for all hitters with at least 300 PAs. Below I display parallel dotplots of the miss rates. The red line corresponds to the median miss rate, the orange lines correspond to the quartiles, and Mauer's values are shown as blue dots. Generally, Mauer's miss rates fall in the lower quarter of the distribution. In his best seasons, his miss rates were under 10%. His miss rates climbed over 15% during the 2013-2015 seasons, but they dropped to about 12% in 2017. It is not surprising that his miss rates increased, since the median miss rate of all regulars increased from 17% to 21% during this period. By the way, note that a miss rate of 30% or higher was rare in 2005 but relatively common in 2017, one indication of how the game of baseball has changed.
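The reference lines in these dotplots are simple summaries. The Python sketch below takes one season's miss rates for qualifying hitters (a dict of player id to rate, an assumed input format) and returns the median, the quartiles and the share of hitters at or below a chosen player. The "inclusive" quantile method is an assumption; other quantile definitions give slightly different cutoffs.

```python
import statistics


def summarize_season(rates: dict[str, float], player: str) -> dict[str, float]:
    """Median, quartiles and one player's position for a single season.

    `rates` maps a (hypothetical) player id to a miss rate, already
    restricted to qualifying hitters (e.g. at least 300 PA).
    """
    values = sorted(rates.values())
    # ASSUMPTION: 'inclusive' quartiles; other methods shift the cutoffs slightly
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    target = rates[player]
    # Share of qualifying hitters whose miss rate is at or below the player's
    share_at_or_below = sum(v <= target for v in values) / len(values)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "player_rate": target,
        "share_at_or_below": share_at_or_below,
    }


if __name__ == "__main__":
    # Hypothetical toy season, not real data
    toy = {"p1": 0.10, "p2": 0.15, "p3": 0.20, "p4": 0.25, "p5": 0.30}
    print(summarize_season(toy, "p2"))
```

A player whose rate sits below `q1` (equivalently, with a share of roughly 0.25 or less) is in the lower quarter of that season's distribution.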


How Did Mauer Compare to the Best?

For the period 2005 through 2017, I collected all hitters who had at least 5000 swings (294 players in all). Here are the top 10 hitters with respect to miss rates, that is, the 10 with the lowest rates. Juan Pierre is at the top with a rate of 5.6%, and Joe Mauer does not appear. Looking further, I find that Mauer's miss rate of 12.5% ranks 37th out of 294. So he compares very favorably with the other hitters of this period.
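Ranking the qualifying hitters is a filter followed by a sort. In this hypothetical Python sketch, `totals` maps a player id to career (swings, misses) totals; hitters under the 5000-swing cutoff are dropped, and the rest are ranked from lowest to highest miss rate, so rank 1 is the hardest hitter to make miss.

```python
def rank_by_miss_rate(
    totals: dict[str, tuple[int, int]], min_swings: int = 5000
) -> list[tuple[int, str, float]]:
    """Rank hitters by miss rate, lowest first.

    `totals` maps a (hypothetical) player id to (swings, misses).
    Hitters with fewer than `min_swings` swings are excluded.
    Returns (rank, player, rate) tuples.
    """
    qualified = sorted(
        (misses / swings, player)
        for player, (swings, misses) in totals.items()
        if swings >= min_swings and swings > 0
    )
    # Enumerate from 1 so the lowest miss rate gets rank 1
    return [(rank, player, rate) for rank, (rate, player) in enumerate(qualified, start=1)]


def rank_of(ranking: list[tuple[int, str, float]], player: str):
    """Return the 1-based rank of `player`, or None if the player did not qualify."""
    for rank, name, _ in ranking:
        if name == player:
            return rank
    return None


if __name__ == "__main__":
    # Hypothetical career totals, not real data
    toy = {"p1": (6000, 600), "p2": (5000, 250), "p3": (4000, 40), "p4": (7000, 2100)}
    ranking = rank_by_miss_rate(toy)
    print(ranking)
    print(rank_of(ranking, "p1"))
```

Pooling swings and misses before dividing weights each hitter's rate by his actual volume, which is why a large minimum such as 5000 swings keeps the list stable.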


Who Did Mauer Struggle Against?

I thought it would be interesting to explore how Mauer performed against different pitchers. Specifically, among all pitchers against whom Mauer took at least 50 swings, which 10 had the highest miss rates? The most challenging pitcher was... Jimmy Gobble? (For those of you who are wondering, Jimmy Gobble was primarily a relief pitcher who pitched for the Royals and White Sox over seven seasons.) Anyway, there are some notable pitchers on this list, such as CC Sabathia, Carlos Carrasco, Corey Kluber and Chris Sale.
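The pitcher-level list uses the same grouping idea, with pitcher id in place of batter id. The Python sketch below assumes each of Mauer's plate appearances has already been reduced to a (pitcher_id, swings, misses) tuple; it sums by pitcher, drops pitchers with fewer than 50 swings and returns the 10 highest miss rates. The function name and the `highest` flag are hypothetical; setting `highest=False` gives the opposite end of the list.

```python
from collections import defaultdict


def pitcher_miss_rates(pa_records, min_swings=50, top=10, highest=True):
    """One batter's miss rates against each pitcher faced.

    `pa_records` is an iterable of (pitcher_id, swings, misses) tuples,
    one per plate appearance, with swings and misses already counted.
    Returns up to `top` rows of (pitcher_id, swings, misses, miss_rate).
    """
    totals = defaultdict(lambda: [0, 0])  # pitcher_id -> [swings, misses]
    for pitcher, swings, misses in pa_records:
        totals[pitcher][0] += swings
        totals[pitcher][1] += misses
    # Keep only pitchers with enough swings to be worth reporting
    rows = [(pitcher, s, m, m / s) for pitcher, (s, m) in totals.items() if s >= min_swings]
    # Highest miss rates first by default; ascending when highest=False
    rows.sort(key=lambda row: row[3], reverse=highest)
    return rows[:top]


if __name__ == "__main__":
    # Hypothetical plate appearances, not real data
    toy = [("p1", 30, 10), ("p1", 30, 5), ("p2", 60, 3), ("p3", 10, 9)]
    print(pitcher_miss_rates(toy))
    print(pitcher_miss_rates(toy, highest=False))
```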


At the other extreme, here are the 10 pitchers (at least 50 swings) against whom Mauer had his lowest miss rates. It is a bit surprising that Max Scherzer is at the top of the list: Mauer missed only 3 of 81 swings against Scherzer. (One has to be careful about drawing strong conclusions here due to the relatively small sample sizes.)
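To put the small-sample caveat in numbers, one option (my choice here, not necessarily the author's) is a binomial Wilson score interval for a miss rate. For 3 misses in 81 swings, the standard error is roughly sqrt(0.037 × 0.963 / 81) ≈ 0.021, so a plain ±1.96 SE interval would dip below zero; the Wilson interval stays inside [0, 1]. The function name is hypothetical.

```python
import math


def wilson_interval(misses: int, swings: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the miss rate misses / swings."""
    if swings <= 0:
        raise ValueError("need at least one swing")
    p = misses / swings
    denom = 1 + z**2 / swings
    center = (p + z**2 / (2 * swings)) / denom
    half = z * math.sqrt(p * (1 - p) / swings + z**2 / (4 * swings**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Mauer vs. Scherzer from the text: 3 misses in 81 swings
    lo, hi = wilson_interval(3, 81)
    print(f"observed {3 / 81:.3f}, 95% interval {lo:.3f} to {hi:.3f}")
```

For 3 of 81 the interval runs from roughly 1% to about 10%, a reminder that 81 swings cannot pin down a matchup-specific miss rate very precisely.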



  • My motivation for this post was to give recognition to one of my favorite players, Joe Mauer, who will certainly be inducted into the Baseball Hall of Fame. With data now available on every pitch, I think measures such as miss rates will get more attention, since they are strongly associated with strikeout rates.
  • For those of you learning R, this type of exercise is good practice. One needs to use string functions to extract the number of swings and swinging strikes from the PITCH_SEQ_TX variable. The dplyr package is convenient for grouping the Retrosheet data by player id and summarizing each group by its number of swings and misses.
  • If you are interested in learning more about miss rates, see my recent post, which explores swing rates and miss rates for 2018 regular hitters using Statcast data.
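For readers who want to see the group-and-summarize step (what dplyr's group_by() and summarize() do) outside R, here is a plain Python sketch. It assumes the rows have already been reduced to one (season, batter_id, pitch_seq) tuple per plate appearance, and it uses an assumed set of swing and miss codes (S, M, Q as misses; F, T, L, O, R, X, Y as contact swings). The 300 PA default matches the season comparison above; the function name is hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: one plausible definition of swings and misses in PITCH_SEQ_TX
MISS_CODES = frozenset("SMQ")
SWING_CODES = MISS_CODES | frozenset("FTLORXY")


def season_summaries(pa_rows, min_pa=300):
    """Group plate appearances by (season, batter) and summarize each group.

    `pa_rows` yields (season, batter_id, pitch_seq) tuples, one per
    plate appearance. Returns {(season, batter_id): (pa, swings, misses,
    miss_rate)} for batters with at least `min_pa` plate appearances.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # (season, batter) -> [pa, swings, misses]
    for season, batter, seq in pa_rows:
        row = acc[(season, batter)]
        row[0] += 1
        row[1] += sum(code in SWING_CODES for code in seq)
        row[2] += sum(code in MISS_CODES for code in seq)
    # Filter after grouping, like filter() following summarize() in dplyr
    return {
        key: (pa, sw, mi, mi / sw if sw else float("nan"))
        for key, (pa, sw, mi) in acc.items()
        if pa >= min_pa
    }


if __name__ == "__main__":
    # Hypothetical rows, not real data; min_pa lowered to fit the toy example
    toy = [(2009, "b1", "CBSFX"), (2009, "b1", "SSS"), (2009, "b2", "X")]
    print(season_summaries(toy, min_pa=2))
```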

Revision Coming Out Soon

A reader asks:  “I’m wondering if you all have any plans to update this book, especially since StatCast data is a significant source of data that came out since the book has been published, and you have written on using the data to analyze players. Thanks!”

The 2nd edition of Analyzing Baseball Data with R is coming out in a few weeks. I was fortunate to enlist Ben Baumer as a new coauthor. The R code in all chapters has been updated to use the tidyverse, and we have added two new chapters that use Statcast data.


