Edge Strike Accuracy Variability

Long time no post. I’ve been traveling a good bit over the last month, which has made it very hard to sit down and do something worthwhile here. I want to thank Jim for covering for me while I was away and slacking so much.

Today I’m going to have a short post, just doing some light exploratory work using Pitch f/x data. I often hear the idea that umpire consistency matters more than accuracy, because a consistent zone is one that players can adjust to. If umpires are consistently incorrect with their calls, then at least the players know where the strike zone will be. So today I decided to take a look at variability in incorrect call rate among individual umpires from July 2014 through the end of the 2014 season (limited sample due to data size concerns).

First, I grabbed all pitches on the “edge” of the strike zone from my database, and posted them here. Note that I define an edge pitch as one that is within 3 inches (0.25 feet) of the strike zone boundary, either inside or outside the zone. I fix that boundary for all players, rather than adjusting by height. I’ve already applied this filter to the posted data set, so you won’t have to worry about that. However, if you have your own data, I used the following code and strike zone parameters to subset the data:

###load data
setwd("c:/...")
pitch <- read.csv(file="UmpireFX2014.csv", h=T)
head(pitch)

###define strike zone parameters
zBot <- 1.52
zTop <- 3.42
zW <- 0.83

###identify pitches that are within the defined zone
pitch$within_zone <- ifelse(pitch$px >= -(zW) & pitch$px <= zW & pitch$pz >= zBot & pitch$pz <= zTop, 1, 0)

###remove pitches with missing locational data
pitch <- subset(pitch, is.na(pitch$within_zone)==FALSE)

###identify pitches that were called by the umpire
pitch$called_by_ump <- ifelse(pitch$pitch_result=="Ball" |
    pitch$pitch_result=="Ball In Dirt" |
    pitch$pitch_result=="Pitchout" |
    pitch$pitch_result=="Intent Ball" |
    pitch$pitch_result=="Called Strike", 1, 0)

###subset to only looking at pitches called by the umpire
called <- subset(pitch, pitch$called_by_ump==1)

###remove pitches that don't require real judgment
called <- subset(called, called$pitch_result=="Ball" | called$pitch_result=="Called Strike")

###reduce to "edge" pitches
called$edgeBot <- ifelse(abs(called$pz - zBot) < 0.25, 1, 0)
called$edgeTop <- ifelse(abs(called$pz - zTop) < 0.25, 1, 0)
called$edgeLeftRight <- ifelse(called$edgeBot!=1 & called$edgeTop!=1 & abs(abs(called$px) - zW) < 0.25, 1, 0)

called$edge <- called$edgeLeftRight + called$edgeTop + called$edgeBot
called <- subset(called, called$edge==1)
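
If you would rather have the edge rule in one reusable piece, here is a minimal sketch of the same logic wrapped in a function. The function name `is_edge` and the `tol` argument are made up for illustration, the zone parameters are the ones defined above, and the pitch locations are toy values rather than real data.

```r
# Hypothetical helper: TRUE when a pitch falls within `tol` feet of the
# fixed zone boundary, following the same rules as the code above.
is_edge <- function(px, pz, zBot = 1.52, zTop = 3.42, zW = 0.83, tol = 0.25) {
  edgeBot <- abs(pz - zBot) < tol
  edgeTop <- abs(pz - zTop) < tol
  # the side bands only count pitches not already in the top/bottom bands
  edgeSide <- !edgeBot & !edgeTop & abs(abs(px) - zW) < tol
  edgeBot | edgeTop | edgeSide
}

# Toy pitches: middle of the zone, near the side, near the bottom, far outside
edge_flags <- is_edge(px = c(0, 0.9, 0, 2), pz = c(2.5, 2.5, 1.6, 2.5))
edge_flags
```

One thing this makes easier to see: as in the code above, the top and bottom bands are not restricted horizontally, so a pitch near the height of zBot or zTop counts as an edge pitch even if it is well off the plate.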

Note, again, that you won’t need the code above if you just downloaded my data. Ok. Next, let’s identify the number of pitches on the edge of the strike zone that each umpire called incorrectly for each game that they worked behind the plate using the file from the link above (note that there is some redundant code here):

###load data
setwd("c:/...")
called <- read.csv(file="Edge2014Sub.csv", h=T)

###define strike zone parameters
zBot <- 1.52
zTop <- 3.42
zW <- 0.83

###make dummy variable for called strikes
called$strike_call <- ifelse(called$pitch_result=="Called Strike", 1, 0)

###create dummy variable telling whether STRIKE call was correct or not
called$correct_strike <- ifelse(called$within_zone==1 & called$strike_call==1, 1, 0)

###identify pitches outside of the zone
called$outside_zone <- ifelse(called$px < -(zW) | called$px > zW | 
    called$pz < zBot | called$pz > zTop, 1, 0)

###make dummy variable telling whether BALL call was correct or not
called$correct_ball <- ifelse(called$outside_zone==1 & called$strike_call==0, 1, 0)

###make dummy indicating correct/incorrect call
called$correct_call <- called$correct_strike + called$correct_ball
called$incorrect_call <- ifelse(called$correct_call==0, 1, 0)
sum(called$incorrect_call)

###create unique umpire-game identifier
called$umpDate <- paste(called$umpire,called$month,called$day,called$year,sep="_")
head(called)

###look at incorrect calls by umpire-game
wrongCalls <- data.frame(tapply(called$incorrect_call, called$umpDate, sum))
head(wrongCalls)

wrongCalls$umpDate <- row.names(wrongCalls)
colnames(wrongCalls)[1] <- "incorrectInGame"
head(wrongCalls)

Notice above that we now have a new data set tallying the number of wrong calls by umpire, by game. We’ll need to sort this by umpire and date to ensure we can plot things correctly on our x-axis using the base graphics in R. Strangely enough, while R knows how to order points on the x-axis according to the date (which we can identify using the as.Date() function), it does not draw the line segments in the right order unless the data set itself is sorted by date (thanks to Jim for reminding me of this, as it was frustrating me earlier this month).

###identify as dates and order by umpire, then by date within umpire
wrongCalls$game_date <- as.Date(wrongCalls$game_date, "%m/%d/%Y")
wrongCalls <- wrongCalls[order(wrongCalls$umpire, wrongCalls$game_date), ]
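
One caveat: the tapply() step earlier leaves wrongCalls with only incorrectInGame and umpDate, while the sorting and plotting code expects columns such as umpire_name and game_date. A minimal sketch of one way to carry those along is to aggregate on them directly. This assumes the pitch-level data has umpire_name and game_date columns (inferred from the later code, not confirmed), and it uses toy data rather than the real file.

```r
# Toy stand-in for the pitch-level `called` data (not real calls).
# umpire_name and game_date are ASSUMED column names.
called <- data.frame(
  umpire_name = c("Ump A", "Ump A", "Ump A", "Ump B", "Ump B"),
  game_date = c("7/6/2014", "7/2/2014", "7/2/2014", "7/3/2014", "7/3/2014"),
  incorrect_call = c(1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# One row per umpire-game, keeping the identifying columns
wrongCalls <- aggregate(incorrect_call ~ umpire_name + game_date,
                        data = called, FUN = sum)
colnames(wrongCalls)[3] <- "incorrectInGame"

# Convert to Date, then sort by umpire and by date within umpire
wrongCalls$game_date <- as.Date(wrongCalls$game_date, "%m/%d/%Y")
wrongCalls <- wrongCalls[order(wrongCalls$umpire_name, wrongCalls$game_date), ]
wrongCalls
```

Converting with as.Date() before sorting matters here: sorting the raw "m/d/Y" strings would order them alphabetically rather than chronologically.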

Let’s begin by taking a look at two umpires that might be of interest, based on what we see in a recent 538 article by Noah Davis and Michael Lopez. I chose to draw the edge errors for Lance Barksdale and Tim Welke, plotting them together by date. As you can see, there isn’t much to go on here in terms of differences in consistency across the season.

###compare two umpires
png(file="plotexample.png", height=500, width=600)
plot(wrongCalls$incorrectInGame[wrongCalls$umpire_name=="Lance Barksdale"] ~ wrongCalls$game_date[wrongCalls$umpire_name=="Lance Barksdale"], 
    ylim=c(0,25), xlim=c(min(wrongCalls$game_date), max(wrongCalls$game_date)),type="n", 
    main="", xlab="", ylab="")
grid()
points(wrongCalls$incorrectInGame[wrongCalls$umpire_name=="Lance Barksdale"] ~ wrongCalls$game_date[wrongCalls$umpire_name=="Lance Barksdale"], pch=16, col="darkred", cex=2)
lines(wrongCalls$incorrectInGame[wrongCalls$umpire_name=="Lance Barksdale"] ~ wrongCalls$game_date[wrongCalls$umpire_name=="Lance Barksdale"], col="darkred", lwd=2)
points(wrongCalls$incorrectInGame[wrongCalls$umpire_name=="Tim Welke"] ~ wrongCalls$game_date[wrongCalls$umpire_name=="Tim Welke"], pch=16, col="steelblue", cex=2)
lines(wrongCalls$incorrectInGame[wrongCalls$umpire_name=="Tim Welke"] ~ wrongCalls$game_date[wrongCalls$umpire_name=="Tim Welke"], col="steelblue", lwd=2)
legend(min(wrongCalls$game_date), 25, c("Lance Barksdale", "Tim Welke"), col=c("darkred", "steelblue"), lty=c("solid", "solid"), lwd=c(2, 2), bty="n", cex=1)
dev.off()

Given the lack of usefulness of this plot, let’s take a look at the average and standard deviation of incorrect calls for umpires across the league during our time frame. The code below will do this and put everything in a nice little data frame.

###summarize by umpire
###look at errors
IncDev <- data.frame(tapply(wrongCalls$incorrectInGame, wrongCalls$umpire_name, sd))
colnames(IncDev) <- "IncorrectSD"
IncDev$Umpire <- row.names(IncDev)
row.names(IncDev) <- 1:nrow(IncDev)

IncMean <- data.frame(tapply(wrongCalls$incorrectInGame, wrongCalls$umpire_name, mean))
colnames(IncMean) <- "IncorrectAvg"
IncMean$Umpire <- row.names(IncMean)
row.names(IncMean) <- 1:nrow(IncMean)

IncMin <- data.frame(tapply(wrongCalls$incorrectInGame, wrongCalls$umpire_name, min))
colnames(IncMin) <- "IncorrectMin"
IncMin$Umpire <- row.names(IncMin)
row.names(IncMin) <- 1:nrow(IncMin)

IncMax <- data.frame(tapply(wrongCalls$incorrectInGame, wrongCalls$umpire_name, max))
colnames(IncMax) <- "IncorrectMax"
IncMax$Umpire <- row.names(IncMax)
row.names(IncMax) <- 1:nrow(IncMax)

IncSummary <- merge(IncDev, IncMean, by="Umpire", all=T)
IncSummary <- merge(IncSummary, IncMin, by="Umpire", all=T)
IncSummary <- merge(IncSummary, IncMax, by="Umpire", all=T)
IncSummary

IncSummary <- IncSummary[order(IncSummary$IncorrectSD),]
row.names(IncSummary) <- 1:nrow(IncSummary)
IncSummary
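
The four tapply()/merge() steps above can also be collapsed into a single pass. The following is only a sketch of an alternative, run on toy data (the umpire names and counts are made up), and it builds the same columns by splitting the per-game counts by umpire.

```r
# Toy per-game counts; "Ump C" has a single game on purpose
wrongCalls <- data.frame(
  umpire_name = c("Ump A", "Ump A", "Ump A", "Ump B", "Ump B", "Ump C"),
  incorrectInGame = c(8, 11, 14, 10, 12, 9),
  stringsAsFactors = FALSE
)

# Split the counts into one vector per umpire, then summarize each vector
groups <- split(wrongCalls$incorrectInGame, wrongCalls$umpire_name)
IncSummary <- data.frame(
  Umpire = names(groups),
  IncorrectSD = sapply(groups, sd),
  IncorrectAvg = sapply(groups, mean),
  IncorrectMin = sapply(groups, min),
  IncorrectMax = sapply(groups, max),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Least variable umpires first; NA standard deviations sort to the bottom
IncSummary <- IncSummary[order(IncSummary$IncorrectSD), ]
row.names(IncSummary) <- 1:nrow(IncSummary)
IncSummary
```

An umpire with only one game gets an NA standard deviation, since sd() needs at least two values. That is why the last two rows of the table below show NA.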

               Umpire IncorrectSD IncorrectAvg IncorrectMin IncorrectMax
1           Jon Byrne    2.121320      9.50000            8           11
2      Manny Gonzalez    2.424621     10.66667            7           15
3        Jordan Baker    2.502499     14.56250           11           22
4     Stu Scheurwater    2.516611     11.33333            9           14
5       Marvin Hudson    2.663755     12.70588            7           17
6          James Hoye    2.743933     11.93750            8           18
7  Hunter Wendelstedt    3.020564     12.86667            8           17
8            Jim Wolf    3.117656     13.21429            8           18
9     Mike Muchlinski    3.183060     15.14286            9           19
10     Chad Fairchild    3.204164     14.86667            7           18
11       Gabe Morales    3.256158     11.46154            6           16
12        Scott Barry    3.291681     13.28571            8           19
13          Ed Hickox    3.335238     12.46667            7           17
14       Chris Conroy    3.338915     13.92857            7           19
15        Chris Segal    3.341656     12.50000            8           18
16    Lance Barksdale    3.347953     12.85714            9           21
17        Toby Basner    3.384840     11.80000            7           19
18        Eric Cooper    3.385631     11.56250            6           19
19         Dale Scott    3.387923     14.35714            9           19
20         CB Bucknor    3.455367     11.64286            5           18
21     Paul Schrieber    3.480558     13.40000            5           20
22      Larry Vanover    3.521990     14.17647            9           21
23       Bob Davidson    3.534860     15.26667            8           21
24   Fieldin Culbreth    3.582620     11.71429            6           21
25       Jim Reynolds    3.631365     12.57143            8           22
26        Sean Barber    3.681518     15.12500            9           19
27       Tom Woodring    3.688414     13.28571            9           22
28        Jerry Layne    3.741657     14.00000            8           19
29        Will Little    3.768289     11.25000            6           18
30      Kerwin Danley    3.797926     11.66667            7           20
31     Chris Guccione    3.812261     14.00000            8           22
32      Lance Barrett    3.820449     14.06250            8           22
33    Alfonso Marquez    3.844910     13.37500            3           20
34       Brian Knight    3.854734     13.23077            8           22
35        Dan Bellino    3.940259     14.23077            8           19
36          Jim Joyce    3.947573     14.12500            8           26
37       Mike Winters    3.950240     13.28571            6           20
38        Dana DeMuth    3.968627     11.66667            5           17
39       D.J. Reyburn    3.975198     12.57143            8           21
40          Rob Drake    4.154172     13.60000            7           22
41         Pat Hoberg    4.160387     13.94118            6           22
42       Jeff Kellogg    4.214705     12.07143            4           19
43           Joe West    4.273465     13.56250            7           21
44      David Rackley    4.300609     13.06667            8           22
45        Clint Fagan    4.327271     14.42857            9           23
46        Brian ONora    4.375883     13.92857            7           21
47        Cory Blaser    4.380774     12.76471            7           21
48      Tony Randazzo    4.400300     12.85714            6           21
49      Quinn Wolcott    4.404252     12.30769            6           19
50        Gerry Davis    4.468747     13.16667            5           19
51         Phil Cuzzi    4.517771     14.07692            6           24
52         Bill Welke    4.522957     15.80000            9           26
53      Andy Fletcher    4.540417     13.00000            4           22
54      Todd Tichenor    4.603510     13.50000            5           20
55     Mike Estabrook    4.632067     14.07143            8           24
56       Brian Gorman    4.678772     16.09091            8           23
57       Mark Carlson    4.685337     13.33333            6           24
58        Ted Barrett    4.718757     14.86667            6           23
59       Dan Iassogna    4.735301     15.50000            6           23
60        Tom Hallion    4.777988     14.81250            6           22
61       Mike Everitt    4.793845     14.86667            7           22
62        Mark Wegner    4.822490     12.38462            8           24
63        Jeff Nelson    4.827235     14.07143            9           26
64          Al Porter    4.850135     13.33333            6           21
65       John Tumpane    4.879500     15.33333            7           26
66    Angel Hernandez    4.901955     15.81250            8           23
67         Paul Emmel    4.910223     14.11765            7           26
68       Marty Foster    4.963678     14.93333            7           26
69        Paul Nauert    4.968472     13.60000            5           25
70          Ron Kulpa    4.980846     16.05882            6           23
71   Tripp Gibson III    5.013218     11.58824            5           22
72     Mark Ripperger    5.039712     15.11111            4           24
73    Gary Cederstrom    5.060622     16.07143            9           26
74        Adam Hamari    5.077964     14.64286            8           30
75   Victor Carapazza    5.085302     13.88235            7           24
76        Jerry Meals    5.216275     15.06667            8           25
77        Tim Timmons    5.218311     15.00000            8           24
78        Greg Gibson    5.230931     14.85714            9           28
79          Tim Welke    5.231026     13.50000            3           22
80            Ben May    5.349677     15.57143            9           24
81           Laz Diaz    5.433582     16.33333            7           26
82       Doug Eddings    5.476845     14.56250            4           24
83     Adrian Johnson    5.599908     13.76923            3           26
84        Bill Miller    5.606172     15.68750            7           33
85       Angel Campos    5.656854     13.00000            9           17
86        Mike DiMuro    5.974649     14.37500            8           24
87    Marcus Pattillo    7.211103     13.00000            7           21
88        Jeff Gosney          NA     10.00000           10           10
89   Seth Buckminster          NA     14.00000           14           14


Here, we’ve ordered things so that the least variable umpires come out on top, while the most variable are on the bottom. It could also be worth computing a coefficient of variation (CV), the standard deviation divided by the mean, to get the variation in incorrect calls relative to the average number of incorrect calls. If we take a look at the first plot below, we see that the variability and the mean are positively related; however, this changes after we scale by each umpire’s average. This way, our measure is consistency relative to accuracy, rather than a combination of the two.

###make coefficient of variation and plot
png(file="IncSDPlot.png", height=500, width=600)
plot(IncorrectSD ~ IncorrectAvg, data=IncSummary, pch=16)
dev.off()

IncSummary$IncorrectCV <- IncSummary$IncorrectSD/IncSummary$IncorrectAvg

png(file="IncCVPlot.png", height=500, width=600)
plot(IncorrectCV ~ IncorrectAvg, data=IncSummary, pch=16)
dev.off()

[Figures: IncSDPlot (IncorrectSD vs. IncorrectAvg) and IncCVPlot (IncorrectCV vs. IncorrectAvg)]
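
As a quick worked example of the CV calculation, here it is by hand for two umpires, using the SD and average values copied from the summary table above. The variable names are just for illustration.

```r
# SD and mean of incorrect calls per game, copied from the table above
sd_baker <- 2.502499; avg_baker <- 14.56250   # Jordan Baker
sd_byrne <- 2.121320; avg_byrne <- 9.50000    # Jon Byrne

# Coefficient of variation = standard deviation / mean
cv_baker <- sd_baker / avg_baker
cv_byrne <- sd_byrne / avg_byrne

round(c(Baker = cv_baker, Byrne = cv_byrne), 4)
# Baker 0.1718, Byrne 0.2233
```

Byrne has the smaller standard deviation, but because his average is also lower, his variation relative to that average is larger. That is why he ranks first by SD but only sixth by CV, while Baker moves from third to first.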

Once we re-order our umpires using the CV measure, we can see who was the most consistent in their incorrect calls during our time period. And from there, I’ll let the reader be the judge. Note that “incorrect” isn’t particularly accurate as a classification here, given that we have a fixed zone. Since the purpose here is exploring the data and using R, I’m going to hold off on making any observations about umpire quality (especially given our small sample of games).

Also note that consistency across games, rather than within games, is probably not what players are talking about when they are annoyed by “inconsistent” calls. However, finding a good measure of within-game consistency is a bit more difficult: only a few pitches in each game qualify, and strike zone modeling usually requires a relatively large sample. This is a problem I’m currently working on. In any case, have fun with the data.

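###order by CV
IncSummary <- IncSummary[order(IncSummary$IncorrectCV),]
row.names(IncSummary) <- 1:nrow(IncSummary)
IncSummary
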
               Umpire IncorrectSD IncorrectAvg IncorrectMin IncorrectMax IncorrectCV
1        Jordan Baker    2.502499     14.56250           11           22   0.1718454
2       Marvin Hudson    2.663755     12.70588            7           17   0.2096473
3     Mike Muchlinski    3.183060     15.14286            9           19   0.2102020
4      Chad Fairchild    3.204164     14.86667            7           18   0.2155267
5     Stu Scheurwater    2.516611     11.33333            9           14   0.2220540
6           Jon Byrne    2.121320      9.50000            8           11   0.2232969
7      Manny Gonzalez    2.424621     10.66667            7           15   0.2273082
8          James Hoye    2.743933     11.93750            8           18   0.2298582
9        Bob Davidson    3.534860     15.26667            8           21   0.2315411
10 Hunter Wendelstedt    3.020564     12.86667            8           17   0.2347589
11           Jim Wolf    3.117656     13.21429            8           18   0.2359307
12         Dale Scott    3.387923     14.35714            9           19   0.2359747
13       Chris Conroy    3.338915     13.92857            7           19   0.2397170
14        Sean Barber    3.681518     15.12500            9           19   0.2434061
15        Scott Barry    3.291681     13.28571            8           19   0.2477609
16      Larry Vanover    3.521990     14.17647            9           21   0.2484391
17     Paul Schrieber    3.480558     13.40000            5           20   0.2597432
18    Lance Barksdale    3.347953     12.85714            9           21   0.2603964
19        Jerry Layne    3.741657     14.00000            8           19   0.2672612
20        Chris Segal    3.341656     12.50000            8           18   0.2673325
21          Ed Hickox    3.335238     12.46667            7           17   0.2675324
22      Lance Barrett    3.820449     14.06250            8           22   0.2716764
23     Chris Guccione    3.812261     14.00000            8           22   0.2723044
24        Dan Bellino    3.940259     14.23077            8           19   0.2768831
25       Tom Woodring    3.688414     13.28571            9           22   0.2776225
26          Jim Joyce    3.947573     14.12500            8           26   0.2794742
27       Gabe Morales    3.256158     11.46154            6           16   0.2840943
28         Bill Welke    4.522957     15.80000            9           26   0.2862631
29        Toby Basner    3.384840     11.80000            7           19   0.2868509
30    Alfonso Marquez    3.844910     13.37500            3           20   0.2874699
31       Jim Reynolds    3.631365     12.57143            8           22   0.2888586
32       Brian Gorman    4.678772     16.09091            8           23   0.2907712
33       Brian Knight    3.854734     13.23077            8           22   0.2913462
34        Eric Cooper    3.385631     11.56250            6           19   0.2928114
35         CB Bucknor    3.455367     11.64286            5           18   0.2967800
36       Mike Winters    3.950240     13.28571            6           20   0.2973299
37         Pat Hoberg    4.160387     13.94118            6           22   0.2984244
38        Clint Fagan    4.327271     14.42857            9           23   0.2999099
39          Rob Drake    4.154172     13.60000            7           22   0.3054538
40       Dan Iassogna    4.735301     15.50000            6           23   0.3055033
41   Fieldin Culbreth    3.582620     11.71429            6           21   0.3058334
42    Angel Hernandez    4.901955     15.81250            8           23   0.3100051
43          Ron Kulpa    4.980846     16.05882            6           23   0.3101626
44        Brian ONora    4.375883     13.92857            7           21   0.3141660
45    Gary Cederstrom    5.060622     16.07143            9           26   0.3148831
46           Joe West    4.273465     13.56250            7           21   0.3150942
47       D.J. Reyburn    3.975198     12.57143            8           21   0.3162089
48        Ted Barrett    4.718757     14.86667            6           23   0.3174052
49       John Tumpane    4.879500     15.33333            7           26   0.3182283
50         Phil Cuzzi    4.517771     14.07692            6           24   0.3209346
51       Mike Everitt    4.793845     14.86667            7           22   0.3224560
52        Tom Hallion    4.777988     14.81250            6           22   0.3225646
53      Kerwin Danley    3.797926     11.66667            7           20   0.3255365
54      David Rackley    4.300609     13.06667            8           22   0.3291282
55     Mike Estabrook    4.632067     14.07143            8           24   0.3291824
56       Marty Foster    4.963678     14.93333            7           26   0.3323891
57           Laz Diaz    5.433582     16.33333            7           26   0.3326683
58     Mark Ripperger    5.039712     15.11111            4           24   0.3335103
59        Will Little    3.768289     11.25000            6           18   0.3349590
60        Gerry Davis    4.468747     13.16667            5           19   0.3393985
61        Dana DeMuth    3.968627     11.66667            5           17   0.3401680
62      Todd Tichenor    4.603510     13.50000            5           20   0.3410008
63      Tony Randazzo    4.400300     12.85714            6           21   0.3422455
64        Jeff Nelson    4.827235     14.07143            9           26   0.3430522
65        Cory Blaser    4.380774     12.76471            7           21   0.3431942
66            Ben May    5.349677     15.57143            9           24   0.3435573
67        Jerry Meals    5.216275     15.06667            8           25   0.3462129
68        Adam Hamari    5.077964     14.64286            8           30   0.3467878
69         Paul Emmel    4.910223     14.11765            7           26   0.3478075
70        Tim Timmons    5.218311     15.00000            8           24   0.3478874
71       Jeff Kellogg    4.214705     12.07143            4           19   0.3491471
72      Andy Fletcher    4.540417     13.00000            4           22   0.3492628
73       Mark Carlson    4.685337     13.33333            6           24   0.3514003
74        Greg Gibson    5.230931     14.85714            9           28   0.3520819
75        Bill Miller    5.606172     15.68750            7           33   0.3573656
76      Quinn Wolcott    4.404252     12.30769            6           19   0.3578455
77          Al Porter    4.850135     13.33333            6           21   0.3637601
78        Paul Nauert    4.968472     13.60000            5           25   0.3653288
79   Victor Carapazza    5.085302     13.88235            7           24   0.3663141
80       Doug Eddings    5.476845     14.56250            4           24   0.3760924
81          Tim Welke    5.231026     13.50000            3           22   0.3874834
82        Mark Wegner    4.822490     12.38462            8           24   0.3893936
83     Adrian Johnson    5.599908     13.76923            3           26   0.4066973
84        Mike DiMuro    5.974649     14.37500            8           24   0.4156277
85   Tripp Gibson III    5.013218     11.58824            5           22   0.4326127
86       Angel Campos    5.656854     13.00000            9           17   0.4351426
87    Marcus Pattillo    7.211103     13.00000            7           21   0.5547002
88        Jeff Gosney          NA     10.00000           10           10          NA
89   Seth Buckminster          NA     14.00000           14           14          NA