Analyzing Baseball Data with R

Some information about the book Analyzing Baseball Data With R, 2nd edition by Max Marchi, Jim Albert, and Ben Baumer:

Some useful links for the book.


128 responses

  1. Jim,

    I hope all is well. I have continued working through the text and just have a question regarding Chapter 5, Exercise 3.

    Is there any way you can confirm the run values for Rickie Weeks and Michael Bourn? I have 9.029 for Weeks and 7.360 for Bourn. I just want to be sure I am following along with the code properly. I used your code from the in-chapter exercises with Pujols.

    Thanks so much,

    1. Lloyd:

      Here’s what I have for Exercise 3 of Chapter 5:

      d2016 %>%
        filter(BAT_ID %in% c("eatoa002", "marts002"),
               BAT_EVENT_FL == TRUE) %>%
        group_by(BAT_ID) %>%
        summarize(N = n(),
                  M = mean(run_value),
                  S = sum(run_value))
      ## # A tibble: 2 x 4
      ##   BAT_ID       N      M     S
      ## 1 eatoa002   706 0.0188 13.3
      ## 2 marts002   529 0.0179  9.48

      Hope this helps.


  2. Hi Jim,

    Hope all is well. I am working through Chapter 4 Exercise 3 (Manager Effect in Baseball) and ran into an issue running the solution. I get the following error running the code:
    Error: Problem with `summarise()` input `Mean_Residual`.
    x object ‘.resid’ not found
    i Input `Mean_Residual` is `mean(.resid)`.
    i The error occurred in group 1: playerID = “actama99”.

    It seems that, for whatever reason, the augment() function is not adding the .resid column. Instead I only get the following:
    > out
    # A tibble: 345 x 10
    yearID teamID R RA .fitted .hat .sigma .cooksd .std.resid playerID

    I am using the solutions dated 1/10/2019.

    Any help or guidance on what is going wrong would be greatly appreciated.


    1. Hi Brandon:

      It appears that the broom package has changed what the augment() function returns. I haven’t looked at it carefully, but I see that the data frame out has the variable .std.resid instead of .resid. So I think if you replace mean(.resid) with mean(.std.resid) it should work fine.

      I’ll make a correction on those solutions.
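
      Note that .std.resid holds standardized residuals, which are on a different scale from the raw ones. If raw residuals are wanted, they can also be taken straight from the fitted model with base R's residuals(), which does not depend on which columns augment() returns. A minimal sketch, assuming a made-up data frame teams whose column names (playerID, RD, Wpct) and values are purely illustrative and not taken from the book's solutions:

      ```r
      # Hypothetical data: one row per team-season, tagged with a manager ID.
      teams <- data.frame(
        playerID = c("mgr_a", "mgr_a", "mgr_a", "mgr_b", "mgr_b", "mgr_b"),
        RD       = c(-120, -40, 35, 10, 80, 150),               # run differential
        Wpct     = c(0.420, 0.475, 0.530, 0.500, 0.560, 0.590)  # winning percentage
      )

      # Fit a linear model (the formula here is an assumption for illustration).
      fit <- lm(Wpct ~ RD, data = teams)

      # Raw residuals from base R, independent of broom's column names.
      teams$resid <- residuals(fit)

      # Mean raw residual per manager.
      mean_resid <- aggregate(resid ~ playerID, data = teams, FUN = mean)
      names(mean_resid)[2] <- "Mean_Residual"
      mean_resid
      ```

      With the real data, the same two steps (residuals() on the fitted model, then a grouped mean) would stand in for the mean(.resid) call in the solution.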



      1. Thanks Jim! Greatly appreciate the quick response. Loving the book!

  3. I am in Section 6.2.3 and getting the error below. Could a missing package be causing this?

    count_plot %+% run_value_by_count +
    + scale_fill_gradient2("xRV", low = grey10, high = crcblue,
    + mid = white)
    Error in count_plot %+% run_value_by_count :
    is.character(lhs) is not TRUE

    1. Robert, I just tried running the Chapter 6 code from the script posted on our GitHub site and I couldn’t reproduce your error. We are using the tidyverse suite of packages, but nothing else. Unfortunately, without having your computer in front of me, I am not sure what is creating the issue. Sorry not to be of more help. Jim
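
      One possible cause, which is an assumption and not something confirmed in this thread, is that another attached package defines its own %+% operator and masks ggplot2's. The message is.character(lhs) is not TRUE suggests an operator that expects character strings; the crayon package, for example, exports a %+% for joining strings. A minimal sketch of how one might check which %+% is being picked up and call ggplot2's version explicitly, using small made-up data frames in place of the objects from the chapter:

      ```r
      library(ggplot2)

      # Hypothetical stand-ins for the plot and data used in the thread.
      old_data <- data.frame(x = 1:3, y = c(2, 4, 6))
      new_data <- data.frame(x = 1:3, y = c(1, 0, 1))
      p <- ggplot(old_data, aes(x, y)) + geom_point()

      # Name of the package whose %+% is found first on the search path.
      environmentName(environment(`%+%`))

      # Calling ggplot2's operator by its full name sidesteps any masking.
      p2 <- ggplot2::`%+%`(p, new_data)

      # Adding a data frame with + also replaces the plot's default data.
      p3 <- p + new_data
      ```

      If the environment name printed is something other than ggplot2, loading ggplot2 (or the tidyverse) after the other package, or using the namespaced call, should avoid the clash.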
