Analyzing Baseball Data with R

Some information about the book Analyzing Baseball Data With R by Max Marchi and Jim Albert:

Some useful links for the book.

14 responses

  1. […] my foray into R with baseball is a neat graphic based on a recent post from the authors of Analyzing Baseball With R.  They use the R statistical programming language to go through the copious amount of baseball […]

  2. Question 7 Chapter 3, p. 85 asks you to pull Pete Rose’s info, but from what I can tell, the function “getinfo” doesn’t work for two players with the exact same name (junior), or am I wrong?

  3. Yeah, I’ll try to fix this and then make the new function available — thanks.
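
Until a revised function is posted, the same-name problem can be worked around by returning every matching row and narrowing by birth year. This is only a sketch: `find_players` is a hypothetical helper, not the book's `getinfo`, and the small `master` table below is a toy stand-in for the Lahman master file with illustrative values.

```r
# Toy stand-in for the Lahman master table (illustrative values only).
master <- data.frame(
  playerID  = c("rosepe01", "rosepe02", "ruthba01"),
  nameFirst = c("Pete", "Pete", "Babe"),
  nameLast  = c("Rose", "Rose", "Ruth"),
  birthYear = c(1941, 1969, 1895),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every row that matches the name and,
# if a birth year is supplied, keep only the rows with that birth year.
find_players <- function(first, last, birth_year = NULL, data = master) {
  hits <- data[data$nameFirst == first & data$nameLast == last, ]
  if (!is.null(birth_year)) {
    hits <- hits[hits$birthYear == birth_year, ]
  }
  if (nrow(hits) > 1) {
    warning("More than one player matches; pass birth_year to pick one.")
  }
  hits[, c("playerID", "birthYear")]
}

roses <- suppressWarnings(find_players("Pete", "Rose"))  # both matches
elder <- find_players("Pete", "Rose", birth_year = 1941) # one match
```

Returning all matches with a warning, rather than silently taking the first, makes the ambiguity visible to the caller.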

  4. Errata link seems to be broken

    1. Aaron — thanks for noticing that — it should work now.

  5. Sergio Marrero Marrero

    Hello! Regarding Chapter 9, I would like more information (further reading) on these topics:

    1) How to build a transition matrix to simulate a complete game between two teams, taking into account all offensive (batters/runners) and defensive (pitchers/fielding) strengths.
    2) The method used in Section 9.2.6 to estimate the transition matrix. The book says: “The description of this methodology is beyond the level of this book…” but gives no further reading or reference.

    I am very interested in these topics and would appreciate some references to continue my research.

    Thanks in advance. Sergio.
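
As a starting point for the first question, here is a minimal sketch of the general idea: estimate a transition matrix by row-normalising observed transition counts, then simulate until the absorbing state is reached. It is not the method of Section 9.2.6, and it uses a toy four-state chain (number of outs only) with made-up transitions rather than the full set of base/out states.

```r
set.seed(1)

# Made-up observed transitions between states (state = number of outs;
# "3" is the absorbing end-of-half-inning state).
from <- c("0", "0", "0", "1", "1", "1", "2", "2", "2", "0", "1", "2")
to   <- c("0", "1", "1", "1", "2", "2", "2", "3", "3", "1", "2", "3")
states <- c("0", "1", "2", "3")

# Count transitions and store them as a plain matrix with named rows/columns.
tab <- table(factor(from, levels = states), factor(to, levels = states))
counts <- matrix(as.vector(tab), nrow = length(states),
                 dimnames = list(states, states))

# Row-normalise counts into probabilities; make "3" absorbing by hand,
# because no transitions are ever observed out of it.
P <- counts / rowSums(counts)
P["3", ] <- c(0, 0, 0, 1)

# Simulate one half-inning as a path through the states.
simulate_half_inning <- function(P, start = "0") {
  state <- start
  path <- state
  while (state != "3") {
    state <- sample(colnames(P), size = 1, prob = P[state, ])
    path <- c(path, state)
  }
  path
}
path <- simulate_half_inning(P)

# Expected number of plate appearances before absorption, from the
# fundamental matrix N = (I - Q)^-1 of the transient states.
Q <- P[1:3, 1:3]
N <- solve(diag(3) - Q)
expected_pa <- rowSums(N)
```

Team-specific strength would enter through the rows of `P`: a stronger offense shifts probability away from the out-adding transitions. How best to estimate those rows from limited data is exactly the part the book leaves out.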

  6. Hello! Regarding the Bradley-Terry model (Chapter 9):

    1) Section 9.4, Further reading: the reference to “Chapter 9 of Albert and Bennett (2003)” seems to be wrong, since in my copy the Bradley-Terry model is developed in “Chapter 12, Did the Best Team Win?”. Maybe I have a different edition.

    2) After working through Chapter 9 of Analyzing Baseball Data with R, I jumped to Chapter 12 of Curve Ball (“Did the Best Team Win?”) hoping to find out how to calculate the talent (t) of each team. I did not find anything about it. The only way I know is the log5 model by Bill James. I have thought about maximizing the likelihood to estimate the team talents (t), but I would like to ask for some reading before developing it on my own.

    So, is there any other approach for calculating the talents? Can anyone help me with further reading on it?

    Many thanks in advance!

    Sergio.

    1. Sergio: As I recall, I used a value of the standard deviation of the Bradley Terry talent distribution so that predicted w/l records of the simulated data resembled the observed w/l records. One could formally fit this B-T model and estimate the standard deviation, but I believe I used this empirical approach to estimate the standard deviation. This type of B-T model is typically used in paired comparison models and there are some Bayesian papers on this.
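
The two routes mentioned in this reply can be sketched as follows. This is an illustration under stated assumptions, not the book's code: the schedule (every pair of teams meets a fixed number of times), the candidate standard deviations, and all function names are hypothetical. The first part varies the talent standard deviation and records the spread of simulated winning percentages, which could then be compared with the spread in observed standings. The second part fits the talents formally by maximum likelihood, using the fact that the Bradley-Terry model is a logistic regression on talent differences.

```r
set.seed(2024)

# Bradley-Terry probability that a team with talent ta beats one with talent tb.
bt_prob <- function(ta, tb) exp(ta) / (exp(ta) + exp(tb))

# Simulate one season: every pair of teams meets n_games times.
sim_season <- function(talent, n_games) {
  pairs <- t(combn(length(talent), 2))
  p <- bt_prob(talent[pairs[, 1]], talent[pairs[, 2]])
  w1 <- rbinom(nrow(pairs), size = n_games, prob = p)
  data.frame(a = pairs[, 1], b = pairs[, 2],
             wins_a = w1, wins_b = n_games - w1)
}

# Empirical route: spread of simulated winning percentages for talent SD s.
winpct_sd <- function(s, n_teams = 30, n_games = 6) {
  g <- sim_season(rnorm(n_teams, mean = 0, sd = s), n_games)
  wins <- sapply(seq_len(n_teams), function(i)
    sum(g$wins_a[g$a == i]) + sum(g$wins_b[g$b == i]))
  sd(wins / ((n_teams - 1) * n_games))
}
candidate_s <- c(0.1, 0.2, 0.3)   # assumed candidate values, for illustration
spread <- sapply(candidate_s, function(s) mean(replicate(20, winpct_sd(s))))

# Formal route: logistic regression with +1 for team a and -1 for team b.
talent <- rnorm(6, mean = 0, sd = 0.3)
g <- sim_season(talent, n_games = 50)
X <- matrix(0, nrow(g), length(talent))
X[cbind(seq_len(nrow(g)), g$a)] <- 1
X[cbind(seq_len(nrow(g)), g$b)] <- -1
X <- X[, -1]                      # team 1 is the reference, talent fixed at 0
fit <- glm(cbind(g$wins_a, g$wins_b) ~ X - 1, family = binomial)
est <- c(0, unname(coef(fit)))
est <- est - mean(est)            # centre the estimated talents at zero
```

Only talent differences are identified, which is why one team is dropped from the design matrix and the estimates are centred afterwards.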

  8. Jim –

    In Chapter 1, you state that “In 2011, hitters compiled a .253 batting average on plate appearances where they fell behind 0-2. Conversely they hit .479 after going ahead 2-0.” I’m trying to replicate those numbers and, even using the pbp11rc.csv file, I can’t come close. Instead of batting average, did you mean OBP?
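
For readers attempting the same check, the two statistics differ mainly in how walks and hit-by-pitches are treated, which matters a great deal after a 2-0 count. The sketch below only illustrates the two formulas on made-up data. The table, its column names, and the event labels are hypothetical and do not match the layout of pbp11rc.csv.

```r
# Made-up plate appearances; passed_02 marks those that went through an 0-2 count.
pa <- data.frame(
  passed_02 = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  event     = c("out", "single", "walk", "out",
                "walk", "homerun", "out", "hbp"),
  stringsAsFactors = FALSE
)

hit_events <- c("single", "double", "triple", "homerun")

# AVG = H / AB;  OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
rates <- function(d) {
  h   <- sum(d$event %in% hit_events)
  bb  <- sum(d$event == "walk")
  hbp <- sum(d$event == "hbp")
  sf  <- sum(d$event == "sacfly")
  ab  <- sum(d$event %in% c(hit_events, "out"))
  c(AVG = h / ab, OBP = (h + bb + hbp) / (ab + bb + hbp + sf))
}

behind <- rates(pa[pa$passed_02, ])
```

In this toy subset the walk leaves AVG at 1/3 but raises OBP to 1/2, the kind of gap that would separate the two readings of the quoted figures.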

  9. I’m trying to add an “Age” column in the Lahman batting.csv file. My idea is that I can use a combination of getinfo and the sapply function. I’m comfortable using the getinfo function for individual players. I’ve attempted to adapt the function to do this but I’m struggling. Any suggestions?

    Much appreciated. I’m really enjoying this book so far!
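
One alternative to calling `getinfo` for every row with `sapply` is to merge the birth information into the batting table once and subtract. The sketch below uses toy tables with hypothetical player IDs. The June 30 cutoff (players born in July or later are treated as born a year later) is an assumed convention for season age, stated here rather than taken from this thread.

```r
# Toy stand-ins for the Lahman master and batting tables.
master <- data.frame(
  playerID   = c("a01", "b01"),
  birthYear  = c(1970, 1975),
  birthMonth = c(3, 9),
  stringsAsFactors = FALSE
)
batting <- data.frame(
  playerID = c("a01", "a01", "b01"),
  yearID   = c(1995, 1996, 2000),
  HR       = c(10, 20, 5),
  stringsAsFactors = FALSE
)

# Assumed convention: age as of June 30 of the season.
master$birthyear_adj <- with(master,
  ifelse(birthMonth >= 7, birthYear + 1, birthYear))

# Attach the adjusted birth year to every batting row, then compute age.
batting <- merge(batting, master[, c("playerID", "birthyear_adj")],
                 by = "playerID")
batting <- batting[order(batting$playerID, batting$yearID), ]
batting$Age <- batting$yearID - batting$birthyear_adj
```

A merge is vectorised over the whole table, so it avoids one name lookup per row and sidesteps the same-name ambiguity by joining on `playerID`.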
