Max Marchi lives in Bologna, Italy.
He started playing baseball when he was 5 and has been performing statistical analysis on baseball data since 2002.
His works have been published at Baseball Prospectus and The Hardball Times.
Jim Albert lives in Findlay, Ohio.
He grew up in Philadelphia and remains a passionate Phillies fan. Although he grew up playing Little League baseball, tennis is his favorite sport. He coauthored Curve Ball with Jay Bennett and wrote Teaching Statistics Using Baseball.
Ben Baumer lives in Florence, Massachusetts.
He played baseball through high school and worked for the New York Mets as their Statistical Analyst from 2004 until 2012. He has recently coauthored The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball, and the openWAR package for R.
Carson Sievert lives in Ames, Iowa.
Currently a graduate student at Iowa State, Carson’s roots are in Minnesota, where the Twins are practically a religion. He grew up playing baseball and obsessing over the value of his precious baseball card collection. In recent years, he authored the R package pitchRx for collecting and visualizing PITCHf/x data.
Brian Mills lives in Gainesville, Florida.
Brian is currently an Assistant Professor at the University of Florida after a cold, five-year stay in Ann Arbor, Michigan. He played baseball at the Division III level and researches managerial sports economics. Most recently, Brian has been working on measurement and visualization of MLB umpire performance using PITCHf/x data.
Aaron Baggett lives in Belton, Texas.
Aaron is currently an Assistant Professor of Psychology at the University of Mary Hardin-Baylor. He teaches courses in introductory statistics and experimental design for the social sciences. A former high school and college baseball umpire, Aaron is primarily interested in modeling MLB umpire ball–strike judgment and decision-making patterns using PITCHf/x data, with application to sport and cognitive psychology. His Ph.D. dissertation was titled Effects of Pitch Location and Count on Professional Baseball Umpires’ Ball–Strike Decisions.
Hey guys, really enjoying your book and analyzing stats with R. I am trying to run simulations for a project where I use individual players’ transition matrices. I am having a little bit of a challenge in computing these matrices. Can you please help?
Also, is there a way to compute a transition matrix for all leadoff hitters, cleanup hitters, etc. for a particular year?
Thanks,
Eugene
Eugene, on page 215 of ABDR, we use the table function to find the frequencies of transitions from the current state to the new state, where a state is defined by the number of outs and the runners on base. If you want individual transition matrices for each player, you just write table(BAT_ID, STATE, NEW_STATE) and store the result in a three-way array T3. T3[1, , ] will give the matrix of transition counts for the first player, etc.
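As a minimal sketch of the three-way table idea above: the column names BAT_ID, STATE, and NEW_STATE come from the reply, but the data frame `plays`, the player IDs, and the state labels are made-up toy values for illustration, not real Retrosheet rows.

```r
# Toy play-by-play data (hypothetical rows; only the column names
# BAT_ID, STATE and NEW_STATE are taken from the discussion above).
plays <- data.frame(
  BAT_ID    = c("a001", "a001", "a001", "b002", "b002"),
  STATE     = c("000 0", "000 0", "100 1", "000 0", "000 1"),
  NEW_STATE = c("100 0", "000 1", "000 2", "000 1", "000 2"),
  stringsAsFactors = FALSE
)

# Three-way table of counts: batter x current state x new state
T3 <- with(plays, table(BAT_ID, STATE, NEW_STATE))

# Transition counts for one batter; the two empty index slots keep
# all current states and all new states
counts_a001 <- T3["a001", , ]

# Convert counts to transition proportions within each current state.
# States this batter never started from have a zero row sum and come
# out as NaN.
P_a001 <- prop.table(counts_a001, margin = 1)
```

With real data, many batters will have sparse rows, so the per-player proportions can be noisy; pooling batters into groups before tabulating is one way to get more stable estimates.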
Thanks Jim!!!
Hi baseball analysis experts,
I have been reading your articles on PITCHf/x data analysis, and have been trying to tailor the information you provide to my goal of analyzing historical data at the season level, the play-by-play level, and the PITCHf/x pitch-by-pitch level. My goal is to create a database with this data and analyze it in SPSS and R. So far, I have been able to create .csv files for Retrosheet games, events, and substitutions for data from 1952 to 2014, as well as a .csv file reflecting the season-level player data from Lahman’s database.
Three challenges arise that I wonder if someone can speak to:
First, how can I determine when it’s necessary to combine tables (e.g., “at bats” and “pitch”) and/or datasets (Retrosheet, Lahman, pitch-by-pitch), or not, with the goal of looking for interactions between these variables? I noticed that Carson recommends splitting some data into separate files and combining other data into the same file in the R directory. Given my goal, which data should be kept separate and which combined, and what’s the rationale for doing one or the other, apart from having sufficient space on my hard drive and minimizing the time it takes to run an analysis?
Next, what I’m missing is historical pitcher data that includes, or can be manipulated to include, indices like ERA, WHIP, FIP, SIERA, etc., not only at the season level but also season-to-date (at any point during a given season). Do you know if this pitcher-specific data is derivable from the PITCHf/x or Retrosheet databases, and if not directly, then by manipulating the data to create new columns reflecting those indices?
Also, what is the best way to determine when tables and/or data subsets should be joined/merged and when they should be kept separate, if my ultimate goal is to use all of this data to do both linear and non-linear modeling for prediction purposes?
Needless to say, your articles have been enlightening, and the information you provided has been a big help so far.
Any guidance on this would go a long way.
Thank you in advance.
Lee
Lee, regarding how to manage data tables, I just focus on the questions I’m interested in addressing, and then merge the relevant data frames so that all of the variables of interest are in a single data frame. For example, for the pitchRx tables, I combine variables from the atbat and pitch data frames to learn about swinging and missing for different batters, or different pitchers. The point is that I would merge datasets in different ways, depending on the goal of the study.
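A rough sketch of that kind of question-driven merge is below. The toy data frames stand in for the pitchRx atbat and pitch tables; the key columns (`gameday_link`, `num`), the `des` labels, and the choice of which labels count as swings are assumptions made for illustration and should be checked against your own data.

```r
# Toy stand-ins for the atbat and pitch tables (hypothetical rows;
# column names and pitch descriptions are assumptions for this sketch).
atbat <- data.frame(
  gameday_link = c("gid_A", "gid_A"),
  num          = c(1, 2),
  batter_name  = c("Batter One", "Batter Two"),
  stringsAsFactors = FALSE
)
pitch <- data.frame(
  gameday_link = rep("gid_A", 5),
  num          = c(1, 1, 1, 2, 2),
  des          = c("Ball", "Swinging Strike", "Foul",
                   "Called Strike", "In play, out(s)"),
  stringsAsFactors = FALSE
)

# Attach at-bat level variables (here, the batter) to every pitch
pitches <- merge(pitch, atbat, by = c("gameday_link", "num"))

# Flag swings and swinging strikes (the swing definition is an assumption)
swing_labels  <- c("Swinging Strike", "Foul", "In play, out(s)")
pitches$swing <- pitches$des %in% swing_labels
pitches$whiff <- pitches$des == "Swinging Strike"

# Proportion of swings that miss, by batter
swings     <- pitches[pitches$swing, ]
whiff_rate <- tapply(swings$whiff, swings$batter_name, mean)
```

Grouping by pitcher instead of batter only requires carrying a pitcher column through the same merge, which is the sense in which the merge follows the question.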
Just figured out that Carson is also one of the guys behind Plotly. I just thought you were the “pitchRx guy.” 😉 Good work Carson!
I’m friends with a pitching coach in a minor league organization and need some help with a baseball stat. Here is the question: if I had a way to take away 1, 2 or 3 hits per game, how many more games would a team win over a 162-game season? Is this something you could help with? I just need an estimate. Thanks. Scott
Scott, what do you mean by “to take away 1, 2, or 3 hits per game”? Jim
Just looking at the stats: 2 hits equals 1 run, and 10 runs equals 1 win. One hit per game over 162 games is equal to 81 runs, or 8.1 wins per season. Is this correct? Scott
Jim,
“To take away” means to remove a single hit per game, or two hits, or three hits per game. How would that translate into wins over a 162-game season?
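For reference, Scott’s back-of-the-envelope arithmetic can be written out as below. The conversion rates (2 hits per run, 10 runs per win) are the rules of thumb stated in his comment, taken as assumptions here rather than verified estimates.

```r
# Rules of thumb from the comment above (assumptions, not fitted values)
hits_per_run <- 2
runs_per_win <- 10
games        <- 162

# Hits prevented per game: 1, 2 or 3
hits_removed <- 1:3

# Runs saved over a season, then converted to wins
runs_saved  <- hits_removed * games / hits_per_run
wins_gained <- runs_saved / runs_per_win

data.frame(hits_removed, runs_saved, wins_gained)
```

Under those assumed rates the arithmetic gives 81, 162, and 243 runs, or 8.1, 16.2, and 24.3 wins. The result is only as good as the two conversion rates, and the value of a hit depends on the type of hit and the game situation.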