
Ichiro’s Historic Path to 3000 Hits

(Evan Boyd is a senior studying statistics at the University of Wisconsin-Madison. As a writer and a data analyst, Evan is looking into sports analytics for a career. Evan is also the Station Manager at WSUM 91.7 FM Madison student radio. Feel free to reach out to him at enboyd@wisc.edu with any questions or for a resume, and follow him on Twitter @eboyd42)


Despite entering the big leagues when he was 27 years old, Ichiro Suzuki has made it to 3000 hits in his Major League career. It was no easy task: he had to put together 10 straight 200+ hit seasons to do it, including an MLB-record 262 hits in 2004. His ten consecutive 200-hit seasons are the most ever, and his ten total 200-hit seasons tie Pete Rose for the most in a career. Ichiro, who captured both the 2001 AL MVP and Rookie of the Year awards (on one of the best teams ever, mind you), will have an easy ride into the Hall of Fame.

Let’s analyze some of his numbers by comparing him to the other players who have 3000 hits. An easy way to monitor his (and everyone else’s) stats is through Baseball-Reference (BR). Scraping a BR table is easy, especially since Jim has already shown how in a recent article. I was able to take that code and apply it to many different players, using two loops like this:

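library(XML)

# hitname: BR player IDs for the 30 members of the 3000-hit club
# (e.g. "suzukic" for Ichiro), built beforehand
names <- character(length(hitname))
for (i in seq_along(hitname)) {
  # Player pages live under a directory named for the ID's first letter
  names[i] <- paste0("http://www.baseball-reference.com/players/",
                     substr(hitname[i], 1, 1), "/", hitname[i], "01.shtml")
}

for (j in seq_along(hitname)) {
  # Store each player's list of tables in a variable named after his ID
  assign(hitname[j], readHTMLTable(names[j]))
}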

The variable names holds the BR page URLs. They are built from hitname, a character vector I made that contains the BR ID of each 3000-hit player. For example, Ichiro’s ID is suzukic. See Jim’s code for more. The second loop then reads each batter’s HTML tables and assigns them to a variable named after that player’s ID. With 30 members now in the 3000 hit club, each loop runs 30 times.
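The ID-to-URL step can also be written as a small vectorized helper. This is a minimal sketch, and br_player_url is a hypothetical name. Like the loop above, it assumes every player page ends in 01.shtml. That does not hold for every BR ID, because players who share an ID prefix get 02, 03, and so on.

```r
# Hypothetical helper: build Baseball-Reference player-page URLs from IDs.
# Assumes every player uses the "01" suffix, as the loop above does.
br_player_url <- function(ids) {
  # The directory is the first letter of the ID, e.g. "s" for "suzukic"
  paste0("http://www.baseball-reference.com/players/",
         substr(ids, 1, 1), "/", ids, "01.shtml")
}

br_player_url(c("suzukic", "jeterde"))
```

Because paste0 and substr are vectorized, a single call covers the whole hitname vector without a loop.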


Ichiro or Jeter?

Now that I can read each of the 30 players’ tables, I can do some analysis. Derek Jeter, another member of the 3000 hit club, was born only a few months after Ichiro. Jeter’s career on-base and slugging percentages are higher than Ichiro’s, though his batting average is slightly lower: .310/.377/.440 compared to .314/.357/.405. However, Jeter never had a season in which he batted .350 or better, and Ichiro did that four times. Both had much higher hit-to-strikeout ratios than the league average, but Ichiro’s was much higher than Jeter’s.

The graph below was produced using plotly, a graphing library similar to ggplot2 that goes a step further with an interactive interface.

library(plotly)

# Assumed to exist already: year and Hit.to.SO (Ichiro's seasons and ratios),
# Jeter.Hit.to.SO (Jeter, 2000-2014), year.totals and Overall.hit.so (MLB)
p <- plot_ly(x = year, y = Hit.to.SO, name = "Ichiro Ratio",
             type = "scatter", mode = "lines") %>%
  add_trace(x = 2000:2014, y = Jeter.Hit.to.SO, name = "Jeter Ratio",
            line = list(shape = "spline")) %>%
  add_trace(x = year.totals, y = Overall.hit.so, name = "MLB Average",
            line = list(shape = "spline")) %>%
  layout(title = "Hit-to-Strikeout Ratios",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Hit-to-Strikeout Ratio"))
p

[Figure: Hit-to-Strikeout Ratios by year for Ichiro, Jeter, and the MLB average]

In his rookie season, Ichiro hit safely 242 times and struck out only 53 times. Looking at their cumulative WAR from age 27 to 35, Ichiro has 50.9 WAR while Jeter has 39.3. Ichiro was the better contact hitter, and his speed helped a lot, but Jeter could hit for more power. If you asked me which of the two to pick during this stretch, I would say Ichiro one day and Jeter the next.
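The ratio in the graph is hits divided by strikeouts, season by season. The sketch below assumes a season-level data frame with H and SO columns, like BR's standard batting table. hit_to_so is a hypothetical helper, and the only data used are the 2001 numbers quoted above (242 hits, 53 strikeouts). Columns scraped with readHTMLTable may come back as character or factor, so they may need as.numeric(as.character(x)) first.

```r
# Hypothetical helper: hits per strikeout for each season
hit_to_so <- function(H, SO) {
  # Return NA instead of Inf for a season with zero strikeouts
  ifelse(SO > 0, H / SO, NA_real_)
}

# Ichiro's 2001 rookie season: 242 hits, 53 strikeouts
rookie <- data.frame(Year = 2001, H = 242, SO = 53)
rookie$Hit.to.SO <- hit_to_so(rookie$H, rookie$SO)
round(rookie$Hit.to.SO, 2)  # 4.57
```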


The Faster You Are, the Faster to 3K

With his 3,000th hit, Ichiro is not only the 30th member of the club but also one of seven players in that group with 500 steals. I calculated Ichiro’s cumulative hit total by age, along with those of Lou Brock, Rickey Henderson, and Ty Cobb.
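Cumulative hits by age come from a running sum over each player's season totals. Below is a minimal sketch of that step and of finding the first age at which the total reaches 3,000. The season totals are synthetic and do not belong to any real player, and age_at_3000 is a hypothetical helper.

```r
# Hypothetical per-season hit totals for illustration only (not real stats)
seasons <- data.frame(Age = 27:46, H = rep(c(210, 190), 10))

# Running career total after each season
seasons$CareerH <- cumsum(seasons$H)

# First age at which the running total reaches the target (NA if never)
age_at_3000 <- function(age, career_hits, target = 3000) {
  idx <- which(career_hits >= target)
  if (length(idx) == 0) NA else age[idx[1]]
}

age_at_3000(seasons$Age, seasons$CareerH)  # 41
```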

[Figure: Cumulative career hits by age for Ichiro, Lou Brock, Rickey Henderson, and Ty Cobb]

Here, Ichiro stands in the middle of the pack. At 16 seasons, he needed the fewest seasons of anyone in this group to reach 3,000 hits. Henderson took the longest and did not reach it until the end of his 23rd season. Cobb, who debuted at 18, got a head start and reached 3000 hits at a younger age. But if Ichiro had started playing in the majors at age 18, would he have reached 3000 faster than Cobb? Actually, yes.

As you can see, each player’s run toward 3,000 looks roughly linear. Fitting a simple linear regression of career hits on age gives Ichiro the highest slope coefficient, at 194. In other words, Ichiro has averaged about 194 hits a year to get where he is today. That is higher than every other player with 500 steals, with Cobb second at 187. There is no doubt that some of Ichiro’s good years were spent in Japan. If his entire career had been played in the United States, he might not only have passed Rose’s record but also reached 3000 hits faster than Cobb.
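The "linear coefficient" is the slope of an ordinary least-squares fit of career hits on age, read as average hits added per year. This minimal sketch uses lm() on synthetic data constructed so that the true slope is exactly 194. The data are not Ichiro's real totals. The last lines show the arithmetic behind the "faster than Cobb" claim, using the slopes quoted above: 3,000 hits take about 15.5 years at 194 per year and about 16.0 years at 187 per year. That rate comparison ignores starting age and any decline late in a career.

```r
# Synthetic cumulative hit totals that grow by exactly 194 per year
# (illustration only; substitute a player's real age/career-hits data)
career <- data.frame(Age = 27:42)
career$CareerH <- 194 * (career$Age - 26)

# Ordinary least squares: career hits as a linear function of age
fit <- lm(CareerH ~ Age, data = career)
slope <- unname(coef(fit)["Age"])  # hits added per year of age
slope  # 194

# Years needed to reach 3,000 at the slopes quoted in the text
years_to_3000 <- 3000 / c(Ichiro = 194, Cobb = 187)
round(years_to_3000, 1)  # Ichiro 15.5, Cobb 16.0
```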


Batting Above the Mean

How “good” was Ichiro’s 2001 season, considering it came at the height of the steroid era, when Barry Bonds was putting up cartoon numbers? The MLB batting average in 2001 was .264, 10 points higher than last year’s (2015) league average. Even so, Ichiro hit almost 90 points above the mean in 2001. I looked at four of the best hitters of their time (Ichiro, Tony Gwynn, Pete Rose, and Ty Cobb) and computed the residual between each player’s batting average and the league average for that season. As with the previous graph, I used plotly to generate it. View the interactive graph here.
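The residual here is the player's batting average minus the league average for the same season. This minimal sketch matches seasons by year and takes the difference. ba_above_mean and the column names (Year, BA, lgBA) are hypothetical. The .264 league figure comes from the text, and Ichiro's .350 is his 2001 batting average. To plot by age, as in the graph, add an Age column to the player data before merging.

```r
# Season-by-season batting average above the league mean.
# player: data frame with columns Year and BA
# league: data frame with columns Year and lgBA
ba_above_mean <- function(player, league) {
  merged <- merge(player, league, by = "Year")  # align seasons by year
  merged$Resid <- merged$BA - merged$lgBA       # positive = above the mean
  merged[order(merged$Year), c("Year", "BA", "lgBA", "Resid")]
}

# 2001: Ichiro hit .350; the MLB average was .264
ichiro <- data.frame(Year = 2001, BA = 0.350)
league <- data.frame(Year = 2001, lgBA = 0.264)
res <- ba_above_mean(ichiro, league)
res$Resid  # about 0.086, i.e. 86 points above the mean
```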

[Figure: Batting average minus league average by age for Ichiro, Tony Gwynn, Pete Rose, and Ty Cobb]

One thing to note is that Cobb’s incredible numbers at an early age came during the dead-ball era, but his career was amazing nonetheless. At some ages, Ichiro ranked last among the four. Even in his 2001 MVP year, his margin over the league mean was smaller than Gwynn’s, Cobb’s, and Rose’s at the same age. In other years, though, he was much higher: more than .05 above Gwynn and Rose in their age-30 seasons. Ichiro’s seasons are starting to dwindle, and his path toward batting below the mean looks similar to Rose’s.


Ichiro’s career may soon come to an end, despite his stated wish to play until he is 50. His numbers are dwindling, and he is already the 4th outfielder on the Marlins. Sad to say, I do not think he will be the next Julio Franco. Nevertheless, Ichiro’s short time in the US will be remembered as the greatest career ever by a Japanese-born player, and it will continue to influence players and fans in the future. Now, let’s see if Adrian Beltre, Albert Pujols, and Miguel Cabrera can make it to 3,000.