Thursday, November 20, 2008

"Baseball's Best Player", by Bill James

It's pretty obvious from the last few posts that I spend a lot of time on Bill James Online. It's a pay site, but it's ridiculously affordable (only $3/month) and gives you some interesting things. You get access to a unique stable of Bill James stats (many of which you can only see on his site or in his books like The Bill James Handbook), a steady flow of new articles (beyond just baseball), and even access to Bill himself in the "Hey Bill" section (where he responds personally to readers' questions). Other perks of the site are access to the smart group of forum users and a pretty good list of older articles that Bill published elsewhere (or not at all). I highly recommend becoming a member.

Going through that back-catalog of articles a few weeks ago, I stumbled across a pair of articles Bill originally wrote after the 2003 season called "Baseball's Best Player" and "Baseball's Best Player (mirror)". Without going too deep into the method or the findings (that's what Bill's site is for, after all), I'll say that Bill used a player's four-year (weighted) average of Win Shares to see who was the best player over those past four years (in the "mirror" article, he looked ahead to the next four years, using real numbers of course, not predictions). The method seemed pretty sound to me, and gave some really fair and mostly expected results (Babe Ruth rules for a few years, then Gehrig, then Williams, etc, with Bonds ruling for much of the last 10-12 years, even before BALCO).
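For anyone who wants to play with the idea, here is a minimal sketch of a weighted multi-year average of Win Shares. The weights and the sample seasons are made up for illustration; Bill's actual weights are in his article, and I'm not reproducing them here.

```python
from typing import Sequence


def weighted_win_shares(seasons: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of season Win Shares.

    `seasons` and `weights` are aligned lists, e.g. newest season first.
    """
    if len(seasons) != len(weights):
        raise ValueError("need exactly one weight per season")
    return sum(ws * w for ws, w in zip(seasons, weights)) / sum(weights)


# Hypothetical player: Win Shares for four consecutive seasons, newest first.
recent = [30, 28, 35, 25]
# Hypothetical weights (an assumption, not Bill's): the newest season counts most.
weights = [4, 3, 2, 1]

score = weighted_win_shares(recent, weights)
print(round(score, 1))
```

The "mirror" version would use the same function, just fed the seasons that follow a given year instead of the ones that precede it.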

What makes the list most interesting, I think, are those years between the true greats, when the historically great players are either too old or too young to have dominated the league for three straight years. The players who emerge during these times as the "best player in baseball" make you pause for a minute but, upon further reflection, seem sensible. These players include: Ron Santo in 1967; Bobby Murcer in '72; Dave Parker in '78-'79; Tim Raines in '87; and Will Clark in '89-'91.

That's right, Will Clark was the best player in baseball for three straight years (meaning he dominated for 6 years overall). It's sometimes hard to remember how good some of these guys were compared to their contemporaries but, when you look at their numbers and their contemporaries' numbers in the same years, it's hard to argue.

I didn't have much more to say about this article besides how fascinating it was. I did run the numbers to get the leaderboards since 2003 (since that's when Bill last ran the numbers). Those numbers are below.

It's important to note, however, how A-Rod has never been able to bust through the reigns of Barry Bonds and Albert Pujols as the best player in baseball. He's always 2nd or 3rd, but Bonds was putting up insane numbers when A-Rod was in his 20s, and now Pujols is putting up insane numbers as A-Rod plays in his 30s. The comment I made on the article itself: "Using these numbers, it's clear that A-Rod had the great misfortune of playing his early years opposite Barry Bonds and his later years opposite Albert Pujols, two of the greatest talents since Mantle & Mays. Does the fact that Hank Aaron never controlled the crown before he was 35 keep him out of the conversation of greatest player of his generation/all-time? Of course not, and I don't think A-Rod constantly finishing just behind Bonds and Pujols should mean anything different."

The 2004-2007 leaderboards using Bill's method of looking at the past three years are as follows:
2004
Barry Bonds............48.1
Albert Pujols..........37.6
Alex Rodriguez.........32.2
Bobby Abreu............31.7
Scott Rolen............31.2

2005
Albert Pujols..........38.6
Alex Rodriguez.........33.7
Gary Sheffield.........32.1
Bobby Abreu............30.8
Manny Ramirez..........30.2

2006
Albert Pujols..........39.1
Carlos Beltran.........31.1
Alex Rodriguez.........30.3
Manny Ramirez..........30.0
Bobby Abreu............29.8

2007
Albert Pujols..........36.1
Alex Rodriguez.........33.5
Miguel Cabrera.........30.2
Carlos Beltran.........29.9
David Wright...........29.7

As you can see, Pujols does succeed Bonds as the reigning "Best Player" for at least three straight years. I find it interesting how quickly David Wright found himself on that list, especially considering that his 2004 year gave him only 9 WS in 69 games played.

And, using the next three years, the leaderboards are:
2001: Bonds, A-Rod, Pujols
2002: Bonds, Pujols, A-Rod
2003: Pujols, Bonds, A-Rod
2004: Pujols, A-Rod, Abreu

Outlier Seasons

(originally posted on the Bill James Online forums, in response to user evanecurb's call for players who had "great years that came from nowhere and never happened again")

Well, it's an interesting question to try to quantify. I figured that looking at Win Share discrepancies would do the trick. However, that ends up being a little tricky.

First, I looked for any person (since 1950, to make it easier) who had only one season with more than 20 Win Shares and no other seasons with more than 12. This rules out someone like Matt Nokes, whose 1987 season was actually only 20 Win Shares, and who also had 17 Win Shares in 1988. Also, this method ends up ignoring a lot of the names that you mentioned.
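That first filter can be sketched in a few lines. The function name and the sample careers are my own illustration (only the 20 and 17 in the Nokes-like line come from the numbers above; the rest are invented), so treat it as a rough sketch rather than the exact query I ran.

```python
def one_big_year(seasons, big=20, ceiling=12):
    """True if exactly one season tops `big` Win Shares and
    every other season is at or below `ceiling`."""
    big_years = [ws for ws in seasons if ws > big]
    if len(big_years) != 1:
        return False
    others = list(seasons)
    others.remove(big_years[0])  # drop the one big season, keep the rest
    return all(ws <= ceiling for ws in others)


# Hypothetical career lines (Win Shares per season).
flash_in_the_pan = [27, 7, 3, 2, 1]
nokes_like = [20, 17, 9, 8]  # 20 is not "more than 20", so there is no big year at all

print(one_big_year(flash_in_the_pan))
print(one_big_year(nokes_like))
```

Note how strict the test is: a single 13 WS season anywhere else in the career is enough to knock a player off the list.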

So I tried it a different way, looking for people whose max Win Shares year was much higher than their next-highest Win Shares year. Doing it this way, I do end up seeing the names that you've mentioned, but much lower on the list than you might think (or not even on the list at all). For example, Brady Anderson's 1996 was insane, and I say that as an Orioles fan. However, in Win Shares, it's a different story. His Win Shares total for 1996 was 28; however, in 1992, he actually had 29 Win Shares (and 26 in '97). All that to say, he was a highly productive player for much more than that one season. (Luis Gonzalez is in pretty much the same boat.)
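Here is a small sketch of that second approach: rank players by the gap between their best and second-best Win Shares seasons. The function name is my own, and the dictionary only holds each player's top two seasons as quoted in this post, not full careers.

```python
def peak_gap(seasons):
    """Gap between a player's best and second-best Win Shares seasons."""
    top, runner_up = sorted(seasons, reverse=True)[:2]
    return top - runner_up


# Best two Win Shares seasons per player, taken from the numbers in this post.
best_two = {
    "Kevin Mitchell": [38, 20],
    "Norm Cash": [42, 27],
    "Dick Ellsworth": [32, 13],
    "Brady Anderson": [29, 28],
}

ranked = sorted(best_two, key=lambda name: peak_gap(best_two[name]), reverse=True)
for name in ranked:
    print(name, peak_gap(best_two[name]))
```

Anderson lands at the bottom with a gap of 1, which is the point: by Win Shares, his famous season was not much of an outlier.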

With all that said, the interesting names of note I found are (there are many more than this):

Billy Grabarkewitz, 3B, LAD 1970 - 29 WS, no more than 5 for the rest of his 7 year career
Mark Fidrych 1976 (not a big surprise) - 27 WS, no more than 7 the rest of his career
Dick Ellsworth, 1963 - 32 WS, with his next highest being 13
Kevin Mitchell, 1989 - 38 WS, with his next highest being 20
Willie McGee, 1985 - 35 WS, with his next highest being 21
Norm Cash fits in well - 1961, 42 WS; next highest was 27
Cito Gaston, 1970 - 24 WS; next highest is 10
Rick Wilkins, 1993 - 28 & 14
Howard Johnson, 1989 - 38 & 25
Rick Cerone, 1980 - 21 & 9
Joe Torre, 1971 - 41 & 29
Chris Hoiles, 1993 - 26 & 14
Ken Caminiti, 1996 - 38 & 26
Craig Worthington, 1989 - 20 & 8
Dwight Gooden, 1985 - 33 & 18

With Hoiles, Worthington, and Brady, there are a lot of early-to-mid 90s Orioles on the list.

I'm not sure how convinced I am that people like Joe Torre or Howard Johnson or Ken Caminiti, who had multiple good years, can be considered "outliers". Again, it depends on how you want to classify an "outlier". Is it someone who had one outrageously good year despite multiple other solid-to-very good years? Or is it someone who had only one very good year and only mediocre years otherwise? It is fun to consider some of those crazy seasons, though.

Oh, and one more, since I had originally ignored active players (assuming that they still have a chance to have another great season)... Adrian Beltre's contract year of 2004 gave him 37 WS... his next highest year was 2000, when he had 22 WS (was he playing for arbitration money or something then?)... maybe this coming year will be another great year, though, since it's another contract year...

Looking at old preview guides

(originally posted on Bill James Online forums)

I don't know about other people here, but something I've always found fascinating is looking through old copies of those "Baseball Preview Guides" that come out at the start of every year. Whether it's the old Sporting News Baseball Yearbook, or Street & Smith's, or Athlon, etc., I find it very interesting to see what writers thought of players while they were still playing, year-to-year, and also to see whether anyone could ever imagine some of the feats that might happen that year (e.g., did TSN have any inkling of what would happen in 1998?).

I have my own little collection of these (my brothers and I have been buying every Sporting News yearbook since 1995, and every Athlon since 1999, and I've also purchased some other magazines from the 80s off of eBay), and I thought others might be as interested as I am in how some of these writers saw things back before anyone knew what would actually happen. The most interesting section of some of these magazines is the minor leagues/college baseball section, where they talk about all the different possible prospects of the years to come. Did they see the HOF capability of Wade Boggs or Tom Glavine? Were they ready to give a couple MVP trophies to Billy Ripken before he played a game in the bigs? It's fun to read...

Here are a couple of snippets from the 1988 Street & Smith's:

Atlanta: "This farm system is hurting almost as much as the parent club. At Richmond, southpaw Tom Glavine was 6-12, but his ERA was 3.35, the fourth-best in the league... "

Cubs: "Outfielder Rafael Palmiero figured to be the Cubs smash hit in '87, but he spent half the year at Iowa (.299, 11 HR in 214 at-bats). Called up, his bat didn't quiet down (.276, 14 HR in 221 ABs). His .543 slugging percentage showed he's a legitimate big league hitter... Mark Grace (.333, 17, 101) led Pittsfield to the Eastern League pennant and won Double-A all star honors. The EL's MVP and top-rated prospect - not a bad status for a 25th-round pick in the '85 draft - struck out just 24 times in 453 at-bats. Comparisons with Wally Joyner have already been uttered."

and last, for now
Houston: "Like [Gerald] Young, third baseman Ken Caminiti lost his rookie status. But the switch hitter with the sterling glove was leading the SL in batting (.325, 15, 69) when he was called up, and batted .246 in 63 games for the Astros. He appears to have no weaknesses and must play every day in the bigs... Catcher Graig Biggio [sic], the Astros No. 1 draft pick in '87, hit .375 with 49 RBIs and 31 SBs in 64 games at Asheville."

I also liked this little bit in the article introducing the minor league section: "Which draft will be remembered as the best draft of all? It's easy to suggest the 1985 draft, which saw so many first-rounders make it to the major leagues so quickly. BJ Surhoff, Will Clark, Bobby Witt, and Barry Larkin - 1,2,3,4 in that draft - were all in the majors by 1986 or early '87. That was the draft, by the way, in which Gregg Jeffries was the No. 20 selection... The 1984 draft was terrific, too. Cory Snyder, Oddibe McDowell, Billy Swift, Scott Bankhead - US Olympians all... What about the '81 draft? The first 10 players taken that June all played in the big leagues: Mike Moore, Joe Carter, Dick Schofield, Kevin McReynolds, Daryl Boston, and Ron Darling among them. Frank Viola was a second-rounder. Neal Heaton as well. Tony Gwynn lasted until the third round..."

I'll post some others as the urge strikes me.

Evaluating a Player's Career "Range" (Part II)

My first attempt at evaluating a player's career "range" can be found in Part I. I had some reservations about the list that that initial attempt generated, so I approached it in a slightly different manner.

This is what I came up with on a second look:
So, I ran the numbers using 60% of Peak Avg as the basecamp level.

Just to recap: we're trying to measure a player's consistent "greatness". Our first try used a flat rate of 20 (or 27) Win Shares as the floor - or basecamp, in our mountain metaphor - of that greatness, and then summed up the total Win Shares earned above that 20 (or 27) in a single season. By doing that, we would be measuring both the breadth and height of a "mountain range" (a single player's career), and this would hopefully allow us to more logically quantify a player's career (to compare "consistently very-good-to-great for many years" players, like Hank Aaron, and "good-players-with-ridiculously-high-peaks" like Sandy Koufax).
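The flat-basecamp calculation is simple enough to sketch. The two career lines below are invented (a steady "plateau" and a short, tall "spike"), and the function name is just my own label for the idea.

```python
def win_shares_above(seasons, basecamp):
    """Sum of Win Shares earned above the basecamp level, season by season.

    Seasons at or below the basecamp contribute nothing.
    """
    return sum(max(ws - basecamp, 0) for ws in seasons)


# Hypothetical careers (Win Shares per season).
steady = [25, 26, 24, 27, 25, 26, 24, 25]  # long plateau, no huge years
spiky = [12, 35, 40, 33, 10, 8]            # short career with a tall peak

for basecamp in (20, 27):
    print(basecamp, win_shares_above(steady, basecamp), win_shares_above(spiky, basecamp))
```

Raising the basecamp from 20 to 27 wipes out the plateau player entirely while the spiky player keeps a good share of his total, which is why I ran both levels.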

That first run gave us some results that we weren't quite sure about (talked about in a posting up the thread). The suggestion, then, was to use a varying basecamp number, based off of some percentage of the player's peak average. The thought behind this was that it would give us a measure of a player's consistency *in relation to his peak*. Players with a low peak (i.e., players who never had a season better than, say, 18 WS) would naturally fall off the listing because they would never accrue big enough numbers to compete with the hall-of-fame caliber players.
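As a rough sketch of the variable basecamp: take a fraction of the player's peak average and sum everything above it. I'm assuming here that "peak average" means the average of the player's three best seasons; the career line is hypothetical.

```python
def peak_average(seasons, n=3):
    """Average of the n best seasons (n=3 is an assumption for this sketch)."""
    top = sorted(seasons, reverse=True)[:n]
    return sum(top) / len(top)


def variable_basecamp_total(seasons, fraction=0.6):
    """Win Shares earned above a basecamp set at `fraction` of the peak average."""
    basecamp = fraction * peak_average(seasons)
    return sum(max(ws - basecamp, 0) for ws in seasons)


# Hypothetical career: the three best seasons average 40, so the basecamp is 24.
career = [30, 40, 50, 20, 10]
print(peak_average(career))
print(variable_basecamp_total(career))
```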

You can see the list with the variable basecamp here:
http://wezen.net/stats/ws-seavar.txt

Did it work? Honestly, I don't think so. All of a sudden, we have Hank Aaron rated as the #1 player (up from #9 on the 20WS list); Babe Ruth drops down to #2 (from #1 on all other lists); Bonds goes from #2 to #8; Honus Wagner drops from #4 to #12; Mantle falls from #8 to #22; Walter Johnson falls from #15 to #28; Gary Sheffield climbs from #35 to #21; and Rickey Henderson and Paul Molitor break the top 30. Now, I like that Rickey shows up higher, but I don't like the way we got there.

Why do Ruth and Bonds and Wagner and Johnson and Mantle (among others) fall so hard in this ranking? Well, take a look at the following charts (forgive the lack of active links):

http://wezen.net/stats/charts/ws-ruth.jpg
http://wezen.net/stats/charts/ws-bonds.jpg
http://wezen.net/stats/charts/ws-aaron.jpg
http://wezen.net/stats/charts/ws-rickey.jpg
http://wezen.net/stats/charts/ws-molitor.jpg

For players like Bonds and Ruth, who have such high Peak Avgs, their new basecamp (the orange line - based off of 60% of Peak) is much higher than the old basecamp (the cyan line - shown here at 25 WS). In short, those players are being punished for having a high peak average - the method here is ignoring their win shares in the 20-30 range, a range which would be completely counted for most other players on the list. Looking at the charts for Molitor and Rickey, you can see that the new basecamp (again, the orange line) is lower than the 25WS line and, in Molitor's case, much lower. They are getting credit in this ranking for Win Shares that are basically junk to someone like Bonds or Ruth.
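To make the effect concrete, here is a small comparison of a flat 25 WS basecamp against a 60%-of-peak basecamp for two invented careers. Neither line is a real player, and the three-season peak average is my assumption for the sketch.

```python
def basecamp_for(seasons, fraction=0.6, n=3):
    """Variable basecamp: a fraction of the average of the n best seasons."""
    top = sorted(seasons, reverse=True)[:n]
    return fraction * sum(top) / len(top)


def total_above(seasons, basecamp):
    """Win Shares earned above the given basecamp."""
    return sum(max(ws - basecamp, 0) for ws in seasons)


# Hypothetical careers (Win Shares per season).
high_peak = [45, 44, 43, 30, 28, 26, 24]    # variable basecamp lands above 25
modest_peak = [30, 29, 28, 22, 21, 20, 19]  # variable basecamp lands below 25

for label, seasons in (("high peak", high_peak), ("modest peak", modest_peak)):
    flat = total_above(seasons, 25)
    variable = total_above(seasons, basecamp_for(seasons))
    print(label, round(basecamp_for(seasons), 1), flat, round(variable, 1))
```

The high-peak career loses ground when the basecamp moves with the peak, while the modest-peak career picks up credit for seasons around 20 WS that the flat line ignored.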

Players like Aaron and Gehrig benefit the most from this ranking, it seems, and I think that's because their Peak Avg was such that their variable base camp gets calculated to right around 25 WS, placing them right in the middle of that too-high and too-low problem that Ruth and Molitor exemplify.

Yes, we are measuring their consistency and their consistency in relation to their peak. And this is valuable because it helps us determine who stayed at that level for a long time. But I don't think it works for ranking who was best across their career (I mean, is Paul Molitor really 13 spots better than Joe DiMaggio? And I'm a Milwaukee guy...).

In the end, we are trying to determine an "all star win share" value, like alljoeteam suggests, and I don't think it makes sense for us to define this value as a varying number. Somewhere, there should be a value we can agree on as "all star level" and use that.

[I did try running the numbers where the basecamp couldn't go higher than 25, so that players like Bonds and Ruth were using the 25WS line; it definitely changed the rankings, but ended up making them look too close to the 20/27WS runs I did previously.]

I gave this process some further thought after originally posting this. I do think there is value in this process, but it's just not what I was originally looking for. In the end, we get a list of the most consistent quality players in baseball history. We get the Paul Molitors and Hank Aarons of the world, and that's a worthwhile list to have.

I did one last tweak of the list, weighting the variable basecamp values by the peak. The list that it generates gives us the best ranking of consistency, I think. It seems weird to see someone like Babe Ruth so far down the list, but it makes sense. As excellent as Ruth was, he didn't consistently reach those peak values (yes, his seasons were consistently better than, say, Paul Molitor, but they weren't all at that *same* level of excellence), so his showing up a little lower is fine.

Evaluating a Player's Career "Range" (Part I)

The following pair of posts grew out of a discussion on the Bill James Online forums. The original concept was proposed by Bill James user alljoeteam, and I took an initial stab at answering.

alljoeteam originally posted this idea:
But let's look at a player's greatness as a mountain. How do we measure the greatness of the mountain? How high it is? How wide it is? We would need to combine those factors to determine the greatness of the mountain. Peak seasons are like the height of the mountain and career value is like the volume of the mountain. James also included a per 162 games factor in his rating. That is like some average of the heights of the mountain at different points or something. What I am suggesting is to find the volume of the mountain that is above a certain point, say 2000 feet from the base or something. Different mountains will have a different percentage of their volumes above that cutoff.

A tall, steep mountain (McGwire) will have about the same volume as a plateau (Murray) depending on the exact numbers. Now a really high plateau (Gehrig) is both wide and tall and we could see that in this one number. He would rank above McGwire and Murray as he should, but McGwire and Murray would be about the same.
Here's my initial work on his idea:
I found this to be a pretty intriguing idea, actually. I understand the analogy you gave in the "Peak Seasons in Player Rankings" posting (about measuring the scope and breadth of a "mountain" above a certain "elevation"), and I agree with it in principle. So I looked at the numbers.

I ranked players in the two ways you proposed, looking at total Win Shares above a certain annual level and also at a weighted listing, using the average of each player's three peak seasons as the weight. I ran the numbers twice, using 20 WS and 27 WS as the two "base camps". I did that mostly to see if the 20 WS value was too low of a threshold.
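Roughly, the two rankings look like the sketch below. The careers are invented, and the way the weight is applied (multiplying the above-basecamp total by the three-season peak average) is an assumption made for illustration, not a statement of the exact formula behind the lists.

```python
def peak_average(seasons, n=3):
    """Average of the n best seasons."""
    top = sorted(seasons, reverse=True)[:n]
    return sum(top) / len(top)


def rank_players(careers, basecamp, weighted=False):
    """Rank players by Win Shares above the basecamp, optionally peak-weighted."""
    rows = []
    for name, seasons in careers.items():
        score = sum(max(ws - basecamp, 0) for ws in seasons)
        if weighted:
            score *= peak_average(seasons)  # assumption: weight by multiplying
        rows.append((name, score))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical careers (Win Shares per season).
careers = {
    "Plateau": [30, 30, 30, 30, 30, 30],
    "Spike": [45, 40, 35, 15, 12, 10],
}

for basecamp in (20, 27):
    print(basecamp, rank_players(careers, basecamp), rank_players(careers, basecamp, weighted=True))
```

With these made-up numbers the two players tie at the 20 WS basecamp, and it is the weighting (or the higher basecamp) that separates them.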

You should be able to see the lists at:
http://wezen.net/stats/ws-seas20.txt
http://wezen.net/stats/ws-seas27.txt

My first thought upon looking at the list was that it seemed to favor older players (I ignored people who played their final game before 1901... I would've ignored all people who played before 1901, but I didn't think it was right to leave off Honus Wagner or Cy Young). However, that just may be me. I mean, Tris Speaker, Walter Johnson, Eddie Collins, Rogers Hornsby, Nap Lajoie, and Christy Mathewson aren't exactly slouches.

Both lists are definitely dominated by pre-1980 players, though, with only Barry Bonds (#2) and Mike Schmidt (~#17) showing up in the top 20 on every version of the list (A-Rod shows up in the top 22 of every list). The players in the top 25 who benefit the most from the weighted list are Lou Gehrig and Hank Aaron, both jumping 2-5 slots when the weights are taken into consideration. This should tell us that these two have the most (positive) consistency between their peak seasons and their "all-star WS" seasons. Surprisingly, the two players who are harmed the most by the weighted lists are Nap Lajoie and Mickey Mantle. They both drop 6-7 slots on the list when their weights are taken into account.

As for (very) recent players, the biggest surprises lie just outside the top 25. Albert Pujols and his 7 full seasons (these data are through 2007) is already in the top 30 (or so). Gary Sheffield, Jeff Bagwell, and Frank Thomas are all also in the top 40 (on all lists). In fact, Sheffield actually benefits from the weighted lists while Pujols drops. Bagwell and Thomas both hold their spots on each list.

Other players of note are players like Rickey Henderson, Pete Rose, and Joe DiMaggio, who are all in the top 30 on the 20 WS list but who drop into the 40s in the 27 WS list. To me, this seems to say that these players all have a large clump of 20-30 WS seasons, with only a few higher WS seasons.

In the end, I'm not sure what I think about this list. It seems to make sense, as the players that it has identified are obviously high-value, high-WS players. I think I just may be disappointed in the lack of diversity of the top players. I can't decide if this is an issue with the method, or if this is a limitation of Win Shares. I'm still new when it comes to playing with real, hard-core numbers, so I haven't been able to develop a full opinion on stats like Win Shares.
Continue on to Part II...

The reason for this blog

Recently, I decided to take the interest I've had in baseball statistics for years and years and actually do something about it. I've been playing around with the various statistics databases available online and other sources of baseball information (reference books, magazines, preview guides, websites like baseball-reference.com, etc.) and getting a taste for what the "scene" is like. I've also recently joined the Bill James Online website and been reading his posts and participating in his discussion boards.

My main reason for starting this blog, then, is to give myself a place to store all of these various thoughts I have - whether they are in emails to friends, or the result of some personal research, or as a post to some online forum - so that I can come back to them later and also let anyone else see them. A blogspot blog seemed like the obvious easy answer to this.

Because of this, I'll be putting up a number of posts tonight (and in the future) as I catch up with some of my writing from the recent past. Don't expect this kind of throughput in the future.

Wednesday, November 19, 2008

Albert Pujols vs. Frank Thomas

(originally posted on the Bill James Online forums)

This is something that I've been pondering since the start of this season, and now that Pujols was finally given his second MVP, I figured I might as well share the thought with others.

Before I go on, I should state that this comparison was probably most apt one year ago, after Pujols' 7th season, but it's too late for that now. Plus, him having his second MVP award does help the comparison in one way...

Frank Thomas' first full year in the majors was 1991. Albert Pujols' was 2001. So, in 1998 Thomas was finishing his 8th big league year and, in 2008, Pujols was finishing his 8th. At the time ten years ago, Frank Thomas had two MVP awards (won back-to-back) and was considered one of the best hitters of all time and a sure-fire hall of fame player. Albert Pujols, today, has two MVP awards and is considered by many to be the obvious best player in baseball, having had probably the most successful start to a career in a long time.

So, the question I wonder about is, is Albert Pujols today the equivalent of Frank Thomas from 10 years ago?

Before we take a look at the stats, I'll say that the reason I've always pondered this question is because of an article I read back then. In the 1998 Sporting News Baseball Yearbook, previewing the '98 season, there was an article asking whether Frank Thomas was the greatest hitter ever, and it compared his first 7 years to the likes of Joe DiMaggio, Mickey Mantle, Rogers Hornsby, and Al Simmons. The article went on to say that, of the 8 players they were comparing, 4 went on to have even more productive careers, and the other 4 declined quickly. "Which way would Thomas go?" was the end question. (You can see their comparisons here.)

We all know what happened after that. Injuries and attitude derailed the Big Hurt and made him a questionable commodity for a number of years, until he popped back up in Oakland and Toronto as a contributing DH. Can or will the same thing happen to Pujols? And, even if it did, would we think higher of him than we think of Thomas?

Here's a quick comparison of their respective numbers over their first eight years - and remember what I said, this comparison was more apt after 7 years, not 8, because that 8th year is when Thomas started declining.

Looking at the spreadsheet and seeing all that yellow in Pujols' columns, it seems pretty easy to say that he had the better start to his career. But, if you look again, you'll notice that Thomas tends to lead in batting average, on-base percentage and, most importantly, OPS+. Which, of course, means that those additional extra-base hits and home runs that Pujols has (and, thus, higher slugging) aren't as impressive when compared to the era he's playing in. But does that mean either one is better?
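To see why the era matters, here is a quick sketch of the usual OPS+ formula, 100 * (OBP/lgOBP + SLG/lgSLG - 1). The published stat also adjusts for ballpark, which this sketch ignores, and the league averages below are hypothetical round numbers, not the actual figures for either player's seasons.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """Simplified OPS+ with no park adjustment (100 = league average)."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# The same hypothetical batting line placed in two hypothetical league contexts.
obp, slg = 0.420, 0.600
lower_offense = ops_plus(obp, slg, lg_obp=0.330, lg_slg=0.400)
higher_offense = ops_plus(obp, slg, lg_obp=0.340, lg_slg=0.430)

print(round(lower_offense), round(higher_offense))
```

The identical raw line grades out noticeably lower in the higher-offense league, which is how a player can trail in slugging and still lead in OPS+.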

By the stats alone, I think it's easy to go one way or the other (though I tend to lean in Pujols' favor). However, we know that Pujols has a Gold Glove (and should have more, though we *must* ignore the "shoulds") and a World Series ring. Thomas was, by almost all accounts, a lackluster defender, if that, and was never able to take his team to the Series (he wasn't on the World Series winning roster in 2005).

I know that this isn't really a very constructive comparison, as there is nothing that anyone can do about the future. And I also know that Pujols could just as easily turn into Lou Gehrig (like in Dave Fleming's article [over] on Bill James Online) as he could turn into Frank Thomas. Still, I think most people feel pretty safe predicting that Pujols will be remembered as one of the best of all time because he's been performing at such a high level for so long, but it seems important to remember that we all felt the same way about Thomas only 10 years ago. And I think you'd be pretty hard-pressed to find someone claiming the Big Hurt is one of the all-time greats.

I imagine a lot of people will read this and bring up a lot of good points why Albert won't turn into Frank, and I will believe every single one of them. But I'll also believe all the arguments in the other direction, where one brings up Albert's scary brush with Tommy John surgery or something else. In either case, I'm rooting for Pujols because I want to say that I was able to watch the greatest first baseman of all-time play for many years. I like the way the guy plays, and I hope nothing but the best. I just think it's important to remember just how certain we were about Thomas back in the day too, and that didn't quite turn out as we hoped.