Tuesday, July 23, 2024

What If Ted Williams and Joe DiMaggio Played in Their Primes?

Have you ever wondered what if? What if she answered yes to a date? What if I took school more seriously? What if I accepted that job offer? Life is a decision tree, and it’s hard not to wonder what a different node in your life’s tree might look like had you made a different decision. What-if questions are a normal part of human existence, and today, I want to offer an analytical thought experiment: what if two of the greatest hitters in baseball history had played through their primes?

When you think of the greatest baseball players to ever lace them up, who do you think about? Babe Ruth? Lou Gehrig? Ty Cobb? Willie Mays? Mickey Mantle? Hank Aaron? Mike Trout? Barry Bonds? Stan Musial? Walter Johnson? The list goes on, they’re all great choices, and I wouldn’t fault you for picking any of them. But this conversation always leaves me wishing I could relive the primes of two players that were stolen from us: Ted Williams and Joe DiMaggio. What if they had played through their primes?

You might know DiMaggio best for his 56-game hitting streak in 1941, a record many consider unbreakable, which captivated a nation. You might know Williams for being the last player to hit over .400 in a season or for having the highest career OBP of all time. I remember these two players as the stars of their respective teams in the most storied rivalry in sports history. I usually remember DiMaggio for wrongfully stealing multiple MVP awards from a more deserving Williams, but that’s a story for another day.

Nothing in life is guaranteed, and certainly nothing in a player’s baseball career. Between injuries, skill degradation, and personal issues, many baseball careers get tragically derailed or end early. Serving in your country’s military during the deadliest conflict in human history isn’t your typical reason for spending less time playing baseball. But that’s exactly the reality that Ted Williams and Joe DiMaggio faced. Both served in the United States military for three years in the middle of their primes.

DiMaggio started his career in 1936 at 21 years old for the New York Yankees and took the league by storm, smashing 29 home runs with a 122 wRC+ in his rookie campaign. Williams started his career in 1939 at 20 years old for the Boston Red Sox and also hit the ground running, hitting 31 bombs with a 155 wRC+. With these sorts of numbers, both sluggers were poised for success. In fact, between their rookie seasons and 1942, their last season before going off to war, DiMaggio posted a 154 wRC+ and Williams led MLB with a 183 wRC+.

Just from the eyeball test, you can see that Williams and DiMaggio have crazy wRC+ numbers. But what’s jarring to me is the time they missed. Subjectively, Ted Williams was a better hitter. But by how much? 

Quantitatively, Ted was a better hitter by a statistically significant margin. Both are among the greatest hitters of all time, and they are rightly compared, but Ted is on a whole other level.

What would Williams’ and DiMaggio’s career stats look like if they had played in the three seasons (yellow blocks) they spent in the military? This is a super tricky problem to model. It’s not a textbook forecasting problem because A) we have very few data points, and B) the gap sits in the middle of their careers, so we’re interpolating between known seasons rather than projecting future outcomes.

Alongside their performance, aging curves play a critical part in projecting player outcomes. Basically, when you come into the league, you start with modest production, peak around age 26-29, and then gradually fall off. This phenomenon has been talked about a lot in the baseball space. Mitchel Lichtman wrote about aging curves here. I plotted his data and fit a 3rd-degree polynomial to it. Curves like this are built from large data sets and likely depend not only on the era of baseball but also on the individual player. For instance, the best hitters of all time may deviate from the population data more than others, i.e. GOATs age differently. Tom Tango has more information on his site about aging curves if you’re interested.
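To make the polynomial fit concrete, here is a minimal sketch of fitting a 3rd-degree polynomial to an aging curve by least squares. The ages and production values are synthetic numbers I made up so the result is easy to check; they are not Lichtman’s data, and the helper names `fit_polynomial` and `evaluate` are just for illustration. In practice a library call such as `numpy.polyfit` would do the same job in one line.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic aging curve (ASSUMPTION: invented for illustration, not real data).
ages = list(range(21, 39))
t = [age - 27 for age in ages]  # center the ages so the fit is well conditioned
production = [100 - 0.5 * ti ** 2 + 0.01 * ti ** 3 for ti in t]

coeffs = fit_polynomial(t, production, degree=3)
fitted = [evaluate(coeffs, ti) for ti in t]
peak_age = ages[fitted.index(max(fitted))]
print("peak age on the fitted curve:", peak_age)
```

Centering the ages before fitting is a small design choice: raw ages raised to the sixth power make the system of equations much harder to solve accurately.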

Although the aging curve is worth mentioning, I didn’t use this polynomial to interpolate Williams’ or DiMaggio’s stats. Instead, I used spline models. Splines are piecewise polynomials: a separate low-degree polynomial covers each interval between neighboring data points, and the pieces are joined smoothly where they meet. That makes them a flexible way to model data while avoiding common pitfalls of a single high-degree polynomial, like Runge’s phenomenon. Here, I used splines to model what Ted Williams’ and Joe DiMaggio’s numbers would have been in the seasons they missed.
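As a rough illustration of the idea, here is a sketch that fits a natural cubic spline through the seasons on either side of a three-year gap and reads off values for the missing years. The post doesn’t say which kind of spline was used, so the natural cubic spline is my assumption, and the wRC+ values below are placeholders for a hypothetical hitter, not real stats. A library routine such as `scipy.interpolate.CubicSpline` is the more common way to do this.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that interpolates (xs, ys) with a natural cubic spline.

    xs must be strictly increasing. "Natural" means the second derivative
    is zero at both ends.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal solve.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution gives the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def spline(x):
        # Find the interval containing x (clamped to the first/last interval).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return spline


# Placeholder seasons for a hypothetical hitter (ASSUMPTION: not real stats).
seasons = [1939, 1940, 1941, 1942, 1946, 1947, 1948, 1949]
wrc_plus = [150, 160, 175, 180, 185, 180, 175, 170]

curve = natural_cubic_spline(seasons, wrc_plus)
filled = {year: round(curve(year), 1) for year in (1943, 1944, 1945)}
print(filled)
```

Because the spline passes exactly through every known season, the real data is left untouched and only the gap is estimated.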

Green (Williams) and salmon (DiMaggio) are their real stats, and gold is the simulated data from my spline model. As you can see, Williams missed time when he was trending upwards, or in other words, before the prime of his aging curve, whereas DiMaggio missed time when he was trending downwards on the aging curve. I think my model is a bit conservative on Williams’ numbers, but it works as a baseline. My intuition tells me Williams could have had three of the best seasons of his career in the three seasons he missed, but my model isn’t as bullish.

Now that we have their missing data model, how do they stack up against their peers with the simulated data? 


Home Runs:




As you can see, adding stats for these three years moves both players from among the most elite to the very top tier, particularly Williams, who might just be the best hitter ever once the simulated stats are included. Here’s a breakdown of how much each player gained and how many places it moved them up the leaderboard.
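The leaderboard comparison boils down to adding the simulated totals to each player’s real career total and re-ranking. Here is a small sketch of that bookkeeping using career home runs for some of the players named earlier. The career totals are the real ones, but the extra home runs are placeholder numbers I chose for illustration, not the output of the spline model, and the ranks only apply within this short list.

```python
def rank_of(totals, player):
    """1-based rank of `player` when totals are sorted from high to low."""
    return 1 + sum(1 for value in totals.values() if value > totals[player])


def with_extra(totals, extra):
    """Return a new leaderboard with simulated totals added for some players."""
    return {name: total + extra.get(name, 0) for name, total in totals.items()}


# Real career home run totals for a handful of players (not the full leaderboard).
career_hr = {
    "Barry Bonds": 762,
    "Hank Aaron": 755,
    "Babe Ruth": 714,
    "Willie Mays": 660,
    "Mickey Mantle": 536,
    "Ted Williams": 521,
    "Lou Gehrig": 493,
    "Stan Musial": 475,
    "Joe DiMaggio": 361,
}

# ASSUMPTION: placeholder values for the three missed seasons, not model output.
simulated_extra = {"Ted Williams": 100, "Joe DiMaggio": 75}

adjusted = with_extra(career_hr, simulated_extra)
for name in simulated_extra:
    before = rank_of(career_hr, name)
    after = rank_of(adjusted, name)
    print(f"{name}: {career_hr[name]} -> {adjusted[name]} HR, "
          f"rank {before} -> {after} (within this list)")
```

The same two helpers work for any counting stat; only the dictionaries change.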

There’s no perfect model, and these predictions for the three WW2 years aren’t perfect either, so don’t take them at face value. What I hope you take away from this thought experiment is an appreciation of A) just how good these two hitters were, and B) what could have been.

About the author:

My name’s Dr. Shane Simon, and I got my PhD studying circuit-level neuroscience. Telling stories with data is my true passion. I’m currently a Senior Data Scientist, and I love applying some of my quantitative skills to tough baseball analytical questions in my free time. 

Author’s notes:

  • I didn’t include Johnny Mize’s story here, which is a shame. He had one hell of a baseball career and his career was also affected by serving in his country’s military for 3 seasons in his prime. 
  • I also tried Monte Carlo simulations and fitting a polynomial to interpolate the missing data from Williams’ and DiMaggio’s war years, but the results didn’t make intuitive sense to me, so I opted for splines.
  • I’m using the metric called wRC+ (Weighted Runs Created Plus). One of the reasons I like using it as an all-encompassing offensive metric is that it compares your production against your peers in that year, with 100 being the league average, and it accounts for ballpark factors and era effects.
  • Williams missed even more time due to the Korean War, but I didn’t model those seasons. Just note that his career numbers should be even better! Also, his small sample sizes in the partial seasons around that service could have unduly inflated his wRC+ numbers a bit.
  • If you’d like to check out any of my code, check out my GitHub.