Expected Weighted On-base Average (xwOBA) is a fantastic metric that also happens to be a favorite of mine for evaluating hitters. It is a beautifully simple and efficient metric: it’s the same thing as wOBA, except that instead of actual outcomes it uses estimated outcomes based on exit velocity and launch angle. Though xwOBA is a descriptive metric, it correlates to future wOBA better than wOBA itself, meaning it’s generally a better representation of true talent.
However, until 2019 there was a major flaw with xwOBA that had not been addressed: wOBA for hitters with good speed was consistently higher than their respective xwOBA, while hitters with poor speed typically experienced the opposite effect. That is, fast runners were outperforming their xwOBA, and slow runners were underperforming their xwOBA. This isn’t very surprising given how xwOBA is constructed: it measures the league-average outcome for batted balls with the same exit velocity and launch angle. Intuitively, fast runners are going to beat the league-average outcomes for balls in play on a fairly regular basis. To put it another way, if Trea Turner and Yadier Molina both hit the exact same ground ball, Turner has a better chance of beating it out for a hit (duh), but they would both have the same expected outcome.
You don’t need me to tell you that this was problematic, but I’m going to anyway: this was problematic. In 2018, Alex Chamberlain (@DolphHauldhagen) of RotoGraphs wrote a piece in which he quantified this effect for the 2018 season. He found that the Pearson r correlation coefficients (0 = no relationship, +1 = perfect positive correlation) were +0.61 for wOBA-xwOBA and sprint speed on ground balls, and +0.47 for overall wOBA-xwOBA and sprint speed. Clearly, we needed some sort of adjustment for this in order to make xwOBA as effective as possible.
In 2019, xwOBA was updated to include sprint speed on batted balls that were registered as “topped” or “weakly hit”. Why adjust specifically for those types of batted balls? Because speed-driven xwOBA overperformance was mostly a result of infield singles, and we know that speed matters more on slowly-hit balls than it does on hard-hit balls. Even Diamondbacks legend Tim Locastro probably isn’t going to beat out a 101-mph ground ball to shortstop, but good luck trying to get him out on a slow roller or a high chopper.
To my knowledge, there is no publicly-available information on this topic since the 2019 update, so I decided to head over to Baseball Savant and figure it out for myself. My methodology was much like what Chamberlain did for the 2018 season, using the Pearson r coefficient to determine the correlation between xwOBA “overperformance” and sprint speed. I expanded the data to all seasons since 2015 to give us a larger set of data to look at. Thanks to the Statcast “search” feature, we can quickly acquire the datasets that we need for this mini-study, which were:
- wOBA-xwOBA (min. 500 PA) for each season from 2015-2019, n=706*
- Average sprint speed of players during the season(s) in which they exceeded the minimum ground ball and/or plate appearance requirements
*Note that these are individual player-seasons rather than cumulative numbers. Since sprint speed can change a great deal over the course of a five-year period, we want to download the data for each season individually and use the sprint speed given for the corresponding season. It’s a little extra legwork, but it’s worth it.
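For anyone who wants to reproduce the season-by-season pairing described in the note above, here is a minimal sketch in plain Python. The row layout, the field names, and every number in it are hypothetical placeholders rather than real Statcast exports; the only idea taken from the text is that each wOBA-xwOBA value is matched to the sprint speed from the same player and the same season, after applying the 500 PA cutoff.

```python
# Hypothetical rows, NOT real Statcast data:
# (player_id, season, wOBA, xwOBA, plate appearances)
batting_rows = [
    (1, 2018, 0.350, 0.330, 620),
    (1, 2019, 0.340, 0.345, 540),
    (2, 2019, 0.300, 0.310, 510),
    (3, 2019, 0.320, 0.300, 450),  # below the 500 PA cutoff, so it is dropped
]

# Hypothetical rows: (player_id, season, sprint speed in ft/s)
sprint_rows = [
    (1, 2018, 28.4),
    (1, 2019, 27.9),
    (2, 2019, 25.6),
    (3, 2019, 29.1),
]

MIN_PA = 500


def build_player_seasons(batting, sprint, min_pa=MIN_PA):
    """Join batting and sprint-speed rows on (player_id, season)."""
    speed_by_key = {(pid, season): speed for pid, season, speed in sprint}
    player_seasons = []
    for pid, season, woba, xwoba, pa in batting:
        key = (pid, season)
        # Skip seasons under the PA cutoff or without a matching sprint speed.
        if pa < min_pa or key not in speed_by_key:
            continue
        player_seasons.append({
            "player_id": pid,
            "season": season,
            "diff": round(woba - xwoba, 3),  # wOBA-xwOBA for that season
            "sprint_speed": speed_by_key[key],
        })
    return player_seasons


player_seasons = build_player_seasons(batting_rows, sprint_rows)
for row in player_seasons:
    print(row)
```

Keying on the (player, season) pair rather than on the player alone is what keeps a 2015 sprint speed from being attached to a 2019 batting line.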
The first thing we’re going to look at is the direct relationship between xwOBA overperformance and sprint speed. For this, we’ll use wOBA-xwOBA, which is simply the difference between a hitter’s actual wOBA and their xwOBA. Our sample was collected by identifying all player-seasons with at least 500 plate appearances because, according to what Tom Tango was kind enough to inform me on Twitter (while also informing me that I was asking the wrong question!), xwOBA becomes reliable at around 500 PA. This cutoff gives us a sample population of 706 from the 2015-2019 seasons. Obviously, we want to make sure that our sample population is representative of the entire league, which is easy to do since we’re only using one independent variable (sprint speed):
- Average sprint speed of our sample population: 27.0 ft/s
- League-average sprint speed per Baseball Savant: ~27.0 ft/s
The sprint speed of our sample population is right at league-average and our distribution is normal, so we can feel pretty confident that our sample is representative of the entire league. Now that we’ve covered our bases, let’s get to the good stuff:
The Pearson r correlation coefficient for our sample was +0.28, which is a positive relationship, but not an especially strong one, as you can see in the plot. If we use +0.47 (Chamberlain’s findings from the 2018 season) as our baseline, we can reasonably conclude that the 2019 xwOBA update was quite effective. We could stop here and call it a day, and we would’ve learned something. Alas, this IS NOT the endpoint of our study. In fact, this is where the real work begins: we have the data points and a better understanding of the relationship between two important variables, and now we have to take this data and make it useful. One thing we could do is simply look at the slope of our fit line above and say that, on average, there is a 4-point increase in wOBA-xwOBA for every 1 ft/s increase in sprint speed, and we would technically be correct but there is so much variance that this wouldn’t be particularly useful. Graphs are nice, but we can do more to get a better idea of what we’re looking at. We need benchmarks.
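If you want to see what goes into those two numbers, the sketch below computes a Pearson r and a least-squares slope by hand. The five data points are made up purely for illustration, so the output will not match the +0.28 or the 4-point slope from my sample; the function names are my own and not part of any Baseball Savant tooling.

```python
import math

# Made-up player-seasons for illustration only (NOT real Statcast data):
# (sprint speed in ft/s, wOBA-xwOBA)
sample = [
    (25.1, -0.012),
    (26.0, 0.004),
    (27.2, -0.003),
    (28.0, 0.015),
    (29.3, 0.010),
]


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_slope(xs, ys):
    """Least-squares slope: average change in y per one-unit change in x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


speeds = [speed for speed, _ in sample]
diffs = [diff for _, diff in sample]

r = pearson_r(speeds, diffs)
slope = fit_slope(speeds, diffs)

# Multiply by 1000 to express the slope in "points" of wOBA per ft/s.
print(f"r = {r:+.2f}, slope = {slope * 1000:+.1f} points per ft/s")
```

The two numbers answer different questions: the slope says how much wOBA-xwOBA moves with speed on average, while r says how tightly the points hug that line, which is why a real slope can still be a poor guide for any individual hitter.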
One of the most common uses of wOBA-xwOBA is to determine whether a hitter was “lucky” or “unlucky” in a given time period, usually a season. To my knowledge, there is no official cutoff point for good/bad luck in this context, but anything more than +/- .015ish is worth looking into. What happens if we group players by their respective sprint speed* and see what percentage of each group got “lucky” ((wOBA-xwOBA) > .015)?
*Generally speaking, I’m not a big fan of using bins to separate data since it allows numbers that are very close to each other to get lumped into different categories, but I believe it can be effective here and I want us to get several different looks at this data, so we’re going to roll with it.
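The grouping itself is easy to sketch. The version below assumes 1 ft/s bins and uses made-up player-seasons, so treat both the bin width and the data as placeholders; the only piece taken from the text is the .015 cutoff for a "lucky" season.

```python
import math
from collections import defaultdict

LUCKY_CUTOFF = 0.015  # wOBA-xwOBA above this counts as "lucky"

# Made-up player-seasons for illustration only (NOT real Statcast data):
# (sprint speed in ft/s, wOBA-xwOBA)
sample = [
    (25.3, 0.020),
    (25.8, -0.010),
    (27.1, 0.016),
    (27.9, 0.030),
    (29.2, 0.000),
]


def lucky_share_by_bin(player_seasons, bin_width=1.0, cutoff=LUCKY_CUTOFF):
    """Return {bin floor: (players, lucky players, % lucky)} by sprint speed."""
    counts = defaultdict(lambda: [0, 0])  # bin floor -> [total, lucky]
    for speed, diff in player_seasons:
        # A speed of 27.9 with bin_width 1.0 lands in the 27.0 bin.
        bin_floor = math.floor(speed / bin_width) * bin_width
        counts[bin_floor][0] += 1
        if diff > cutoff:
            counts[bin_floor][1] += 1
    return {
        bin_floor: (total, lucky, 100.0 * lucky / total)
        for bin_floor, (total, lucky) in sorted(counts.items())
    }


for bin_floor, (total, lucky, pct) in lucky_share_by_bin(sample).items():
    print(f"{bin_floor:.0f}+ ft/s: {lucky} of {total} lucky ({pct:.0f}%)")
```

This also shows the weakness I mentioned: a 26.9 ft/s runner and a 27.0 ft/s runner end up in different rows even though they are practically the same runner.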
|Sprint Speed Range (ft/s)|Number of players (2015-2019, min. 500 PA)|Number of players who outperformed xwOBA by >15 points|% of players who outperformed xwOBA by >15 points|
We can do better than this, but it’s a decent starting point and it gives us an idea of where the speed-driven xwOBA overperformance begins to plateau (~27 ft/s). Obviously, our sample size decreases as we move further away from the average sprint speed but we clearly see a steady increase in the percentage of “lucky” hitters until we reach the 50th percentile sprint speed (~27 ft/s), at which point the benefit of speed appears to taper off. Roughly one-third of all hitters in our sample population with a sprint speed that was average or better outperformed their xwOBA by more than 15 points (.015). That’s certainly notable, but more like “take this with a grain of salt” than “hitter X wasn’t lucky and his xwOBA overperformance is definitely sustainable because he can run fast.”
So, we know how often players outperform their xwOBA by a sizeable margin based on their sprint speed. But we don’t know the degree to which each group outperforms their xwOBA on average. We can change our query to find that information pretty quickly, though.
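Changing the query amounts to swapping the "share above the cutoff" for a plain average within each speed group. Here is a rough sketch, again assuming 1 ft/s bins and using made-up numbers rather than real Statcast data.

```python
import math
from collections import defaultdict
from statistics import mean

# Made-up player-seasons for illustration only (NOT real Statcast data):
# (sprint speed in ft/s, wOBA-xwOBA)
sample = [
    (25.3, 0.020),
    (25.8, -0.010),
    (27.1, 0.016),
    (27.9, 0.030),
    (29.2, 0.000),
]


def mean_diff_by_bin(player_seasons, bin_width=1.0):
    """Return {bin floor: (players, mean wOBA-xwOBA)} by sprint speed."""
    diffs = defaultdict(list)
    for speed, diff in player_seasons:
        bin_floor = math.floor(speed / bin_width) * bin_width
        diffs[bin_floor].append(diff)
    return {
        bin_floor: (len(values), round(mean(values), 3))
        for bin_floor, values in sorted(diffs.items())
    }


for bin_floor, (total, avg) in mean_diff_by_bin(sample).items():
    print(f"{bin_floor:.0f}+ ft/s: n={total}, mean wOBA-xwOBA = {avg:+.3f}")
```

Keeping the player count next to each average matters, because the groups at the extremes of sprint speed are small and their averages are the noisiest.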
|Sprint speed (ft/s)|Number of players (2015-2019, min. 500 PA)|
This is an interesting look. The fit line on our plot above tells us that, on average, we would see an increase in wOBA-xwOBA of .004 for every ft/s of sprint speed, but our table here shows us that the increase is more stepwise than linear. Just to confirm this, I lowered the qualifier to a minimum of 400 PA, expanding our sample population to n=1,050, and found more or less the same results. So, if you absolutely need a rule of thumb for sprint speed and xwOBA overperformance, you could say that there’s about an 8-point increase in wOBA-xwOBA on average for every 2 ft/s increase in sprint speed, tapering off at 29+ ft/s, though I wouldn’t recommend using that. Of course, for anything this high-variance you’re always going to be better off looking at each player on a case-by-case basis.
So what did we learn from this?
A few things:
- The xwOBA update from 2019 appears to have accomplished its goal of reducing the sprint speed bias.
- There is still a relationship between sprint speed and xwOBA over- and underperformance, but it’s a fairly weak relationship and, again, this appears to be a significant improvement from the pre-update xwOBA numbers. We don’t know exactly how much of an overall improvement it is, but based on what we know from the 2018 season it’s quite a lot.
- There will still be outliers. Dee Gordon outperforming his xwOBA by 61 points in 2017 is not purely a product of random variation. Gordon has absolute wheels and roughly the bat speed of a butterfly flapping its wings, so he probably beat out quite a few ground balls for infield singles and stretched a few hits into extra bases.
Obviously, xwOBA overperformance can only happen on batted-ball outcomes, so a more useful tool for determining just how “lucky” a player was is xwOBA on contact, or xwOBAcon (yes, it’s really called that). I’m going to revisit that in my next piece, so be sure to check back in the future if that’s something you’re interested in!
All stats and data used in this article are courtesy of Baseball Savant unless otherwise noted.