An introduction to pK%+ (Pitchers)
A little bit over one month ago, I released an article introducing a new hitting metric called pDRC+ (predictive deserved runs created plus). The stat, which incorporated barrels/PA, K%, BB%, % of BBEs that are ground balls or pop ups that leave the player’s bat at an EV of less than 90 mph, average EV, and sprint speed, was weighted in an effort to make it predictive of how a player will perform in the future (more so than DRC+, OPS+, and wRC+).
The metric I plan to introduce today is for pitchers.
It is called pK%+ (predictive strikeout rate plus).
The stat is similar to pDRC+ in the sense that the metric’s predictiveness is maximized over its descriptiveness.
pK%+ takes into account numerous variables:
- Swing% in heart zone
- Whiff% in heart zone
- Swing% in shadow zone
- Whiff% in shadow zone
- Whiff% in chase zone
- % of non-whiff swings ending in foul balls (swings)
- Average fastball velocity
*It removes intentional walks from the picture.
All seven factors were converted into Z-scores and subsequently combined into one Z-score.
The formula for pK%+ is…
Combined Z-score = (Swing% in heart zone Z-score * 0.11) + (Whiff% in heart zone Z-score * 0.145) + (Swing% in shadow zone Z-score * 0.04) + (Whiff% in shadow zone Z-score * 0.22) + (Whiff% in chase zone Z-score * 0.14) + (% of non-whiff swings ending in foul balls Z-score * 0.215) + (Average fastball velocity Z-score * 0.135)
pK%+ = (Combined Z-score * 36.923) + 102.06
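For anyone who would rather read the formula as code, here is a minimal Python sketch of the two steps. The dictionary keys and function names are my own illustrative labels (they are not from the spreadsheet), and the 0.22 weight is attached to whiff% in the shadow zone, in line with the variable list.

```python
# Weights from the combined Z-score formula. The key names are illustrative
# labels, not identifiers from the author's spreadsheet. The 0.22 weight is
# assigned to whiff% in the shadow zone, matching the variable list.
WEIGHTS = {
    "swing_heart": 0.11,
    "whiff_heart": 0.145,
    "swing_shadow": 0.04,
    "whiff_shadow": 0.22,
    "whiff_chase": 0.14,
    "foul_per_contact_swing": 0.215,
    "fastball_velo": 0.135,
}

SCALE = 36.923   # multiplier from the pK%+ formula
OFFSET = 102.06  # constant from the pK%+ formula


def combined_z(z_scores: dict[str, float]) -> float:
    """Weighted sum of the seven sign-adjusted Z-scores."""
    return sum(WEIGHTS[name] * z_scores[name] for name in WEIGHTS)


def pk_plus(z_scores: dict[str, float]) -> float:
    """Map the combined Z-score onto the pK%+ scale."""
    return combined_z(z_scores) * SCALE + OFFSET


# A pitcher who is exactly average in every component (all Z-scores 0)
# lands on the constant term of the formula.
league_average = {name: 0.0 for name in WEIGHTS}
print(pk_plus(league_average))  # 102.06
```

The inputs are assumed to be Z-scores that have already been sign-adjusted, so a positive value always pushes pK%+ up.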
At this point, similar to the way that I did for pDRC+, I will discuss the variables individually.
*note that I may refer to pK%+ and combined Z-score interchangeably (pK%+ is equal to the product of 36.923 and each player’s combined Z-score plus 102.06)
Swing% in heart zone
The graphic beside this text depicts the four attack zones of the plate. It was created by Baseball Savant.
Pitches taken in the heart zone are called strikes about 99% of the time.
About 45% of swings in this zone (2019) result in the ball being put into play (ending the plate appearance).
Roughly 12% of pitches in the heart zone end in whiffs.
In the pK%+ equation, a high percentage of swings in the heart zone is a negative. You won’t see any subtraction or negative signs in the formula because I flipped the sign of each Z-score (multiplying or dividing by a negative) when necessary, so that it is easy to see whether a variable is boosting or lowering a pitcher’s combined Z-score/pK%+ when looking at the spreadsheet/table.
On its own, swing% in the heart zone has no correlation with strikeout rate in the same season or with future strikeout rate; however, it improves the model when combined with the other factors.
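The sign adjustment can be sketched in a few lines of Python. This is only an illustration: the function name and the three swing rates are made up, and I use the population standard deviation here even though the write-up doesn't say which flavor of standard deviation the spreadsheet uses.

```python
from statistics import mean, pstdev


def oriented_z_scores(values: list[float], higher_is_better: bool) -> list[float]:
    """Z-score each value against the group, flipping the sign when a high
    raw value should lower the combined score."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; sample SD is another option
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


# Made-up heart-zone swing rates (%) for three hypothetical pitchers.
swing_heart = [70.0, 73.0, 76.0]

# More swings in the heart zone is a negative, so the pitcher with the
# lowest swing rate ends up with the highest (positive) Z-score.
print(oriented_z_scores(swing_heart, higher_is_better=False))
```

With every variable oriented this way, all the weights in the formula can stay positive.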
Whiff% in heart zone
Percentage of swings in the heart zone that result in whiffs is correlated strongly to strikeout rate plus, as can be seen below.
Of the four attack zones, it is toughest to induce whiffs in the heart zone.
One player who has always dominated relative to his peers in terms of whiffs in this zone is Josh Hader.
In 2018, hitters whiffed on 32.7% of their swings in the heart zone against Hader, a mark that was over 4 standard deviations above the league average of 14.4%.
A higher whiff% in the heart zone tends to be associated with a higher strikeout rate plus.
Swing% in shadow zone
This is the variable that has the smallest impact on the equation.
As could be said about swing% in the heart zone, knowing the swing% in the shadow zone doesn’t reveal much about a pitcher’s strikeout tendencies on its own.
When combined with everything else, though, it can be of some benefit.
Adding it to the equation boosted the adjusted R^2 (tells one if adding a variable really improves a model [not just by chance!]), and like swing% in the heart zone, a higher percentage of swings in the shadow zone is a negative in the equation (they could end a PA… in which case, no strikeout).
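For reference, the standard adjusted R^2 formula is easy to write out. The sketch below uses invented numbers (they are not from my data) purely to show how a tiny bump in plain R^2 can still lower the adjusted figure once the extra variable is penalized.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Illustrative numbers only: a seventh variable nudges plain R^2 from
# 0.500 to 0.501 in a hypothetical sample of 200 player-seasons.
with_six = adjusted_r_squared(0.500, n_obs=200, n_predictors=6)
with_seven = adjusted_r_squared(0.501, n_obs=200, n_predictors=7)

# Here the gain is too small to pay for the extra variable, so the
# adjusted value goes down rather than up.
print(with_six, with_seven)
```

A variable earns its place only when the adjusted value rises, which is the test swing% in the shadow zone passed.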
Whiff% in shadow zone
Whiff% in the shadow zone more closely mirrors strikeout rate than whiff% in the heart zone. A likely reason is that almost half of all swings and misses take place in this region.
The highest Whiff% in shadow zone Z-score recorded by a pitcher in a single season (min. 250 batters faced) since 2017 is 3.10, and that was by Josh Hader in 2019.
The second highest is 3.08 standard deviations above average (Craig Kimbrel in 2017), and third highest is 3.07 (Josh James in 2019).
Whiff% in chase zone
The fifth variable that pK%+ encompasses is the percentage of swings in the chase zone that result in whiffs.
This stat does not correlate as strongly to K%+ (in season n and season n+1) as do whiff% in the heart zone and whiff% in the shadow zone.
Despite that, whiff% in this zone remains insightful because it communicates how often a hitter swings and misses (when they swing) in a zone where whiffs are abundant (hence the name chase zone).
In 2019, hitters whiffed at slightly over half of the pitches in the chase zone that they swung at.
In 2017, Pedro Strop posted a whiff% in the chase zone that was 3.46 standard deviations above league average (Strop: 86.0% | lgavg: 49.6%).
The second highest Z-score for whiff% in the chase zone was 3.08 (Ryan Pressly in 2018).
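As a quick back-of-the-envelope check, the Strop numbers can be run backwards through the Z-score definition to see how spread out chase-zone whiff% is. This assumes the quoted league average is the same mean used for the Z-score and that the published figures are rounded; the function name is just for illustration.

```python
def implied_sd(value: float, group_mean: float, z_score: float) -> float:
    """Rearrange z = (value - mean) / sd to solve for the standard deviation."""
    return (value - group_mean) / z_score


# Figures quoted in the text for Pedro Strop's 2017 chase-zone whiff%.
strop_sd = implied_sd(value=86.0, group_mean=49.6, z_score=3.46)
print(round(strop_sd, 1))  # 10.5 (percentage points)
```

In other words, one standard deviation in chase-zone whiff% works out to roughly 10.5 percentage points under those assumptions.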
% of non-whiff swings ending in foul balls
The percentage of swings that the hitter made contact on that are foul balls is a key component of pK%+.
Simply put, the complement of foul ball swings is swings that are put into play. A foul ball extends a plate appearance, and it is a strike!
Average fastball velocity
The final variable that pK%+ accounts for is average fastball velocity.
All types of fastballs are included: 4-seamers, 2-seamers, cutters, and sinkers.
In a perfect world, I’d be able to use average 4-seam fastball velocity, but some pitchers don’t throw a 4-seamer, and I wouldn’t want to discard that data.
Knowing how fast the average fastball left a pitcher’s hand explained nearly 20% of the variation in a pitcher’s strikeout rate plus in season n+1 (approximately the same correlation as whiff% in the chase zone).
Not all fast pitches generate a high percentage of whiffs, but if all things are held constant, as velocity increases, whiff% also increases. A 2800 RPM 4-seamer with 5 inches of vertical drop and 95 mph velocity is better than a 2800 RPM 4-seamer with 5 inches of vertical drop and 90 mph velocity.
I determined how to weight each variable by using a multi-variable regression on a free site (http://vassarstats.net/multU.html) of pitchers who faced at least 250 batters in 2017 and 2018.
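If you'd like to try this kind of regression without the website, here is a dependency-free Python sketch of ordinary least squares. Everything in it is a stand-in: the data is randomly generated, the "true" coefficients are arbitrary, and rescaling the fitted coefficients so they sum to 1 is just one plausible way to turn them into weights, not necessarily the exact procedure I followed.

```python
import random


def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes the system is non-singular (no zero pivots)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_ols(rows: list[list[float]], target: list[float]) -> list[float]:
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    design = [[1.0] + row for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic stand-in data: 300 fake player-seasons with 7 fake Z-scored
# inputs, and a fake next-season K%+ built from arbitrary coefficients
# plus random noise. None of this is real pitcher data.
rng = random.Random(0)
true_params = [100.0, 4.0, 5.0, 1.5, 8.0, 5.0, 8.0, 5.0]
fake_rows = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(300)]
fake_target = [
    true_params[0]
    + sum(c * z for c, z in zip(true_params[1:], row))
    + rng.gauss(0.0, 10.0)
    for row in fake_rows
]

intercept, *coefs = fit_ols(fake_rows, fake_target)
shares = [c / sum(coefs) for c in coefs]  # rescale so the weights sum to 1
print(intercept, shares)
```

The fitted coefficients should land close to the arbitrary ones used to generate the fake data, which is a handy sanity check before pointing the same code at real player-seasons.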
The R^2 for the combined Z-score in season n to K%+ in season n+1 is about .074 higher (over 16% higher) than the R^2 for K%+ in season n to K%+ in season n+1.
Next, I tested my coefficients/weights out of sample on the 2018-19 consecutive player-seasons of 250+ TBF.
As you can see, my metric had a correlation that was a little bit stronger than just looking at K%+ to K%+.
While I wish the gap in correlation in using my metric over K%+ would’ve been larger for the out-of-sample testing, I am nonetheless pleased that my metric outperformed K%+ (albeit to a minimal extent). There is a chance that the K%+ correlation was higher than usual due to randomness.
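The out-of-sample comparison boils down to two R^2 values computed against the same next-season target. Here is a small Python sketch of that comparison; the five rows are made-up numbers for hypothetical pitchers, not anything from my spreadsheet.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Share of variance in ys explained by a simple linear fit on xs."""
    return pearson_r(xs, ys) ** 2


# Made-up values for five hypothetical pitchers.
pk_plus_n = [120.0, 95.0, 150.0, 80.0, 105.0]    # metric in season n
k_plus_n = [135.0, 90.0, 140.0, 95.0, 100.0]     # K%+ in season n
k_plus_next = [118.0, 97.0, 152.0, 84.0, 101.0]  # K%+ in season n+1

# Both predictors are scored against the same season n+1 target.
print(r_squared(pk_plus_n, k_plus_next), r_squared(k_plus_n, k_plus_next))
```

The key point is that the weights are frozen before this step, so the test seasons play no part in choosing them.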
One really encouraging sign for pK%+ is how well it performed when there was a big gap between it and K%+ in season n.
In 2018, the average difference between pK%+ and K%+ was 8 points.
For the 22 players for whom there was at least a 15-point difference between the two, 2018 pK%+ did a better job of predicting 2019 K%+ than 2018 K%+ did. The 2018 pK%+s for these players were on average ~15 points away from their 2019 K%+s. For K%+, it was ~21 points (~18 if you were to create a regression equation off of the 2017-18 data where the x-variable is K%+ in season n and the y-variable is K%+ in season n+1).
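The "points away" comparison is an average absolute miss, restricted to pitchers whose two numbers disagreed by 15 or more. A minimal Python sketch of that filter-then-average step follows; the three rows and the field names are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def mean_abs_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predictions and what actually happened."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def big_gap_rows(rows: list[dict], min_gap: float = 15.0) -> list[dict]:
    """Keep pitchers whose season-n pK%+ and K%+ differ by at least min_gap."""
    return [r for r in rows if abs(r["pk_plus_n"] - r["k_plus_n"]) >= min_gap]


# Hypothetical rows, not real pitchers.
rows = [
    {"pk_plus_n": 120.0, "k_plus_n": 100.0, "k_plus_next": 112.0},
    {"pk_plus_n": 98.0, "k_plus_n": 101.0, "k_plus_next": 99.0},   # gap of 3: dropped
    {"pk_plus_n": 140.0, "k_plus_n": 165.0, "k_plus_next": 150.0},
]

subset = big_gap_rows(rows)
actual = [r["k_plus_next"] for r in subset]
print(mean_abs_error([r["pk_plus_n"] for r in subset], actual))  # 9.0
print(mean_abs_error([r["k_plus_n"] for r in subset], actual))   # 13.5
```

A smaller number means the season-n stat landed closer to what the pitcher actually did the following year.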
Here is a screenshot of those players of interest from 2018 (the rightmost K%+ is from 2019)…
To be clear, the pK%+ values were generated by regressing the combined Z-scores against K%+ in season n. I chose season n rather than season n+1 because that allows for a wider range of outputs and makes pK%+ a tiny bit closer to K%+ in season n in terms of point difference without sacrificing much, if any, predictive power.
pK%+ has a very strong correlation to K%+ (.917). It is not as strong as Mike Podhorzer’s xK% metric, which has an undeniably impressive .93 adjusted R^2. That shouldn’t come as a surprise, though, since I built this model to maximize predictiveness.
Even so, a difference in pK%+ and K%+ in season n could still be attributed to (un)luckiness.
The year-to-year stability of my metric is pretty strong.
My metric appears to do slightly better than K%+ when the sample size is notably smaller (less than or equal to 300 TBF).
Before anybody asks, there are two reasons why I didn’t gather data prior to 2017. It would have taken a lot more time, and through 2016, pitchers had to actually intentionally walk batters (removing that pitch data would be a pain).
After the 2020 season (if there is one), I might look to update this formula (to have the weights be determined by 2017-18 and 2018-19 consecutive player-seasons). The refinements would presumably be minor.
With that being said, I am very happy with the current status of this metric.
Here are the pK%+ leaders from 2019 (LgAvg is around 102)
- Josh Hader (174)
- Josh James (170)
- Nick Anderson (168)
- Edwin Diaz (165)
- Gerrit Cole (164)
- Matt Barnes (154)
- Jake Diekman (154)
- Brandon Workman (150)
- Liam Hendriks (149)
- Jose Leclerc (147)
Pitchers w/ the biggest positive difference between pK%+ and K%+ from 2019
(pK%+/K%+)
- Gregory Soto (99/71)
- John Gant (122/97)
- Jake Diekman (154/130)
- Matt Harvey (87/64)
- Jeurys Familia (123/101)
“Smallest” negative difference from 2019
(pK%+/K%+)
- Josh Hader (174/209)
- Will Smith (134/163)
- Yusmeiro Petit (74/100)
- Taylor Rogers (119/141)
- Jesse Chavez (72/93)
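For a quick look at the size of each gap, the two lists above can be dropped into a few lines of Python. The numbers are copied straight from the lists; the variable names are just for illustration.

```python
# (pK%+, K%+) pairs copied from the two 2019 lists above.
pairs_2019 = {
    "Gregory Soto": (99, 71),
    "John Gant": (122, 97),
    "Jake Diekman": (154, 130),
    "Matt Harvey": (87, 64),
    "Jeurys Familia": (123, 101),
    "Josh Hader": (174, 209),
    "Will Smith": (134, 163),
    "Yusmeiro Petit": (74, 100),
    "Taylor Rogers": (119, 141),
    "Jesse Chavez": (72, 93),
}

# Positive gap: pK%+ is more optimistic than the pitcher's actual K%+.
gaps = {name: pk - k for name, (pk, k) in pairs_2019.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:+d}")
```

The extremes are Gregory Soto at +28 and Josh Hader at -35.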
Highest pK%+ single-seasons since 2017
- 2017 Craig Kimbrel (204)
- 2017 Dellin Betances (188)
- 2018 Josh Hader (184)
- 2018 Edwin Diaz (182)
- 2018 Dellin Betances (182)
- 2017 Corey Knebel (176)
- 2019 Josh Hader (174)
- 2019 Josh James (170)
- 2019 Nick Anderson (168)
- 2017 Chad Green (165)
The 2019 pK%+ leaderboard can be accessed here.
……………
In these tough COVID-19 times, it is my goal to make a difference.
As a Detroit Tigers fan who has been to what feels like over 200 games at Comerica Park, I have an emotional connection to the city of Detroit, and I’ve always felt sympathy towards those who are homeless.
When I was on my 8th grade field trip to Washington D.C. back in 2017, I gave a homeless man $20. It brought me a lot of joy to help somebody out, and he seemed to appreciate it immensely.
If you’d like a digital copy of my spreadsheet (I can email it to you in whatever form you would like [Numbers, Excel, etc.]), venmo me a $5 (or higher) donation (my venmo is MaxSportsStudio), and once you have done that, DM me on Twitter (@MaxSportsStudio).
The spreadsheet consists of five sheets: 2019, 2018, 2017, single-seasons (2017-19), and n+1 (2017-18 and 2018-19).
All the money you donate will go towards an organization called Heart 2 Hart Detroit.
In January of this year, they handed out…
- “1,200 lunches
- 10 winter coats
- 160 pairs of underwear/thermal underwear
- 470 pairs of socks
- 750 hygiene products (deodorant, razors, shampoo, soap, toothbrush, toothpaste, wet wipes etc.)
- 25 pairs of shoes
- 210 shirts, sweaters or hoodies
- 125 bus tickets (4 hour rides)
- 5 bus passes (monthly)”
My goal is to raise at least $500. I will keep you all posted on Twitter.
Thanks for reading. Stay safe, everyone.
*note: I noticed the weights for the combined Z-score formula add up to 1.005 (that was a super small mistake), but I’m simply leaving it that way for now because changing everything so it added up to 1 wouldn’t make much of a difference.
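Here is a short Python check of that sum, along with what a rescaled set of weights would look like. The rescaling is a sketch of one possible fix, not something I have applied to the published numbers.

```python
from math import isclose

# The seven weights from the combined Z-score formula, in formula order.
weights = [0.11, 0.145, 0.04, 0.22, 0.14, 0.215, 0.135]

total = sum(weights)
print(round(total, 3))  # 1.005

# Dividing every weight by the total makes them sum to 1.
normalized = [w / total for w in weights]
print(isclose(sum(normalized), 1.0))  # True
```

Rescaling shrinks every combined Z-score by the same factor of 1.005, so multiplying the 36.923 in the pK%+ formula by 1.005 (about 37.108) would leave every pK%+ value unchanged.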