An update to expected swing and miss%+
During last year’s All-Star break, I introduced x_swing_and_miss_percent_plus, a metric whose goal was “… to quantify how good a pitcher’s stuff is – through the lens of inducing swings and misses – without directly considering that pitcher’s control and command.” Fittingly, that is actually the last article that I wrote. In this one, I will do my best to break down the updated expected swing and miss percent plus (expected swing and miss%+). As one might expect, the two versions share many elements. To name a couple: neither directly accounts for interactions between pitch types, as I want the expected values to be based on the pitch alone. And, to keep things relatively simple and differentiate myself from the brilliant Cameron Grove (PitchingBot – An Overview) and Ethan Moore (xRV 3: The Final Update), I target expected swings and misses as opposed to expected run value.
My new version of expected swing and miss%+ differs from the old one in meaningful ways. A key contrast is how pitches are grouped for modeling. Previously, I created six “pitch types” using k-means clustering based on z-scores of pitch characteristics, such as velocity, vertical movement, and spin rate. My thinking was that the pitches within each of the six groups would behave more like one another than the pitches within the pitch types provided by Statcast do. Once I had formed six groups, I fit logistic regressions to predict the probability of a swing and miss based on raw pitch traits. For the new-look expected swing and miss%+, I decided to construct logistic regressions on Statcast pitch types:
- 4-seam fastball (FF)
- sinker (SI)
- cutter (FC)
- slider (SL)
- curveball (CU) (and knuckle curve [KC] and slow curve [CS])
- changeup (CH)
- split-finger (FS)
I think this makes the statistic more interesting and easier to understand. For instance, one can rank the 4-seam fastballs with the highest expected swing and miss percentages. In the old model, over 10 percent of pitches in five out of the six clusters were, confusingly, 4-seam fastballs.
Statcast recently chose to (re)classify some pitches as sweepers. I will create a model for them in the offseason. For now, I sweep the 2023 sweepers into the trash, as their absence from Pitcher, the Shiny application dedicated to expected swing and miss percent plus and its building blocks, suggests.
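To make the grouping concrete, here is a minimal sketch of how Statcast pitch_type codes could be mapped to the seven model groups listed above. The `MODEL_GROUP` dictionary and the `model_group` helper are hypothetical names of my own, and I am assuming that sweepers carry the Statcast code ST; anything without a model simply maps to nothing.

```python
# Hypothetical mapping from Statcast pitch_type codes to the seven model groups.
# Knuckle curves (KC) and slow curves (CS) are folded into the curveball group.
MODEL_GROUP = {
    "FF": "FF",
    "SI": "SI",
    "FC": "FC",
    "SL": "SL",
    "CU": "CU",
    "KC": "CU",
    "CS": "CU",
    "CH": "CH",
    "FS": "FS",
}


def model_group(pitch_type):
    """Return the model group for a pitch_type code, or None if it has no model."""
    return MODEL_GROUP.get(pitch_type)


# Assumption: "ST" is the sweeper code, which has no model yet and is dropped.
pitches = ["FF", "KC", "ST", "CS", "FS"]
modeled = [(p, model_group(p)) for p in pitches if model_group(p) is not None]
print(modeled)
```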
Another important change is that I created two models for each pitch type rather than one. It used to be that the output of the regression was the probability of a swing and miss. Now, the first model spits out the probability of a swing, the second the probability of a swing and miss given that the batter swung. To arrive at the expected swing and miss probability, I simply multiply the expected swing probability by the expected swing and miss probability given a swing. After doubling the number of models and complicating the statistic, I can compute more metrics, including expected swing percent (plus) and expected whiff percent (plus), and learn, hopefully, additional information about the pitcher.
A third noteworthy distinction is that the new expected swing and miss percent plus models factor in pitch location, as I believe that location is too integral to ignore. When I’m watching baseball, my judgment on the quality of a pitch is heavily based upon its location. Where a pitch lands obviously plays a huge role in whether the batter swings, and then misses. There are certain pitches that a batter, except maybe Javier Báez, will never swing at.
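The two-step calculation can be sketched in a few lines. This is only an illustration: the coefficients, the two feature names, and the `predict` helper below are made up for the example and are not the fitted models.

```python
import math


def logistic(z):
    """Standard logistic function, mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, features):
    """Probability from a logistic regression stored as intercept + coefficients."""
    z = model["intercept"] + sum(
        coef * features[name] for name, coef in model["coefs"].items()
    )
    return logistic(z)


# Made-up coefficients, for illustration only.
swing_model = {
    "intercept": 0.2,
    "coefs": {"release_speed": 0.01, "dist_from_center": -1.5},
}
miss_given_swing_model = {
    "intercept": -3.0,
    "coefs": {"release_speed": 0.02, "dist_from_center": 0.8},
}

pitch = {"release_speed": 95.0, "dist_from_center": 0.5}

p_swing = predict(swing_model, pitch)
p_miss_given_swing = predict(miss_given_swing_model, pitch)

# P(swing and miss) = P(swing) * P(miss | swing)
p_swing_and_miss = p_swing * p_miss_given_swing
print(p_swing, p_miss_given_swing, p_swing_and_miss)
```

Because the second probability is conditional on a swing, the product can never exceed the expected swing probability.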
This changeup had an expected swing and miss percent of around 0.00000000027. That was the least likely pitch to result in a swing last season.
On the other hand, the below pitch had an expected swing percent of roughly 97, the highest last season.
For each model, I tried a number of variables on the data I obtained through the baseballr package.
- p_throws (pitcher hand)
- release_pos_x_adjusted (horizontal release position adjusted for pitcher hand and batter hand)
- release_pos_z (vertical release position)
- sqrt(release_pos_x^2+release_pos_z^2) (“hypotenuse” of release_pos_x and release_pos_z)
- release_extension (release extension)
- release_speed
- release_spin_rate
- release_spin_rate/release_speed (Bauer units)
- pfx_x_adjusted (horizontal movement)
- pfx_z (vertical movement)
- sqrt(pfx_x^2+pfx_z^2)
- vx0_adjusted (horizontal velocity)
- vy0 (velocity in y-direction)
- vz0 (velocity in z-direction)
- sqrt(vx0^2+vy0^2+vz0^2)
- ax_adjusted (horizontal acceleration)
- ay (acceleration in y-direction)
- az (acceleration in z-direction)
- sqrt(ax^2+ay^2+az^2)
- plate_x_adjusted (horizontal position of pitch at plate)
- plate_z (vertical position of pitch at plate)
- sqrt(plate_x^2+(plate_z-2.5)^2) (distance from center of plate)
- is_in_zone (pitch is in [strike] zone)
I also experimented with absolute terms and terms to different powers.
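Several of the candidate variables above are derived from raw Statcast columns, and a rough sketch of those derivations follows. The `derived_features` helper and its output names are hypothetical. The handedness adjustment is a loud assumption: here I just mirror the horizontal values for left-handed pitchers, whereas the actual adjustment also takes batter hand into account for release position.

```python
import math


def derived_features(p):
    """Compute a few derived variables from one pitch's raw Statcast values."""
    # Assumption: mirror horizontal values for lefties so both hands share a frame.
    sign = -1.0 if p["p_throws"] == "L" else 1.0
    return {
        "release_pos_x_adjusted": sign * p["release_pos_x"],
        "release_hypotenuse": math.hypot(p["release_pos_x"], p["release_pos_z"]),
        "bauer_units": p["release_spin_rate"] / p["release_speed"],
        "pfx_x_adjusted": sign * p["pfx_x"],
        "total_movement": math.hypot(p["pfx_x"], p["pfx_z"]),
        "plate_x_adjusted": sign * p["plate_x"],
        # Distance from the center of the zone, taken as (0, 2.5) feet.
        "dist_from_center": math.hypot(p["plate_x"], p["plate_z"] - 2.5),
    }


# Made-up pitch, for illustration only.
pitch = {
    "p_throws": "L",
    "release_pos_x": 2.0,
    "release_pos_z": 6.0,
    "release_speed": 96.0,
    "release_spin_rate": 2400.0,
    "pfx_x": 0.6,
    "pfx_z": 0.8,
    "plate_x": 0.3,
    "plate_z": 2.9,
}

features = derived_features(pitch)
for name, value in features.items():
    print(name, round(value, 3))
```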
Here was the least likely 4-seam fastball to result in a swing and miss from last season…
Most likely…
Least likely sinker…
Most likely…
Yes, the most likely sinker to generate a swing and miss didn’t produce one, but the expected swing and miss percent on the pitch is about 46. That is less than 50, so more often than not, one would expect no swing and miss, which was the case here.
Least likely cutter…
Most likely…
The expected swing and miss percent on this cutter is a little less than 40, but a swing and miss still took place.
Least likely slider…
Most likely…
Somehow, the expected swing and miss percent on this slider is slightly over 60 with an expected swing percent of around 68. Those are surely poor estimates. No model is perfect.
Least likely curveball…
Most likely…
The expected swing and miss percent on this curveball is roughly 63, which feels inflated, but I like this one a lot better than the estimate for the most likely slider. I’m guessing Hays swings and misses on this curveball in a two-strike count, but here the count was 0-0.
Count is not factored into the models because I want pitches that have identical attributes to have matching predictions.
Least likely changeup should be familiar…
Most likely…
This changeup had an expected swing and miss percent of about 58, which seems a tad high, again, but to my eyes, it is a great pitch.
Least likely split-finger…
Most likely…
The probability estimate for this split-finger is close to two-thirds. I’m tempted to say that it’s too high, but the pitch strikes me as devastating.
Here is a graph that examines the accuracy of the expected swing and miss percentages for 4-seam fastballs thrown last season…
Last season falls outside the sample that the models were built from (the 2015-21 regular seasons).
Sinkers…
Cutters…
Sliders…
Curveballs…
Changeups…
Split-fingers…
It is my impression that the graphs are reasonable. The models aren’t perfect, though they appear to do a solid job of setting expectations.
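For readers who want to run this kind of accuracy check themselves, here is a minimal sketch of one common approach: bin the pitches by predicted probability and compare the average prediction in each bin to the observed swing and miss rate. I am not claiming the graphs above were built exactly this way; the `calibration_table` helper and the toy numbers are made up for illustration.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins and compare to observed rates."""
    # Each bin tracks [count, sum of predicted probabilities, sum of outcomes].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[i][0] += 1
        bins[i][1] += p
        bins[i][2] += y
    return [
        {
            "bin": i,
            "n": n,
            "mean_predicted": sum_p / n,
            "observed_rate": sum_y / n,
        }
        for i, (n, sum_p, sum_y) in enumerate(bins)
        if n > 0
    ]


# Toy data: predicted swing and miss probabilities and what actually happened
# (1 = swing and miss, 0 = anything else).
probs = [0.05, 0.08, 0.12, 0.15, 0.31, 0.35]
outcomes = [0, 0, 0, 1, 0, 1]

rows = calibration_table(probs, outcomes)
for row in rows:
    print(row)
```

If the models set expectations well, the mean prediction and the observed rate should track each other closely across bins once there are enough pitches in each.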
To determine a pitcher’s expected swing percent on a particular pitch type, I simply take the mean of that pitcher’s expected swing probabilities on the pitch type of interest and multiply by 100. I do the same for expected swing and miss percent. Once I have those two numbers, I can divide the latter by the former and multiply by 100 to get an expected whiff percent. I adjust these values for league average by factoring in season, pitch group (fastballs [FF, SI, and FC], breaking [SL and CU], and offspeed [CH and FS]), count, home/away, and home team. The adjustments for the home team are regressed to the mean.
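The arithmetic in the paragraph above can be sketched as follows. The `expected_rates` and `plus` helpers are hypothetical names, the probabilities are made up, and the plus scaling shown is only the usual convention of 100 meaning league average. It skips the season, pitch group, count, home/away, and home team adjustments entirely, and the league average of 12 is a placeholder.

```python
from statistics import fmean


def expected_rates(p_swing, p_swing_and_miss):
    """Aggregate pitch-level probabilities into pitcher-level percentages."""
    x_swing_pct = 100 * fmean(p_swing)
    x_swing_and_miss_pct = 100 * fmean(p_swing_and_miss)
    # Expected whiff percent: expected swings and misses per expected swing.
    x_whiff_pct = 100 * x_swing_and_miss_pct / x_swing_pct
    return x_swing_pct, x_swing_and_miss_pct, x_whiff_pct


def plus(value, league_average):
    """Simplified plus stat: 100 is league average (no further adjustments)."""
    return 100 * value / league_average


# Made-up probabilities for three pitches of one type from one pitcher.
p_swing = [0.5, 0.6, 0.4]
p_swing_and_miss = [0.10, 0.15, 0.20]

x_swing, x_swmiss, x_whiff = expected_rates(p_swing, p_swing_and_miss)
print(x_swing, x_swmiss, x_whiff)

# Placeholder league average of 12 percent, purely for illustration.
print(plus(x_swmiss, 12.0))
```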
Here is a graph that compares expected swing and miss percent to swing and miss percent for pitchers’ 4-seam fastballs from 2015 to 2021…
2022…
Sinkers…
Cutters…
Sliders…
Curveballs…
Changeups…
Split-fingers…
Overall (FF, SI, FC, SL, CU, CH, FS)…
Generally, pitches with higher expected swing and miss percentages are associated with higher swing and miss percentages, which is desired.
Here is a graph that portrays some correlations from one season to the next for expected swing and miss percent and swing and miss percent for 4-seam fastballs…
Sinkers…
Cutters…
Sliders…
Curveballs…
Changeups…
Split-fingers…
Overall (all pitches)…
While the expected swing and miss percentages are not more predictive of future swing and miss percentages, the expected swing and miss percentages are clearly more stable year-to-year and taking them into consideration allows one to better predict future swing and miss percentages. Both had a p-value of <2e-16 when included in a regression together.
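As a rough sketch of how a season-to-season correlation like the ones graphed above can be computed, here is a plain Pearson correlation between one season’s values and the next season’s values for the same pitchers. The `pearson_r` helper and the five data points are made up for illustration and are not results from the models.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up expected swing and miss percentages for the same five pitchers
# in back-to-back seasons.
year_one = [10.2, 12.5, 14.1, 9.8, 16.0]
year_two = [10.9, 11.8, 14.6, 10.1, 15.2]

r = pearson_r(year_one, year_two)
print(round(r, 3))
```

Running the same function on year-one expected values against year-two actual values would give the predictive correlation rather than the stability one.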
Here are the top 4-seam fastball pitcher-seasons by expected swing and miss%+ from 2015-22 (minimum 100 pitches)…
- 2016 Aroldis Chapman
- 2015 Aroldis Chapman
- 2017 Aroldis Chapman
- 2018 Ray Black
- 2017 Josh Fields
- 2015 Josh Fields
- 2017 Craig Kimbrel
- 2017 Walker Buehler
- 2022 Adam Cimber
- 2017 Brandon Morrow
Sinker…
- 2017 Josh Hader
- 2017 Buddy Baumann
- 2019 Josh Hader
- 2018 Josh Hader
- 2021 Josh Hader
- 2016 Garrett Richards
- 2015 Danny Duffy
- 2019 Alex Claudio
- 2020 Josh Hader
- 2022 Josh Hader
Cutter…
- 2019 Steven Brault
- 2015 John Lackey
- 2017 Matt Bush
- 2016 Tyler Chatwood
- 2015 Bryan Morris
- 2017 A.J. Minter
- 2018 Steven Brault
- 2018 A.J. Minter
- 2018 Trevor Bauer
- 2019 Emmanuel Clase
Slider…
- 2017 Garrett Richards
- 2021 Jacob deGrom
- 2019 Tanner Scott
- 2018 Ryan Pressly
- 2018 Blake Treinen
- 2022 Jacob deGrom
- 2016 Garrett Richards
- 2020 Zack Wheeler
- 2021 Zac Gallen
- 2019 Austin Pruitt
Curveball…
- 2015 Cody Anderson
- 2015 Craig Kimbrel
- 2018 Craig Kimbrel
- 2022 Craig Kimbrel
- 2019 Craig Kimbrel
- 2021 Craig Kimbrel
- 2022 Jonathan Loáisiga
- 2016 Craig Kimbrel
- 2021 Robbie Ray
- 2017 Alex Wood
Changeup…
- 2022 Scott Effross
- 2017 Donnie Hart
- 2022 Devin Williams
- 2020 Devin Williams
- 2019 Evan Marshall
- 2022 Wandy Peralta
- 2019 Brad Brach
- 2020 Pablo López
- 2019 Jeremy Hellickson
- 2021 Devin Williams
Split-finger…
- 2021 Aroldis Chapman
- 2017 Jeremy Jeffress
- 2018 Chasen Shreve
- 2022 Aroldis Chapman
- 2022 Chasen Shreve
- 2022 Ryne Stanek
- 2021 Hirokazu Sawamura
- 2020 Kevin Gausman
- 2020 Héctor Neris
- 2017 Chasen Shreve
Overall (minimum 500 pitches)…
- 2016 Aroldis Chapman
- 2015 Aroldis Chapman
- 2017 Aroldis Chapman
- 2016 Sean Doolittle
- 2017 Josh Fields
- 2017 Josh Hader
- 2017 Craig Kimbrel
- 2018 Aroldis Chapman
- 2017 Brandon Morrow
- 2018 Craig Kimbrel
Minimum 2000 pitches…
- 2022 Spencer Strider
- 2019 Gerrit Cole
- 2022 Drew Rasmussen
- 2015 Garrett Richards
- 2019 Jacob deGrom
- 2018 Luis Severino
- 2019 Walker Buehler
- 2018 Jacob deGrom
- 2021 Gerrit Cole
- 2021 Corbin Burnes
All in all, the lists look pretty good to me.
I thought it might be cool to look at which pitcher-seasons from 2015 to 2021 saw the biggest gains/losses in expected swing and miss percent going from the old models to the new ones.
Generally, the two agree.
The biggest gainers (minimum 1000 pitches)…
- 2021 Aroldis Chapman (12.4 –> 18.8)
- 2016 Noah Syndergaard (10.0 –> 14.1)
- 2021 Emmanuel Clase (13.8 –> 17.6)
- 2021 Jacob deGrom (17.1 –> 20.8)
- 2021 Shane McClanahan (12.9 –> 16.4)
- 2021 Corbin Burnes (13.1 –> 16.5)
- 2021 Pablo López (11.2 –> 14.5)
- 2020 Jacob deGrom (16.1 –> 19.4)
- 2019 Eduardo Rodriguez (9.1 –> 12.4)
- 2021 Mark Melancon (10.5 –> 13.7)
Losers…
- 2021 Sergio Romo (17.8 –> 12.6)
- 2017 Bronson Arroyo (12.8 –> 8.3)
- 2015 Mark Buehrle (13.3 –> 9.2)
- 2015 Doug Fister (10.2 –> 6.4)
- 2018 Sergio Romo (18.1 –> 14.4)
- 2016 Doug Fister (10.2 –> 6.7)
- 2017 Kyle Hendricks (12.0 –> 8.7)
- 2018 Tyson Ross (13.8 –> 10.7)
- 2018 Bartolo Colon (9.1 –> 6.1)
- 2017 Chris Devenski (17.3 –> 14.3)
Most of the gainers throw hard. The losers don’t, and many of them are thought of as “command guys”.
Here’s a graph of expected swing and miss percent versus Stuff+ for 2021…
Lastly, here are the leaders in expected swing and miss%+ for this season (minimum 250 pitches)…
- Jacob deGrom
- Josh Hader
- Drew Rasmussen
- A.J. Minter
- Emmanuel Clase
- Félix Bautista
- Spencer Strider
- Emilio Pagán
- Camilo Doval
- Reynaldo López
You can see full leaderboards on this app.
Thanks for reading!
All GIF material stems from Baseball Savant; the featured image is by Brett Davis-USA TODAY Sports.