Explaining xPitching+
The Home Run Derby was on Monday, the All-Star Game on Tuesday. It is Thursday, and there is no baseball to be played today.
Back in May of 2023, I published “An update to expected swing and miss%+”. Early in the season, I worked extremely hard to develop a better metric, restarting again and again. My new stat, xPitching+, is not perfect, but it is valuable.
In FanGraphs’ “Stuff+, Location+, and Pitching+ Primer”, Owen McGrattan writes that “Pitching+ is one of three models that, along with Stuff+ and Location+, attempts to look at the process underlying a pitcher’s performance in order to remove some of the noise that can be present when looking at on-field results. Eno Sarris and Max Bay created Pitching+, with inspiration from work by Ethan Moore, Harry Pavlidis, and Jeremy Greenhouse, among others.” Simply put, Pitching+ evaluates pitchers by summarizing the quality of all of the pitches they threw.
Importantly, the metric is fairly stable over time, and it is effective at predicting how successful a pitcher will be in the future, compared to other metrics.
My stat, xPitching+, is what I expect a pitcher’s (FanGraphs) Pitching+ to be based on four measures:
- expected swing percent plus,
- expected swing and miss percent plus,
- expected called strike percent plus, and
- expected barrels per expected contact swing plus.
They stem from different models I built.
The first model is a multinomial logistic regression. It predicts the probability that a pitch type is a four-seam fastball, sinker, cutter, changeup/splitter, curveball, slider, or sweeper given…
- how fast the pitch type is compared to the pitcher’s average fastball,
- how much the pitch type moves horizontally (pfx_x),
- how much the pitch type moves vertically (compared to how much it would due to only gravity) (pfx_z),
- how much the pitch type moves overall, and
- the pitch type’s “angle” of movement.
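As a rough illustration, the five inputs above could be derived from a pitch type's averages like this. It is only a sketch, not the code behind the model: the function and key names are made up for the example, "how much the pitch type moves" is read as the length of the (pfx_x, pfx_z) vector, and the angle convention is an assumption.

```python
import math

def movement_features(release_speed, avg_fastball_speed, pfx_x, pfx_z):
    """Build the five movement inputs for one pitch type.

    Names and conventions are illustrative assumptions; the article does
    not give exact definitions or units.
    """
    return {
        # Speed relative to the pitcher's average fastball.
        "speed_vs_fastball": release_speed - avg_fastball_speed,
        "pfx_x": pfx_x,
        "pfx_z": pfx_z,
        # Overall movement: length of the (pfx_x, pfx_z) vector (assumed reading).
        "total_movement": math.hypot(pfx_x, pfx_z),
        # Angle of movement in radians, counterclockwise from the positive
        # x-axis (assumed convention).
        "movement_angle": math.atan2(pfx_z, pfx_x),
    }

# Made-up numbers for a pitch thrown 7 mph slower than the fastball.
features = movement_features(88.0, 95.0, 0.3, 0.4)
print(features)
```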

I designate the “stat pitch type” as the pitch type with the greatest probability.
Pitch types that were thrown at least 100 times in a season agreed with the stat pitch type almost 90 percent of the time.
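To make "greatest probability" concrete, here is a small sketch of how a multinomial logistic regression turns one score per class into probabilities (a softmax), how the stat pitch type is picked, and how an agreement rate could be computed. The scores in the example are invented, and all function names are hypothetical.

```python
import math

PITCH_CLASSES = [
    "four-seam fastball", "sinker", "cutter", "changeup/splitter",
    "curveball", "slider", "sweeper",
]

def softmax(scores):
    """Convert one linear score per class into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stat_pitch_type(scores):
    """Return the class with the greatest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PITCH_CLASSES[best], probs[best]

def agreement_rate(labels, stat_labels):
    """Share of pitch types whose listed label matches the stat pitch type."""
    matches = sum(a == b for a, b in zip(labels, stat_labels))
    return matches / len(labels)

# Invented scores for a pitch listed as a four-seamer that moves like a sinker.
label, prob = stat_pitch_type([1.2, 2.0, 0.3, -1.0, -2.0, 0.5, 0.2])
print(label, round(prob, 3))
```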
The bullets above were inspired by Baseball Savant’s Movement Profiles.

For example, Luis Castillo’s 4-Seam might be a four-seam fastball, but if it acts more like a sinker, then it makes sense for the models to treat it as a sinker.
The next models are generalized logistic regressions that predict the probability that a pitch will be called a strike in the case that the batter does not swing.
The idea for those is that a pitch from one of the seven classes ends up in some spot (relative to the hitter and strike zone). What is the likelihood that it will be called a strike, if the batter takes it?

The variables I tried out were…
- plate_x (“Horizontal position of the ball when it crosses home plate from the catcher’s perspective”),
- is_in (is pitch inside [relative to middle and hitter]?),
- plate_z (vertical position of the ball…),
- is_above_middle (of zone),
- is_in_zone,
- middle_angle (angle that goes from middle-middle to where pitch ends up),
- middle_angle_distance_from_quarter_pi, and
- distance_from_middle.
I adjusted these for batter handedness and the height of the zone, and I added some absolute value, squared, and interaction terms.
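Here is a hedged sketch of how several of those location variables might be built from Statcast-style columns. It is not the actual feature code. The half-width of the zone (0.83 feet), the sign flip for right-handed batters, and the rescaling of both axes to zone units are assumptions made for the example, and middle_angle_distance_from_quarter_pi is left out because its exact definition is not given.

```python
import math

def location_features(plate_x, plate_z, sz_top, sz_bot, stand, half_width=0.83):
    """Location inputs for one pitch, relative to the hitter and his zone.

    plate_x, plate_z, sz_top and sz_bot are in feet; stand is "R" or "L".
    half_width is an assumed half-width of the strike zone, in feet.
    """
    # plate_x is from the catcher's view, so a right-handed batter stands on
    # the negative side. Flip the sign for him so positive always means inside.
    x_in = -plate_x if stand == "R" else plate_x
    mid_z = (sz_top + sz_bot) / 2
    # Rescale both axes so the zone edges sit at -1 and +1 (adjusts for zone height).
    x_scaled = x_in / half_width
    z_scaled = (plate_z - mid_z) / ((sz_top - sz_bot) / 2)
    return {
        "is_in": x_in > 0,
        "is_above_middle": z_scaled > 0,
        "is_in_zone": abs(x_scaled) <= 1 and abs(z_scaled) <= 1,
        # Angle from middle-middle to where the pitch ends up.
        "middle_angle": math.atan2(z_scaled, x_scaled),
        "distance_from_middle": math.hypot(x_scaled, z_scaled),
    }

# Made-up pitch: half a foot toward a right-handed batter, upper half of the zone.
print(location_features(-0.5, 3.0, 3.5, 1.5, "R"))
```

Putting both axes in zone units means a pitch on the top edge and a pitch on the side edge sit the same distance from the middle, which is one way to handle hitters of different heights.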
Here is a graph that demonstrates how well expectation matched up with reality from 2020 to 2023…

For 2024 to 2025, they do not align as nicely, which is interesting.
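A comparison of expectation and reality like this is commonly made by grouping pitches into bins of expected probability and checking how often the outcome actually happened in each bin. The sketch below shows that idea under the assumption of ten equal-width bins; it is not necessarily how the graphs here were produced, and the function name is hypothetical.

```python
def calibration_table(predicted, observed, n_bins=10):
    """Compare mean expected probability with the observed rate in each bin.

    predicted: expected probabilities in [0, 1]
    observed:  matching outcomes, 1 if the event happened and 0 otherwise
    Returns (bin_low, bin_high, pitches, mean_expected, observed_rate) rows
    for every non-empty bin.
    """
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum expected, sum observed
    for p, y in zip(predicted, observed):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[i][0] += 1
        bins[i][1] += p
        bins[i][2] += y
    rows = []
    for i, (n, sum_p, sum_y) in enumerate(bins):
        if n:
            rows.append((i / n_bins, (i + 1) / n_bins, n, sum_p / n, sum_y / n))
    return rows

# Tiny made-up example: four pitches and whether each was a called strike.
for row in calibration_table([0.05, 0.15, 0.95, 1.0], [0, 0, 1, 1]):
    print(row)
```

If the model is well calibrated, the last two numbers in each row are close to each other.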

Here is the most shocking called strike from the first half…
Most shocking ball…
The next set of generalized logistic regressions predicts the probability that the batter will swing. The list of candidate variables is much longer. It includes…
- p_throws (pitcher handedness),
- release_pos_x (horizontal release position),
- release_pos_z (vertical release position),
- arm_angle,
- abs_arm_angle_distance_from_45,
- release_extension,
- release_speed,
- effective_speed (perceived velocity),
- release_spin_rate,
- spin_axis (axis on which ball spins),
- pfx_x,
- is_moving_in (relative to hitter),
- pfx_z,
- is_moving_up,
- movement_angle,
- api_break_z_with_gravity,
- vx0 (“velocity of the pitch, in feet per second, in x-dimension, determined at y=50 feet”),
- vy0 (y-dimension),
- vz0 (z-dimension),
- ax (horizontal acceleration),
- ay (acceleration in y-direction),
- az (z-direction),
- plate-related stuff from before,
- x_is_called_strike_given_take (output of first model [expected called strike probability if batter does not swing]), and
- x_called_strike_given_take (expected outcome [ball or strike], if take).
Some of the other inputs are indicators, squared sums (“totals”), etc.
Here are the graphs evaluating the swing models…


Additionally, I can calculate the probability of a called strike by multiplying the probability that the batter does not swing (1-x_is_swing) by the probability that pitch will be called a strike in the event that the batter takes it (x_is_called_strike_given_take).
x_is_called_strike = (1-x_is_swing)*x_is_called_strike_given_take
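As a quick worked example of that formula, with made-up inputs: a pitch that batters swing at 30 percent of the time, and that would be called a strike 90 percent of the time if taken, has an expected called strike probability of about 0.63.

```python
def x_is_called_strike(x_is_swing, x_is_called_strike_given_take):
    """Probability of a called strike: the batter takes AND the pitch is called a strike."""
    return (1 - x_is_swing) * x_is_called_strike_given_take

# Made-up inputs, only to show the arithmetic.
example = x_is_called_strike(0.30, 0.90)
print(round(example, 2))
```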
The corresponding graphs…


It is worth noting that I’m not concerned that the blue line is beneath the black one when the expected probabilities are near their max. In that region, the outcome is not occurring as frequently as one would anticipate. I think the gap is due to the regressions ignoring the count (so that pitches that behave the same have identical estimates). With that being said, the plus stats, which are scaled so that 100 is league average, control for the count (balls and strikes), home/away, and home team (park).
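The exact adjustment behind the plus stats is not spelled out here, so the following is only one plausible way to put an expected rate on a 100-is-average scale while controlling for context. It assumes each pitch is compared with the league average expected probability in the same count; home/away and park are omitted to keep the sketch short. All names are hypothetical.

```python
from collections import defaultdict

def league_baseline(pitches):
    """League average expected probability for each count (assumed context)."""
    totals = defaultdict(lambda: [0.0, 0])  # count -> [sum of x, number of pitches]
    for pitch in pitches:
        entry = totals[pitch["count"]]
        entry[0] += pitch["x"]
        entry[1] += 1
    return {count: total / n for count, (total, n) in totals.items()}

def plus_stat(pitcher_pitches, baseline):
    """100 times the pitcher's expected total over the league total for the same counts."""
    expected = sum(pitch["x"] for pitch in pitcher_pitches)
    context_average = sum(baseline[pitch["count"]] for pitch in pitcher_pitches)
    return 100 * expected / context_average

# Made-up league sample and a one-pitch "pitcher".
league = [
    {"count": "0-0", "x": 0.1},
    {"count": "0-0", "x": 0.3},
    {"count": "0-2", "x": 0.4},
]
baseline = league_baseline(league)
print(plus_stat([{"count": "0-0", "x": 0.3}], baseline))
```

Because the denominator uses the league average for the counts the pitcher actually threw in, a pitcher is not rewarded or penalized just for working in favorable counts.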
Here are several more videos…
Least likely to be a called strike was a bounced curveball from Triston McKenzie to Nolan Schanuel.
The job of the third collection of generalized logistic regressions is to predict the probability of a swing and miss given that the batter swung.
I tested the same variables as I did for the swing models, plus x_is_swing and x_swing (two outputs from the previous models).
Graphs


Now that I have estimates for the probability of a swing and miss given that the batter swings, I can calculate an expected swing and miss probability.
x_is_swing_and_miss = x_is_swing*x_is_swing_and_miss_given_swing


Not even one percent of pitches this season have an x_is_swing_and_miss of at least 0.4.
Most likely swing and miss…
The least likely pitch to end in a swing and miss is the pitch that was least likely to induce a swing (even though there was a greater than 99.9 percent chance that the batter would swing and miss if he swung).
The final generalized logistic regressions predict the chance of a barrel given that the batter made contact.
Graphs


x_is_barrel = x_is_swing*(1-x_is_swing_and_miss_given_swing)*x_is_barrel_given_contact = x_is_contact*x_is_barrel_given_contact
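The swing side of the chain can be written out in a few lines. This sketch simply restates the formulas above with made-up inputs; the function name is hypothetical. Note that the swing and miss and contact probabilities add back up to the swing probability.

```python
def swing_branch(x_is_swing, x_is_swing_and_miss_given_swing, x_is_barrel_given_contact):
    """Chain the conditional estimates into per-pitch outcome probabilities."""
    x_is_swing_and_miss = x_is_swing * x_is_swing_and_miss_given_swing
    x_is_contact = x_is_swing * (1 - x_is_swing_and_miss_given_swing)
    x_is_barrel = x_is_contact * x_is_barrel_given_contact
    return {
        "x_is_swing_and_miss": x_is_swing_and_miss,
        "x_is_contact": x_is_contact,
        "x_is_barrel": x_is_barrel,
    }

# Made-up pitch: 50% swing, 20% whiff given swing, 10% barrel given contact.
out = swing_branch(0.5, 0.2, 0.1)
print(out)
```

With these inputs the expected barrel probability is about 0.04, which shows why even the most barrel-prone pitch usually does not end in a barrel.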


It is not surprising that the most likely barrel did not result in a barrel: they are rare! The batter needs to swing, make contact, and the ball needs to have left the bat in a certain way (fast enough and at an angle in a specified range).
The least likely barrel is the same pitch as the least likely swing and the least likely swing and miss.
Now that I am done outlining the models, I can share leaders from the first half.
Pitchers with the highest xSwing percent+ (minimum 1,000 pitches)
- Spencer Schwellenbach (115)
- Tarik Skubal (115)
- Jacob deGrom (114)
- Drew Rasmussen (110)
- Garrett Crochet (109)
- Casey Mize (109)
- Ryan Pepiot (109)
- Bryan Woo (108)
- Zack Wheeler (108)
- Ryne Nelson (108)
Lowest
- Seth Lugo (86)
- Logan Allen (91)
- Carlos Rodón (91)
- Brady Singer (91)
- Jose Quintana (91)
- Will Warren (92)
- Nick Lodolo (92)
- Chad Patrick (92)
- Sean Burke (92)
- Chris Bassitt (93)
Expected swing percent plus is more consistent from one season to the next than swing percent plus is, and it can help better predict future swing percent plus.
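One common way to measure this kind of season-to-season consistency is the correlation between a stat in one season and the same stat for the same pitchers in the next. The sketch below computes a Pearson correlation by hand; the numbers are invented, and this is not necessarily the exact check used here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Invented values for four pitchers in back-to-back seasons.
season_1 = [92, 100, 108, 115]
season_2 = [95, 99, 106, 112]
print(pearson_r(season_1, season_2))
```

Running the same calculation on the expected stat and on the observed stat, and comparing the two correlations, shows which one holds up better from year to year.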

Highest xWhiff percent+
- Jacob deGrom (130)
- Dylan Cease (128)
- Freddy Peralta (120)
- Robbie Ray (118)
- Hayden Birdsong (117)
- MacKenzie Gore (115)
- Gavin Williams (114)
- Mitch Spence (113)
- Tylor Megill (112)
- Grant Holmes (112)
Lowest
- José Berríos (76)
- Brad Lord (79)
- Chris Bassitt (79)
- Andre Pallante (79)
- Emerson Hancock (79)
- Kyle Hendricks (80)
- Jake Irvin (81)
- Ranger Suárez (81)
- Davis Martin (82)
- Zack Littell (83)

Highest xSwing_and_miss percent+
- Jacob deGrom (148)
- Dylan Cease (135)
- MacKenzie Gore (121)
- Zack Wheeler (121)
- Mitch Spence (119)
- Spencer Schwellenbach (119)
- Robbie Ray (118)
- Tarik Skubal (117)
- Freddy Peralta (116)
- Ryan Pepiot (116)
Lowest
- Chris Bassitt (73)
- Emerson Hancock (73)
- José Berríos (74)
- Andre Pallante (74)
- Seth Lugo (75)
- Brad Lord (78)
- Davis Martin (78)
- Miles Mikolas (78)
- Dustin May (79)
- Jake Irvin (79)

Highest xCalled_strike percent+
- Miles Mikolas (137)
- Jake Irvin (127)
- Andrew Heaney (121)
- Seth Lugo (121)
- Jameson Taillon (117)
- Chris Bassitt (117)
- Justin Verlander (117)
- Matthew Boyd (116)
- Nick Lodolo (116)
- Dean Kremer (116)
Lowest
- Dylan Cease (78)
- Freddy Peralta (79)
- MacKenzie Gore (84)
- Griffin Canning (86)
- Jacob deGrom (86)
- Hayden Birdsong (88)
- Matthew Liberatore (88)
- Mitch Spence (88)
- Hunter Brown (89)
- Zack Wheeler (89)

Highest xBarrels_per_x_contact_swing+
- Tyler Anderson (137)
- Bailey Ober (132)
- Miles Mikolas (130)
- Zack Littell (130)
- Lucas Giolito (129)
- Brady Singer (125)
- Seth Lugo (124)
- Jameson Taillon (123)
- Simeon Woods-Richardson (121)
- Jack Flaherty (121)
Lowest
- Max Fried (66)
- José Soriano (69)
- Framber Valdez (72)
- Garrett Crochet (74)
- Logan Webb (76)
- Cristopher Sánchez (79)
- Paul Skenes (81)
- Tarik Skubal (81)
- Zack Wheeler (82)
- Landen Roupp (83)

Highest xBarrel percent+
- Zack Littell (148)
- Bailey Ober (139)
- Tyler Anderson (132)
- Nick Martinez (130)
- Miles Mikolas (129)
- Lucas Giolito (128)
- Jameson Taillon (126)
- Mitch Keller (126)
- Matthew Boyd (124)
- Kyle Hendricks (124)
Lowest
- José Soriano (69)
- Max Fried (72)
- Framber Valdez (74)
- Logan Webb (75)
- Landen Roupp (79)
- Carlos Rodón (79)
- Garrett Crochet (80)
- Gavin Williams (81)
- Dylan Cease (82)
- Paul Skenes (82)

The expected barrel rates are more predictive of future barrel rates than the barrel rates themselves are, which is exciting.
Once again, expected pitching plus is determined by expected swing percent plus, expected swing and miss percent plus, expected called strike percent plus, and expected barrels per expected contact swing plus.
Top 15 pitchers this season by xPitching+

If you want to see the full leaderboard, check out Pitcher, a shiny application I made.
Here is a graph relating xPitching+ to Pitching+ for pitchers this year…

Expected pitching plus explains nearly 75 percent of the variation in Pitching+ when dealing with seasons from 2020 to 2025 where the player threw at least 100 innings.
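"Percent of variation explained" is the R-squared of a fit. As an illustration, the sketch below fits a least-squares line with a single predictor and reports R-squared as 1 minus the ratio of residual to total variation. It is a simplified stand-in with toy numbers, not the fit behind the figure above.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit, and total variation around the mean.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Toy numbers only: the line explains a quarter of the variation here.
print(r_squared([1, 2, 3], [1, 3, 2]))
```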
Expected pitching plus is more consistent over time, and the year-to-year correlation of Pitching+ increases when xPitching+ is considered.

I created these expected pitch metrics in hopes that I would be able to evaluate pitchers and their pitch types at a higher level: the numbers provide me with an idea of what the outcome of a pitch (or pitch type from a pitcher) will be based on its characteristics and location.