Wednesday, July 17, 2024

Introducing Stuff+, Location+, and Pitching+

One of the most fascinating aspects of baseball to me is evaluating the intrinsic properties of a pitch, things that can’t necessarily be quantified by the eye test or intuition, such as the exact value of a certain pitch location or the value of vertical break over horizontal break when comparing pitch shapes. Seeking to answer these questions, with inspiration from models created by Max Bay and Cameron Grove, I drew up three pitch models of my own: Stuff+, Location+, and Pitching+.


I used an XGBoost approach to build each model because of its flexibility with large datasets and its strong predictive accuracy. I trained these models to estimate the probabilities of different events in order to obtain an expected run value, which was then rescaled to a plus scale. I split all pitches into six groups: four-seamers, sinkers, cutters, changeups (traditional changeups and splitters), curveballs (traditional curveballs and knuckle-curves), and sliders. I used the {tidymodels} framework in R throughout the model-building process. To tune model hyperparameters, I used two-fold cross-validation with a grid search, which kept training time and computational cost manageable.

For the Stuff+ models, which only consider events that occur after a swing, three models were trained per pitch type group: a whiff probability model, a foul probability model, and an in-play model, which gives the probability of different launch angle and exit velocity bins. For the Location+ and Pitching+ models, which consider all events, two more models were trained per pitch type group in addition to the three mentioned above: a swing probability model and a take model, which produces the probabilities of a ball, a called strike, and a hit by pitch.

The inputs for the Stuff+ models were all pitch flight metrics available through Statcast, along with release point data and spin efficiency, which was estimated using code written by Max Bay. Velocity, movement, and spin axis differences off the primary fastball were also used for changeups, curveballs, and sliders. The inputs for the Location+ models were the count, pitcher hand, batter hand, and plate coordinates. The inputs for the Pitching+ models were the Stuff+ and Location+ variables combined.
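To make the step from event probabilities to an expected run value concrete, here is a minimal sketch in Python (not the R code used for the models). It assumes the in-play probability is whatever remains after whiffs and fouls, and that each outcome has a single run value. The function names, the bin labels, and every number in RUN_VALUES are made up for illustration; real run values would also vary by count.

```python
# Hypothetical run values per outcome (NOT the values used in the article).
RUN_VALUES = {
    "whiff": -0.06,
    "foul": -0.03,
    "ball": 0.05,
    "called_strike": -0.06,
    "hit_by_pitch": 0.35,
    # Hypothetical launch angle / exit velocity bins for balls in play.
    "bins": {"weak": -0.10, "medium": 0.05, "barrel": 0.90},
}


def swing_xrv(p_whiff, p_foul, bin_probs, rv=RUN_VALUES):
    """Expected run value of a pitch, given that the batter swings.

    bin_probs maps each in-play bin to its probability given a ball in play,
    so the bin probabilities are assumed to sum to 1.
    """
    p_in_play = 1.0 - p_whiff - p_foul  # assumption: swing = whiff, foul, or in play
    in_play_rv = sum(p * rv["bins"][b] for b, p in bin_probs.items())
    return p_whiff * rv["whiff"] + p_foul * rv["foul"] + p_in_play * in_play_rv


def pitch_xrv(p_swing, swing_value, take_probs, rv=RUN_VALUES):
    """Expected run value over all events: swings plus takes."""
    take_value = sum(p * rv[event] for event, p in take_probs.items())
    return p_swing * swing_value + (1.0 - p_swing) * take_value


# Toy probabilities standing in for model predictions on one pitch.
swing_value = swing_xrv(0.30, 0.40, {"weak": 0.6, "medium": 0.3, "barrel": 0.1})
pitch_value = pitch_xrv(
    0.50,
    swing_value,
    {"ball": 0.60, "called_strike": 0.39, "hit_by_pitch": 0.01},
)
```

In this sketch, the swing-only value corresponds to what a Stuff+ style model works with, while the all-events value adds the swing and take models used for Location+ and Pitching+.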

*Note: 100 is average for each pitch type, even though average run values differ by pitch type. For example, a -1.5 xRV/100 four-seam fastball would have a higher Pitching+ than a -1.5 xRV/100 slider because sliders have lower run values than four-seamers, on average.
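A small Python sketch shows why the same xRV/100 can map to different plus values. The exact rescaling is not spelled out here, so this assumes a simple z-score scale within each pitch type where one standard deviation equals 10 points. That scaling, the function name, and the toy league samples are all assumptions for illustration.

```python
from statistics import mean, pstdev


def plus_scale(xrv, league_xrvs, points_per_sd=10.0):
    """Rescale an xRV/100 to a plus scale within one pitch type.

    Lower run values are better for the pitcher, so the sign is flipped:
    an xRV below the pitch-type average lands above 100.
    """
    mu = mean(league_xrvs)
    sd = pstdev(league_xrvs)
    return 100.0 - points_per_sd * (xrv - mu) / sd


# Made-up league samples: sliders average a lower xRV/100 than four-seamers.
four_seam_xrvs = [-1.0, 0.0, 1.0]
slider_xrvs = [-1.5, -0.5, 0.5]

four_seam_plus = plus_scale(-1.5, four_seam_xrvs)
slider_plus = plus_scale(-1.5, slider_xrvs)
```

Because the slider sample is centered on a lower run value, -1.5 sits closer to the slider average than to the four-seam average, so the four-seamer gets the higher plus value.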

Although not necessarily intended to be predictive stats, Stuff+, Location+, and Pitching+ are relatively stable year over year, with Stuff+ the most stable of the three.
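One common way to check year-over-year stability is to correlate each pitcher's value in one season with his value in the next. The Python sketch below shows that calculation, not necessarily the check performed for these models; the pitcher labels and Stuff+ values are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented Stuff+ values keyed by hypothetical pitcher labels.
year1 = {"pitcher_a": 112.0, "pitcher_b": 95.0, "pitcher_c": 101.0, "pitcher_d": 88.0}
year2 = {"pitcher_a": 108.0, "pitcher_b": 97.0, "pitcher_c": 103.0, "pitcher_e": 110.0}

# Only pitchers who appear in both seasons can be compared.
common = sorted(set(year1) & set(year2))
r = pearson_r([year1[p] for p in common], [year2[p] for p in common])
```

A value of r near 1 means pitchers tend to keep their ranking from one season to the next; in practice a minimum pitch count in both seasons would be applied before comparing.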

Now, onto the leaders!

(Among pitchers with at least 50 pitches thrown)

2022 Stuff+ Leaders:

2022 Location+ Leaders:

2022 Pitching+ Leaders:

2022 Pitch Type Stuff+ Leaders:

2022 Pitch Type Location+ Leaders:

2022 Pitch Type Pitching+ Leaders:

These models are also helpful for seeing what contributes most to the effectiveness of various pitch types. For instance, with four-seam fastballs, the most important aspects in regard to increasing whiffs are vertical movement, velocity, and vertical release point.

When you look at the interaction plot between four-seam velocity and vertical movement, a clear pattern separates good fastballs from bad ones.

Variable importance can also vary by model type. Although vertical movement, velocity, and vertical release point are the most important variables for the contact model, spin efficiency and horizontal movement play more critical roles in determining what happens when the ball is put in play.

For sinkers, staying away from the middle ground of vertical movement is key, while more horizontal run is also helpful.

To make a good cutter, it is crucial to reach or pass the zero line of horizontal movement (that is, to get positive glove-side movement) while throwing as hard as possible.

The sweeper revolution makes sense when you look at slider Stuff+; the darkest red regions on the plot are where pitchers are able to kill drop on their sliders while maximizing horizontal movement. Using the parameters from Dan Aucoin, sweepers have an average Stuff+ of 124, compared to 89.5 for non-sweeper sliders. It is apparent why more and more pitchers have begun to add a sweeper to their arsenal.

For changeups, it has been hypothesized that vertical movement separation off the fastball is more important to stuff quality than velocity separation, and the data backs that up: the relationship between vertical movement separation and Stuff+ is far stronger than the one between velocity separation and Stuff+.
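The separation features discussed above can be sketched in Python as a secondary pitch's average minus the average of the pitcher's primary fastball. This is an illustration rather than the actual feature code: it assumes the primary fastball is the fastball type the pitcher throws most often, and the field names, pitch type codes, and sample pitches are all hypothetical.

```python
from collections import Counter

# Assumption: these pitch type codes count as fastballs.
FASTBALL_TYPES = {"FF", "SI", "FC"}


def primary_fastball(pitches):
    """Most frequently thrown fastball type (assumed definition of 'primary')."""
    counts = Counter(
        p["pitch_type"] for p in pitches if p["pitch_type"] in FASTBALL_TYPES
    )
    if not counts:
        raise ValueError("pitcher has no fastballs to compare against")
    return counts.most_common(1)[0][0]


def average(pitches, pitch_type, key):
    """Average of one metric over all pitches of a given type."""
    values = [p[key] for p in pitches if p["pitch_type"] == pitch_type]
    return sum(values) / len(values)


def separation(pitches, pitch_type):
    """Velocity and vertical movement differences off the primary fastball."""
    fastball = primary_fastball(pitches)
    return {
        "velo_diff": average(pitches, pitch_type, "velo")
        - average(pitches, fastball, "velo"),
        "vert_diff": average(pitches, pitch_type, "vert_mov")
        - average(pitches, fastball, "vert_mov"),
    }


# Invented pitches for one pitcher.
pitches = [
    {"pitch_type": "FF", "velo": 95.0, "vert_mov": 16.0},
    {"pitch_type": "FF", "velo": 97.0, "vert_mov": 18.0},
    {"pitch_type": "SI", "velo": 94.0, "vert_mov": 9.0},
    {"pitch_type": "CH", "velo": 86.0, "vert_mov": 6.0},
    {"pitch_type": "CH", "velo": 88.0, "vert_mov": 8.0},
]

changeup_sep = separation(pitches, "CH")
```

With features like these computed per pitcher, the strength of each one's relationship with changeup Stuff+ can be compared directly.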

Curveballs have two primary components when determining success: velocity and vertical movement, with velocity being essential. This tweet from John Creel prompted me to check my model results, and sure enough, all curveballs over 85 MPH have above-average stuff.

If you would like to see how location value varies by pitch type, plate coordinates, and count, check out this thread from my Twitter…

A comparison of my model versus Eno Sarris’ model is fascinating too…

Thanks for reading! Feel free to reach out to me on Twitter @Drew_Haugen if you have any questions or want to see the model results for a player not mentioned in this article.

Data through 8/9/22