## Predicting wOBA Using Process-based Statistics

When trying to determine a batter’s overall offensive value using a single statistic, one of the most popular metrics to use is the weighted on-base average (wOBA). wOBA is calculated as a linear combination of “outcome” statistics (unintentional walks, hit-by-pitches, singles, doubles, triples, and home runs) divided by, essentially, the number of plate appearances.
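To make the calculation concrete, here is a minimal sketch of it in Python. The weights below are illustrative placeholders (the real linear weights are recalculated every season), the stat line is made up, and the denominator is the commonly used AB + BB – IBB + SF + HBP; none of this comes from my notebook.

```python
# Illustrative linear weights only; actual wOBA weights change from season to season.
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69, "hbp": 0.72, "single": 0.89,
    "double": 1.27, "triple": 1.62, "hr": 2.10,
}

def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr,
         weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted on-base average: weighted outcomes over (roughly) plate appearances."""
    unintentional_walks = bb - ibb
    numerator = (
        weights["ubb"] * unintentional_walks
        + weights["hbp"] * hbp
        + weights["single"] * singles
        + weights["double"] * doubles
        + weights["triple"] * triples
        + weights["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator

# A made-up stat line, purely for illustration.
example_woba = woba(ab=500, bb=60, ibb=5, hbp=5, sf=5,
                    singles=90, doubles=30, triples=3, hr=25)
print(round(example_woba, 3))
```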

With that being said, could one predict whether a given player’s wOBA will be above a certain threshold using “process” statistics such as plate discipline and batted ball parameters? In particular, if we know a player’s, say, zone contact rate, chase rate, and average exit velocity, could we predict with any confidence whether that particular player’s wOBA will be above, say, .320?

Using Statcast data and a bit of machine learning, I have decided to train a shallow neural network to try to do just that. I will post snapshots of the Jupyter Notebook throughout the analysis to make it a little easier to follow.

Dataset

My dataset was downloaded from Statcast (creating a custom leaderboard) and included all qualified batter-seasons from 2015 until June 29th, 2021 – 2015 being the first year for which Statcast data are available. This resulted in a set of 989 player-seasons.

For every player-season, I collected the following nine statistics: wOBA, exit velocity average, barrel batted rate, zone swing percentage, zone swing and miss percentage, zone contact percentage, out-of-zone swing percentage, out-of-zone swing and miss percentage, and out-of-zone contact percentage.

I also created a column that was “1” if wOBA was >= .320 for that particular player-season, and “0” otherwise. This would be the “true label” the neural network would try to predict. I picked the .320 threshold for wOBA, as that is roughly the league average. In effect, the network would learn to differentiate between a below-average offensive performer and an above-average one. Finally, I normalized all the input columns, as, for example, the exit velocities are on a different scale than the statistics expressed in percentages.
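The labeling and normalization steps could look something like the sketch below. I am assuming z-score standardization here (subtract the column mean, divide by the column standard deviation); other normalization schemes would work too, and the sample values are made up.

```python
from statistics import mean, pstdev

def make_labels(woba_values, threshold=0.320):
    """1 if the player-season wOBA is at or above the threshold, 0 otherwise."""
    return [1 if w >= threshold else 0 for w in woba_values]

def standardize(column):
    """Z-score one feature column so that features on different scales are comparable."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Made-up values: exit velocity is in mph, zone contact in percent.
woba_values = [0.319, 0.320, 0.351]
exit_velocity = [86.5, 89.0, 91.5]
zone_contact_pct = [78.0, 84.0, 90.0]

labels = make_labels(woba_values)
exit_velocity_z = standardize(exit_velocity)
zone_contact_z = standardize(zone_contact_pct)
```

After standardizing, both columns have mean 0 and standard deviation 1, so neither dominates simply because of its units.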

Network Architecture

After a bit of trial and error, I settled on the following network architecture. The input layer had either six, seven, or eight units, depending on how many of the features I used in that particular scenario (this will make more sense further along in the analysis). Following the input layer, there were three fully connected layers with 8 units each, and a single-unit output layer, making the prediction. This is a binary classification problem – i.e. the network will make a prediction of either “1” if it thinks the wOBA of the batter will be greater than or equal to .320 given the input data, or “0” if it thinks the wOBA will be less than .320 – and so a single neuron in the output layer is sufficient. Below is a visual representation with six units in the input layer.

How did I arrive at 8 units in a hidden layer? Since at most I would use 8 input features, I picked that as the number of units in the first hidden layer. I wanted to keep the number of units consistent across layers for simplicity. And how did I decide on three hidden layers? I simply did a run with two hidden layers, and then one with three, and I got better results with three. Going to four started overfitting the training data, and so I settled on three hidden layers.
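For readers who want to see the shape of the network spelled out, here is a minimal forward pass in plain Python for the six-feature version. The ReLU hidden activations, the sigmoid output, the 0.5 cutoff, and the random initialization are my assumptions for this sketch, not necessarily what the actual code uses.

```python
import math
import random

LAYER_SIZES = [6, 8, 8, 8, 1]  # input, three hidden layers, output

def init_params(layer_sizes, seed=0):
    """Small random weights and zero biases for each fully connected layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(params, features):
    """Return the network's estimated probability that wOBA >= .320."""
    activation = list(features)
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        if i == len(params) - 1:
            activation = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid output
        else:
            activation = [max(0.0, v) for v in z]  # ReLU hidden layers (assumed)
    return activation[0]

def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

params = init_params(LAYER_SIZES)
probability = forward(params, [0.0] * 6)   # an all-average (all-zero) input
prediction = 1 if probability >= 0.5 else 0
```

With these layer sizes the network has only 209 trainable parameters, which is part of why I call it shallow.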

(For the sake of brevity I won’t go into detail of activation functions, regularization, loss functions etc. here in the body of the article. I will link to the code at the bottom, and feel free to hit me up for additional details.)

With the network architecture in place, I ran through four different scenarios, or four different combinations of input features, while keeping the network architecture constant. I’ll outline the results first, followed by a brief discussion.
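One way to keep the four feature combinations straight is a small lookup like the one below. The column names are hypothetical stand-ins for whatever the Statcast export calls them.

```python
# Hypothetical column names for the six plate discipline features.
PLATE_DISCIPLINE = [
    "z_swing_pct", "z_whiff_pct", "z_contact_pct",
    "oz_swing_pct", "oz_whiff_pct", "oz_contact_pct",
]

SCENARIOS = {
    "1: plate discipline only": PLATE_DISCIPLINE,
    "2: plate discipline + exit velocity": PLATE_DISCIPLINE + ["exit_velocity_avg"],
    "3: plate discipline + barrel rate": PLATE_DISCIPLINE + ["barrel_rate"],
    "4: plate discipline + exit velocity + barrel rate":
        PLATE_DISCIPLINE + ["exit_velocity_avg", "barrel_rate"],
}

# The number of input units is simply the number of features in the scenario.
input_sizes = {name: len(columns) for name, columns in SCENARIOS.items()}
```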

Scenario #1: Plate Discipline Only

With what probability could we predict whether someone’s wOBA is over .320 using only plate discipline statistics, while knowing nothing at all about what happens when bat meets ball? This was my first scenario. In particular, the input features used in the training set were – all normalized – zone contact rate, zone swing rate, zone swing and miss rate, outside zone contact rate, outside zone swing rate, and outside zone swing and miss rate.

I had 80% of my overall dataset in the training set, and 20% in the test set. The network is trained on the training set, and the test set is used to gauge the accuracy of the network on data it hasn’t seen before. This resulted in 791 items in the training set, and 198 items in the test set. Here are the results after the network has learned its parameters following 15 passes through the training set:
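A fixed 80/20 split can be sketched as follows. The seed value is an arbitrary assumption; the point is only that fixing it gives the same 791 training rows on every run.

```python
import random

def train_test_split(n_rows, train_fraction=0.8, seed=42):
    """Shuffle row indices once with a fixed seed and cut them 80/20."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split(989)
print(len(train_idx), len(test_idx))  # 791 198
```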

Test set performance for scenario #1:

That’s about a 67% prediction accuracy on the training set, and about 69% on the test set. In other words, the probability that the network will be able to correctly predict whether a hitter’s wOBA will be above .320, using nothing more than their plate discipline statistics, is about 0.7. The fact that the training and test set accuracies are reasonably close – the test set accuracy actually being a bit higher – means that the network is not overfitting the training set either.
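For clarity, “accuracy” here is just the share of player-seasons where the predicted label matches the true one. A minimal sketch, assuming a 0.5 cutoff on the network’s output and using made-up numbers:

```python
def accuracy(true_labels, predicted_probabilities, cutoff=0.5):
    """Fraction of examples where the thresholded prediction equals the true label."""
    predictions = [1 if p >= cutoff else 0 for p in predicted_probabilities]
    correct = sum(1 for y, y_hat in zip(true_labels, predictions) if y == y_hat)
    return correct / len(true_labels)

# Made-up labels and network outputs for four player-seasons.
toy_accuracy = accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7])
print(toy_accuracy)  # 0.75
```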

Scenario #2: Plate Discipline + Exit Velocity

While 70% is not a bad starting point, how much more accurate could the predictions of the network get if I added a feature with some actual batted ball information? For the second scenario, I added a seventh feature – the average normalized exit velocity. Here is the performance on the training set.

(As a side note, the training and test set splits were fixed for all the different scenarios. What this means is that the same 791 player-seasons were used in the training set every time.)

Test set performance for scenario #2:

The accuracy increased on both the training set and the test set; we’re now in the ballpark of 0.7 – 0.75 probability of the network making the correct prediction as to whether someone’s wOBA will be above .320 or not. Intuitively this makes sense: wOBA is calculated based on batted ball outcomes (and walks), and so adding a relevant batted ball parameter as a feature – such as exit velocity – should increase the accuracy of any wOBA prediction.

Scenario #3: Plate Discipline + Barrel Rate

Would using a barrel rate instead of the exit velocity lead to more accurate predictions? After all, the barrel rate combines two batted ball features – exit velocity and launch angle. Maybe the addition of the launch angle component would help improve accuracy. For scenario #3, I used seven features in the input layer again: the six plate discipline statistics, and the average normalized barrel rate. Here is the performance on the training set:

Test set performance for scenario #3:

The predictions of the network using the barrel rate as the seventh feature increased the accuracy of predictions compared to just using the plate discipline statistics alone, but they were less accurate than the predictions generated using the average exit velocity as the seventh feature. As to why average exit velocity led to better predictions than barrel rate – I’m guessing it’s because it is a more granular feature.

Let’s say Batter A hits three balls – a “barrel” at 97mph, and two “non-barrels” at 92mph. And let’s say Batter B hits three balls – a “barrel” at 97mph, and two “non-barrels” at 82mph. Their barrel rate will be the same, yet the average exit velocity will be different. Either way, the exit velocity provided the network with “more useful” information than the barrel rate did.
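The same toy example in code, using exactly the numbers above:

```python
# Each batted ball is (exit velocity in mph, whether it counted as a barrel).
batter_a = [(97, True), (92, False), (92, False)]
batter_b = [(97, True), (82, False), (82, False)]

def barrel_rate(batted_balls):
    """Share of batted balls that were barrels."""
    return sum(1 for _, is_barrel in batted_balls if is_barrel) / len(batted_balls)

def average_exit_velocity(batted_balls):
    """Mean exit velocity across all batted balls."""
    return sum(velocity for velocity, _ in batted_balls) / len(batted_balls)

for name, balls in (("A", batter_a), ("B", batter_b)):
    print(name, round(barrel_rate(balls), 3), round(average_exit_velocity(balls), 1))
```

Both batters show a barrel rate of one in three, but their average exit velocities differ by more than 6 mph.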

Scenario #4: Plate Discipline + Exit Velocity + Barrel Rate

For the final scenario, I used eight input features: the six plate discipline measures, the average normalized exit velocity, and the normalized barrel rate. Theoretically, this should lead to the most accurate prediction, as we’re adding the most detailed batted ball information to the plate discipline measures. This is the performance of the network on the training set:

Test set performance for scenario #4:

Utilizing all eight of the available features puts us in the ballpark of 80% accuracy of predictions. The fact that adding the barrel rate increased the accuracy compared to the exit velocity alone passes the smell test: while barrel rate contains some of the exit velocity information in it, it is sufficiently distinct from exit velocity that it proved useful to have it as a separate feature.

Summary & Discussion

As it turns out, one can predict with about 80% accuracy whether someone will be an above-average offensive contributor using their plate discipline statistics, their average exit velocity, and their barrel rate in this particular setup. One of the advantages of using a neural network is that the network is able to learn the various non-linear interplays between the input features. For example: let’s say a player has a relatively high out-of-zone chase rate. How high of an outside-of-zone contact rate would he need to have, keeping everything else constant, to get his wOBA over .320? Is it realistic? Or let’s say a player is currently sitting at a wOBA of .310. If we keep his plate discipline statistics constant, how much harder would he have to hit the ball to get his wOBA over .320? There are usually multiple avenues to improve a batter’s performance. Once the network is trained, its predictions can serve as a starting point in evaluating which of the avenues to explore, and which would require an improvement that might be beyond the batter’s reach.

To further improve the performance of the network past the 80% accuracy, there are two routes one could take. One is to change the network architecture, such as the number of hidden layers, the number of units in a layer, the activation functions etc. The other is to use additional features that the network could find useful. For example, one could incorporate the percentage breakdown of pull-straight-opposite field hits for a batter. A batted ball with a certain exit velocity and launch angle hit directly over second base could be a single, while a batted ball with the same characteristics hit down the line could go for extra bases. Furthermore, since the test set accuracy actually exceeds the training set accuracy in all four scenarios, simply obtaining additional data is not likely to improve the network’s performance.

Finally, I’m sure that this is a baby version of what major league teams use. If the network’s output and the actual wOBA of a player disagree, the player could be a candidate for regression, or warrant a deeper dive into their data or an additional look by the scouts. It would also be interesting to see how effective minor league plate discipline and exit velocity data would be in predicting major league wOBA using a setup similar to this one.

For those interested, code for the neural network here.

## Forehand: Offense, Backhand: Defense

By now it has been well established that the majority of points in men’s professional tennis are shorter than 4 total shots, or 2 per player. Yet the most exciting spectacle in a tennis match is the extended rally, with players exchanging groundstrokes, battling for court position, and looking to exploit any small opening to gain the upper hand. Unreturned serves might be more common, but the long points earn the standing ovations and make the highlight reels.

With that being said, I wanted to see which groundstroke statistics correlated the most with actually winning the match in the 2021 French Open men’s singles main draw. Let’s use Novak Djokovic’s finals victory over Stefanos Tsitsipas as an example.

For the purpose of this analysis, I will group winners and forcing shots into one category: “offense.”

Looking at the table above, we see that Djokovic hit more forehand and fewer backhand groundstrokes overall than Tsitsipas did. Digging a bit deeper, Djokovic led the forehand offense category 47-32, but he also made more unforced errors on the forehand than Tsitsipas, 20-19. Switching over to the backhand wing, Djokovic is once again better than Tsitsipas in the offense category, 10-6, and he also made fewer backhand unforced errors than Tsitsipas, 10-13.

Which of these statistics have the most predictive power? In particular, if you didn’t know the outcome of the match, and could only pick a few of these comparisons to help you make an educated guess, which ones should you look at?

The groundstroke statistics are only available for 55 of the men’s singles matches contested in the French Open main draw, fewer than half of the matches played. Despite the limited sample size, there are some interesting patterns worth exploring.

Forehand Raw Count

Let’s start with just looking at the number of forehands hit in a match. The forehand is understood to be the “sword” – an attacking weapon, while the backhand is more of a “shield” – a predominantly defensive tool in a player’s arsenal. Would it be enough just to see which player used the sword more?

It turns out that just knowing who hit more forehands in a match provides little information about the outcome. In only about 55% of the matches in the dataset did the eventual winner hit more forehands than their opponent; slightly better than a coin flip. Intuitively, this makes sense: one, we don’t know anything about the outcome of those forehands. And two, some players are more comfortable with their backhand than others, utilizing, for example, the run-around forehand less, thus decreasing their forehand count. Daniil Medvedev would be an example of such a player.
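With only 55 matches, it helps to put a rough error bar on that 55%. The sketch below uses a simple normal approximation for a proportion; it is a back-of-the-envelope check under that assumption rather than a rigorous test.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

low, high = proportion_interval(0.55, 55)
print(round(low, 3), round(high, 3))  # 0.419 0.681
```

The interval comfortably includes 50%, which fits the idea that the raw forehand count is not far from a coin flip.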

To get more predictive utility, we’ll need to incorporate the outcome of the rally into the analysis.

Forehand Offense and Unforced Errors

Looking at the forehand outcomes first, this is how often the winner of the match amassed more combined winners and forcing shots than the opponent.

Conversely, this is how often the winner of the match made fewer forehand unforced errors than the opponent.

It is not surprising that taking into account the outcome of the rally improves the predictive power of the statistic. Both the offense and the unforced errors are a significant improvement over the raw count of forehands. However, it is the offensive component that correlates more strongly with winning the match. About 75% of the time, the winner of the match amassed more winners and forcing shots on the forehand side, compared to about 64% of the time that the winner made fewer unforced errors.
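Each of these percentages comes from the same simple tally: for every match, check whether the winner led the category. Here is a minimal sketch of that tally. The only row included is the Djokovic vs. Tsitsipas final, using the numbers quoted earlier; the field names are hypothetical, and ties are counted as “did not lead,” which is my assumption.

```python
# One record per match, with (winner, loser) counts for each category.
matches = [
    {   # 2021 final: Djokovic (winner) vs. Tsitsipas
        "fh_offense": (47, 32),
        "fh_unforced": (20, 19),
        "bh_offense": (10, 6),
        "bh_unforced": (10, 13),
    },
]

def winner_led_share(matches, category, more_is_better=True):
    """Share of matches in which the winner led the given category."""
    led = 0
    for match in matches:
        winner_count, loser_count = match[category]
        if more_is_better:
            led += winner_count > loser_count
        else:
            led += winner_count < loser_count
    return led / len(matches)

fh_offense_share = winner_led_share(matches, "fh_offense")
fh_unforced_share = winner_led_share(matches, "fh_unforced", more_is_better=False)
bh_unforced_share = winner_led_share(matches, "bh_unforced", more_is_better=False)
```

Run over all 55 matches instead of this single row, the same function would produce the percentages discussed in this article.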

Why is it slightly easier to overcome a higher rate of unforced errors on the forehand and still win the match, as opposed to overcoming a lower offensive rate? I think that it has to do with controlling the rally. If I make unforced errors on the forehand, on at least some of those my feet are set, I am on offense, and I just go for too much and miss by a close margin. I can “get away” with those mistakes, as long as I keep generating the offense, getting ahead in the rally, and accumulating winners and forcing errors. Looking at the finals match, Djokovic made more unforced errors on the forehand than Tsitsipas; yet he led the offensive category by a significant margin, signaling that he was controlling more of the rallies with his forehand. It is this control that ultimately helped tilt the match in his favor.

Backhand Offense and Unforced Errors

Shifting gears to the backhand side, here is how often the winner of the match led in the offensive category.

This was probably the biggest surprise for me in the dataset. Knowing which player generated more winners and forcing shots on the backhand side told you almost nothing about the outcome in this particular set of matches. The winner of the match had more backhand winners and forcing shots in only about 51% of the matches. You would actually have a better chance predicting the winner of the match using the raw forehand count – a process statistic, as opposed to a backhand outcome statistic.

Why is that the case? My guess is that backhand winners and forcing shots are relatively infrequent events, similar to aces. They simply don’t account for a large enough percentage of points to bear a significant weight on the outcome of the match. In the present dataset, the match winners averaged about 20 forehand winners and forcing shots per match, but only about 10 backhand winners and forcing shots. Looking once again at the Djokovic vs Tsitsipas statistics, Djokovic had 47 winners and forcing shots on the forehand, compared to just 10 on the backhand.

If looking at the backhand offense tells us little, how about looking at backhand unforced errors?

The backhand unforced error rate was the second best predictor of the ultimate match winner, behind only the forehand offense. This is a great illustration of the backhand’s function as a shield. Since players in general don’t finish a ton of rallies with their backhand (as seen in the backhand offense table), it is important that the stroke be dependable in a neutral rally – i.e. limiting the unforced errors. The backhand is used more as a “linking” shot, a bridge between defense and neutral, and as a transition from neutral to offense before the forehand is used to finish the job. Its effectiveness as a link is better highlighted when looking at unforced errors – when the stroke is not under pressure – as opposed to looking at finishing statistics such as winners and forcing shots.

It would be great to see if these patterns held over all of the 120+ singles matches played at this year’s French Open. Regardless, the results of the analysis pass the smell test: to guess who might have won a particular tennis match, look at who inflicted more damage with their forehand, and who hurt themselves less with their backhand. More often than not, you’ll be on the right track.

## Ruminating on the Underhand Serve

Almost every sport has a set of unwritten rules that the competitors are expected to abide by. In baseball, baserunners are discouraged from stealing bases in the late innings of a blowout. In American football, teams will often take a knee instead of running an offensive play if the game is already decided late in the fourth quarter. In soccer, a team in possession of the ball is expected to kick it out of bounds in order to allow medical treatment of an injured opponent. The application of the unwritten rules is nuanced and not universally agreed upon, even among the competitors themselves. The common thread among all these “agreements,” though, is showing a level of respect for the opponent and the sport.

There are numerous examples of unwritten rules in tennis. For example, tennis players are expected to avoid trying to aim at their opponent with the ball during a rally, if there is an option to go around them. They are expected to apologize after they hit the net tape with a stroke, and the ball rolls over onto the opponent’s side. Players shake hands at the end of the match, regardless of how heated the competition might have gotten.

And then there is the underhand serve.

This is Dominic Thiem’s second serve return position in his third round match in the ATP Masters 1000 in Rome against Lorenzo Sonego:

I’m certainly not picking on Thiem. There are plenty of other players who prefer to hit their returns from way behind the baseline: Rafael Nadal, Daniil Medvedev, and Stefanos Tsitsipas all come to mind. The reason for this is purely tactical; it gives them more time to react to the serve, potentially hit a forehand on the return, and gain the upper hand in the rally.

If you knew nothing about tennis, you might think to yourself: if my opponent wants to return from that far back, it would seem logical for me to try and entice them to return from closer to the baseline. Just like if my opponent wants to hit more forehands, I will aim at her backhand. I am trying to get my opponent to do the things they are not comfortable doing on the court.

You do have a potential weapon in your arsenal to help you accomplish just that: the underhand serve. But you’re not supposed to use it. Take a look at the reaction Nick Kyrgios got from Nadal when he used the underhand serve during their Wimbledon encounter in 2019:

Regardless of the history between Nadal and Kyrgios, the underhand serve is regarded as a sign of disrespect towards the opponent; they are “not worthy” of you hitting a “proper” serve against them. I would argue that the underhand serve could become a legitimate tactic in an era where, especially on clay, many players choose to set up for the return way back behind the baseline. What would have to happen for the underhand second serve to become less taboo? I think any of the following three developments would speed up the process.

If someone like Nick Kyrgios or Alexander Bublik decides to serve underhand, the stigma associated with that shot is reinforced. They are “young and brash,” “disrespecting the opponent and the game,” and “immature.” Both Bublik and Kyrgios are known for having tanked matches, and making eyebrow-raising statements in some of their post-match press conferences; their underhand second serves are then viewed through that same lens.

If Roger Federer or Serena Williams decided to serve underhand, the lens would change. Both Federer and Williams have amassed so much credit over their respective years of dominating the sport that their reputations are bulletproof. Do you still remember the SABR?

Contrast the crowd’s reaction to Federer’s unconventional tactic to the reaction Kyrgios got when hitting the underhand serve. Admittedly, it is not an apples to apples comparison. In the SABR, Federer is trying to get Djokovic out of his rhythm, sure. Yet he still puts himself at a bit of a disadvantage by decreasing the time he has to react to Djokovic’s serve by returning so close to the service line. In other words, during the SABR, you react. During the underhand serve, you are completely in control of the shot. Regardless of that particular difference between the SABR and the underhand serve, just listen to the different reactions of the commentators and the crowd. What would the reaction be if Federer served underhand to Nadal at this year’s Wimbledon? You tell me.

Genuine Tactic

There is one more subtle difference between the two tactics, and that is the timing of their deployment in the videos above. Notice the score when Kyrgios serves underhand to Nadal: 2-5, 40:0. Serving at 40:0 on grass, Kyrgios is an overwhelming favorite to win the game. Even if he loses the point, at 40:15 he is still well ahead in the game. By the same token, once he wins the point, Nadal is clearly favored to win the set when serving at 5-3. In a way, that 2-5 40:0 point is largely irrelevant to the outcome of the first set.
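To put a number on “overwhelming favorite,” here is a small sketch of the standard game model in which the server wins each point independently with the same probability. The 65% figure is an assumed, illustrative value, not Kyrgios’s actual serve statistic from that match.

```python
def win_from_deuce(p):
    """Probability the server wins the game from deuce, winning each point with probability p."""
    q = 1 - p
    return p * p / (p * p + q * q)

def server_wins_game(p, server_points, returner_points):
    """Probability the server wins the game from a given score (points counted 0, 1, 2, 3, ...)."""
    if server_points >= 4 and server_points - returner_points >= 2:
        return 1.0
    if returner_points >= 4 and returner_points - server_points >= 2:
        return 0.0
    if server_points == returner_points and server_points >= 3:
        return win_from_deuce(p)
    return (p * server_wins_game(p, server_points + 1, returner_points)
            + (1 - p) * server_wins_game(p, server_points, returner_points + 1))

P_POINT = 0.65  # assumed probability of the server winning any single point

at_40_0 = server_wins_game(P_POINT, 3, 0)
at_40_15 = server_wins_game(P_POINT, 3, 1)
print(round(at_40_0, 3), round(at_40_15, 3))
```

Under this assumption the server holds roughly 99% of the time from 40:0 and still about 97% of the time from 40:15, so losing the underhand-serve point costs very little.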

Contrast that with the timing of Federer’s SABRs in the video above. The first one came at 1-1, 15:30, and the other with Federer up 3:1 in the tiebreak. Both are tremendously important points. If Federer goes up to 15:40 in the third game, he has a good chance of going up a break in the match. Similarly, going up 4:1 in the tiebreak, with the serve, puts Federer well on his way to securing the first set 7:6.

Looking at their respective strategies from this angle, it really does seem like Federer’s SABR was a legitimate tactic deployed to surprise Djokovic and gain an advantage in the match. Kyrgios’ underhand serve was used in a situation that really didn’t matter.

I think that if the underhand second serve is used as a real, genuine tactic, some of the players’ and fans’ aversion would be muted. What is a “genuine tactic”? That is hard to pinpoint, but some things to look out for would be:

• Used early in the match to force the opponent to adjust
• Used throughout the match if the opponent doesn’t adjust; for example, using the underhand serve twice a game instead of twice a set
• Used on important points
• Used in multiple matches against a variety of opponents

Returner is Not a Victim

This last point falls more on the fans and members of the media rather than the players themselves. And that is simply to recognize that the returner can adjust their position if they don’t like being served to underhand. Just like if Djokovic didn’t like Federer’s SABRs, he could hit his second serve a little harder, and aim it at the body of Federer. Once Federer sees that his strategy is not having the desired effect, he’ll stop doing it. If I don’t like somebody slicing backhands, I can hit through their forehand. Tennis is a game of adjustments, and allowing that one’s return position is a variable that the opponent might want to exploit would go a long way towards freeing up some players to hit more underhand serves without fearing the crowd’s reaction.

There are plenty of examples of tactics evolving in various sports around the world. In basketball, once the sport recognized and embraced the value of the three-point shot, the game evolved into a wide-open, pace-and-space sprint, as opposed to the slow slog of yesteryear when games were dominated by battles in the post. In American football, multiple wide receiver formations are much more prevalent in today’s era of the pass than the run-heavy, multiple tight end sets of years past. In baseball, you might see the shortstop line up anywhere on the infield on defense these days, depending on who the batter is. Seeing a little more of the underhand serve would be a welcome sight for yours truly; a wrinkle, and a new tactical element in a game that has recently been a little lacking in variety for my taste.

## Alexander Zverev vs. Rafael Nadal: A Tale of Two Matches

En route to his recent ATP Masters 1000 title in Madrid, Alexander Zverev notched his first clay court victory over Rafael Nadal. Zverev had beaten Nadal twice before their Madrid encounter, but both victories came on indoor hard courts. Playing Nadal on clay is a different animal though. His confidence on clay is sky high, the higher bounces magnify Nadal’s ability to hit the ball with heavy topspin, and the slower conditions tend to make matches more physical, which Nadal relishes.

After Zverev took the match 6-4 6-4 in the quarter finals in Madrid, the two were set for a rematch a week later in the quarter finals of the ATP Masters 1000 in Rome. This time it was Nadal who came away with the 6-3 6-4 victory. Looking at the statistics from both matches, we can catch a glimpse of the adjustments that Nadal made, and how he was able to tilt the way the match was played in his favor in Rome more so than in Madrid.

Extending the Points

Let’s first take a look at the distribution of the different rally lengths from the Madrid match, which Zverev won.

Almost 60% of the points in Madrid lasted 4 shots or fewer, and over 85% lasted 8 shots or fewer. The average rally length for the match was 5.9 shots. It was in these shorter rallies where Zverev won the match. Once the rally reached 9 or more shots, it was advantage Nadal; but since those rallies accounted for less than 15% of all points, it was not enough to swing the match in Nadal’s favor.

Going into the match in Rome, I would assume that one of Nadal’s goals was to work his way into the point, extend the rallies, and make the match more physical than was the case in Madrid. Let’s examine the same statistics from Rome.

Mission accomplished for Nadal. Zverev still won the 0-4 shot rally length, but the percentage of total points in that particular bucket dropped from about 60% in Madrid to only about 35% in Rome. Conversely, the contribution of the 9+ rally length increased from about 14% in Madrid to over 25% in Rome. Combine those two, and the average rally length in Rome jumped to 7.5 shots compared to the 5.9 average in Madrid. Furthermore, Nadal was able to reverse the 5-8 shot rally length to his advantage, and maintain his dominance in the extended 9+ shot rallies.
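For anyone who wants to reproduce this kind of breakdown from point-by-point data, the bucketing is straightforward. The sketch below uses a made-up list of rally lengths, not the actual Madrid or Rome points.

```python
def rally_summary(rally_lengths):
    """Share of points in the 0-4, 5-8 and 9+ shot buckets, plus the average rally length."""
    counts = {"0-4": 0, "5-8": 0, "9+": 0}
    for length in rally_lengths:
        if length <= 4:
            counts["0-4"] += 1
        elif length <= 8:
            counts["5-8"] += 1
        else:
            counts["9+"] += 1
    total = len(rally_lengths)
    shares = {bucket: count / total for bucket, count in counts.items()}
    average = sum(rally_lengths) / total
    return shares, average

# Made-up rally lengths for ten points, purely for illustration.
toy_rallies = [1, 2, 3, 3, 4, 4, 6, 7, 10, 15]
shares, average_length = rally_summary(toy_rallies)
print(shares, average_length)
```

Shifting even a few points from the first bucket into the last one moves the average noticeably, which is the effect Nadal achieved in Rome.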

How exactly was Nadal able to extend the rallies in Rome compared to Madrid? By going into Zverev’s backhand more, and by hitting spinnier, heavier groundstrokes.

Targeting the Zverev Backhand

Once the point got started, Nadal’s strategy in Rome was much more focused on hitting through the Zverev backhand than was the case in Madrid. Let’s look first at the comparison of Nadal’s forehand targets from both matches:

The left image shows Nadal’s forehands in Madrid, where he lost, and the right image shows his forehands in Rome, where he won. There is about a 10% increase in Nadal’s forehands being aimed cross-court into Zverev’s backhand. The difference is yet more pronounced on Nadal’s backhands:

The left image once again shows Nadal’s backhands from Madrid. In that match, Nadal hit 3 out of every 4 backhands cross-court into Zverev’s forehand. In Rome, Nadal made sure to stay away from that particular pattern more often, and his backhands were split about evenly between cross-court and down the line. Also, notice the higher frequency with which Nadal’s backhands were landing deeper in the court in Rome than was the case in Madrid. Depth is one of the best ways to gain control of the point in a rally, and it was surely a contributing factor in Zverev committing 31 rally unforced errors in Rome as compared to just 10 in Madrid.

Heavier Spin on Groundstrokes

Not only did Nadal get his groundstrokes – especially backhands – deeper into the court in Rome, his strokes were also “heavier,” hit with more topspin. Below is a table comparing the average RPMs on groundstrokes from both the Madrid and the Rome matches.

While Zverev was hitting his groundstrokes flatter in Rome than he did in Madrid, Nadal really cranked up the topspin, especially on his backhand wing. Knowing that balls with more topspin tend to bounce higher than flatter strokes, and combining that with the groundstroke distribution patterns, we can try to guess Nadal’s baseline strategy in Rome: try to get the ball up high on Zverev’s backhand, making it harder for him to attack from that part of the court. Now, Alexander Zverev is listed at 6’6″, so this is no easy feat.

It would be great to see the average net clearance of Nadal’s groundstrokes in the match, but I can’t seem to find that statistic on the ATP website. The best I can do is compare the average net clearance on Nadal’s second serve returns, which were hit about 0.81m above the net in Madrid, and about 20cm higher – 1.03m above the net – in Rome. It is certainly not hard evidence, but it points in the same general direction as the higher RPMs on the groundstrokes: get the ball higher on Zverev.

Getting the ball deep and under heavy spin into Zverev’s backhand forces Zverev into a decision. Option one is to try to take the ball on the rise, before it gets up above his shoulders. The timing of that shot is tricky even for the best players in our sport, and given how flat Zverev’s backhand is, the margin for error is slim. Alternatively, Zverev can back up and wait until the ball drops back down into his strike zone. That, however, puts him in a defensive position way behind the baseline – minimizing his chance to attack – most likely extending the rally – and playing right into Nadal’s hands. Pick your poison time for Zverev.

Alexander Zverev was able to get the better of Rafael Nadal in Madrid by keeping the majority of the rallies short. Nadal countered in Rome with a strategy designed to extend the rallies and make Zverev uncomfortable on the ad side of the court. Should these two face off at the French Open, one of the factors deciding the outcome of the match will be the length of the rallies. If Zverev can keep the match a first-strike battle, he has a chance. Otherwise, with every extra groundstroke hit, Nadal’s advantage will keep mounting.

## 3 Suggestions for Jannik Sinner Against Rafael Nadal

After losing to Rafael Nadal in the 2nd round of the ATP Rome Masters 1000, Jannik Sinner is now 0-2 in his young career against the Spaniard. Having a losing record against the greatest clay court player in the history of our sport is nothing to be ashamed of; Sinner’s other loss to Nadal came on clay as well, in the quarterfinals of the 2020 French Open. Looking at the statistics from their Rome encounter, I think that there are three areas that Sinner could improve on to give himself a better chance when he sees Nadal for the third time.

One particular area where Nadal was better than Sinner in Rome was points that started with a second serve. I included both columns for visual effect, even though a player’s 2nd serve win% is equal to (1 – 2nd serve return win%) of their opponent.

Sinner’s Second Serve Placement

These are the locations where Sinner aimed his second serves against Nadal in Rome:

When Nadal is getting ready to return Sinner’s second serve, he can basically eliminate half of the service box in the deuce, and a third of the service box in the ad. In this particular match, it was a winning strategy for Sinner in the ad court, where he won two thirds of the points that started with his second serve down the “T”, but it was a losing proposition in the deuce side, where he lost two thirds of his second serve points that started out wide.

Where it gets really interesting is that Sinner’s plan here is to serve to Nadal’s backhand in both courts. Let’s take a look at whether Nadal actually returned with the backhand once Sinner got his second serve in. First, the deuce court:

How many forehand returns did Nadal hit on Sinner’s second serve in the deuce? Zero. That’s the image on the left. All of Nadal’s second serve returns in the deuce were backhands, just like Sinner planned. Yet Nadal won 11 and lost 5 of those points. Not encouraging for Sinner. How about the ad side?

How many forehand returns did Nadal hit on Sinner’s second serve in the ad? All of them. That’s the image on the left again. All of Nadal’s second serve returns in the ad were forehands, yet he won 6 and lost 10 of those points. Much more encouraging for Sinner.

Let’s sum it up. Sinner was clearly targeting the Nadal backhand with his second serve in both sides of the court. He got his wish in the deuce, and lost the majority of those points. He couldn’t find the Nadal backhand with the second serve in the ad, and won the majority of those points.
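The tally above can be double-checked with a few lines of Python. This is only a minimal sketch built from the point counts quoted in this section; the dictionary layout and the helper name `sinner_win_pct` are my own illustrative choices, not an ATP data format.

```python
# Points played on Sinner's second serve once the serve landed in,
# using the counts quoted above for each side of the court.
second_serve_points = {
    "deuce": {"nadal_return": "backhand", "nadal_won": 11, "sinner_won": 5},
    "ad": {"nadal_return": "forehand", "nadal_won": 6, "sinner_won": 10},
}


def sinner_win_pct(side):
    """Percentage of second-serve points Sinner won on one side of the court."""
    counts = second_serve_points[side]
    total = counts["nadal_won"] + counts["sinner_won"]
    return 100.0 * counts["sinner_won"] / total


for side, counts in second_serve_points.items():
    wing = counts["nadal_return"]
    print(f"{side}: Nadal returned with the {wing}, Sinner won {sinner_win_pct(side):.2f}%")
```

Sinner’s win rate is roughly twice as high on the side where Nadal returned with his forehand, which is the counterintuitive part of the pattern.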

My biggest takeaway from this analysis would be for Sinner to hit more second serves out wide in the ad, but especially down the T in the deuce side next time he sees Nadal. It would prevent Nadal from zeroing in on one particular area of the service box, and Sinner has shown the ability to handle the Nadal forehand return by having a winning percentage against it in the ad side.

Sinner’s Second Serve Return

Sinner’s second serve returns often landed much too short to put Nadal on defense at the start of the point. Here is the breakdown of Sinner’s second serve return locations from the deuce side:

The forehand returns are the box on the left, backhand returns are on the right. Sinner did a good job with getting forehands on the majority of the deuce side returns: 9 forehands, and only 4 backhands. But notice that out of the 13 total second serve returns, only 3 were classified by the ATP as “deep.” Especially with the forehand returns, there is a cluster right in the middle of the court. Let’s take a look at the ad side:

The forehand returns are once again on the left, the backhands on the right. It’s a similar pattern to the deuce side: Sinner again does a good job of getting forehands on the second serve returns: 10 forehands and just 2 backhands. Yet out of those 12 returns, only 3 would be classified as “deep.” And once again, there are a bunch of forehand returns in the middle of the court.
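To put the two sides on the same scale, the counts above can be turned into shares. This is a small sketch using only the numbers quoted in the last two paragraphs; the names `returns`, `deep_share`, and `forehand_share` are hypothetical helpers of my own.

```python
# Sinner's second-serve return counts quoted above; "deep" follows the
# ATP's own classification of return depth.
returns = {
    "deuce": {"forehands": 9, "backhands": 4, "deep": 3},
    "ad": {"forehands": 10, "backhands": 2, "deep": 3},
}


def total_returns(side):
    """All second-serve returns hit from one side of the court."""
    counts = returns[side]
    return counts["forehands"] + counts["backhands"]


def deep_share(side):
    """Fraction of returns from one side that were classified as deep."""
    return returns[side]["deep"] / total_returns(side)


def forehand_share(side):
    """Fraction of returns from one side that were hit with the forehand."""
    return returns[side]["forehands"] / total_returns(side)


for side in returns:
    print(f"{side}: {forehand_share(side):.0%} forehands, {deep_share(side):.0%} deep")
```

On both sides, roughly one return in four was deep, even though Sinner got the forehand he wanted most of the time.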

Finally, this is where Sinner was hitting his second serve returns from:

Let’s sum up this part of the analysis. Sinner was hitting his second serve returns mostly from about 4 meters behind the baseline, with one lone exception when he stepped inside the court in the ad side. Those returns were mostly forehands that were landing short, towards the middle of the court.

I think that Sinner has two options to make his second serve return more effective against Nadal. First, he can step inside the court more, and keep everything else more or less the same – just like that one dot in the picture above. Sure, it would be great to get the return deeper, but by taking time away from Nadal, even a shorter return can elicit a defensive response.

The second option – if Sinner feels comfortable hitting forehand returns from further behind the baseline – would be to simply return higher above the net. Here are the return heights from Nadal and Sinner from their Rome match:

Sinner hit his second serve returns about 0.73 meters above the net. I’d like to see that number be closer to about 1 meter, if Sinner were to keep returning from way back behind the baseline. Just those additional 30 centimeters of net clearance would get the returns a bit deeper, and set up Sinner on offense at the start of the rally.

Rally Backhand Direction

Nadal crafted his biggest advantage in the match in the longest rallies. Here are the points won by both players in rallies of different lengths:

One way to keep the rallies shorter for Sinner – besides mixing up his second serve locations, and hitting his second serve returns deeper – would be to utilize his backhand down the line more. This is where Sinner aimed his backhand groundstrokes against Nadal:

That’s about three out of every four backhands going into Nadal’s forehand. It’s actually pretty common to see this kind of a distribution when two right handed players are matched up against each other, and they’re trading backhands back and forth. Against a lefty though, it’s a different story.

Let’s contrast Sinner’s backhand placement to where Alexander Zverev aimed his backhands against Nadal when he beat him in the recent ATP Madrid Masters 1000:

Zverev was much closer to a 50-50 split on his backhand than Sinner was, and having to respect Zverev’s backhand down the line undoubtedly kept Nadal more off balance in groundstroke exchanges. Sinner has a very simple, clean, two handed backhand, that he can hit close to 130km/h. The next time he plays against Nadal and finds himself in a groundstroke rally, I’d like to see him take more chances with his backhand down the line to the Nadal backhand. The majority of his backhands can still go crosscourt, but just like on the second serve, all he’s trying to do is show Nadal a different pattern of play, as opposed to being a little too predictable.

Sinner is one of the upcoming stars of tennis, challenging the “old guard” of Federer, Djokovic, and Nadal. Making a few small adjustments here and there is all that it might take for Sinner to claim his first win over Nadal the next time they lock horns.

## The MVP Batter

In one of the later chapters of The MVP Machine, the authors describe a working relationship between an unnamed position player and a writer at an “analytically inclined” baseball website. The player felt that his club’s advanced scouting data wasn’t granular enough, and asked the writer to supplement the information he was given by the club with additional detail. The writer was eventually performing scouting reports on the player himself, opposing pitchers, as well as the home plate umpires’ strike zones. In terms of evaluating his own performance, the writer summarized that the player was basically looking at three things: “Am I squaring up the ball? Am I swinging and missing? Am I swinging at strikes?”

With the first month of the season in the books, who would be some of the best performing hitters in the league according to this particular player’s criteria? Thanks to Statcast, we have the tools at our disposal to try and figure out just that. The dataset I used for this exercise was all qualified batters as of the morning of April 30th, 2021.

First, we need to decide which parameters to use to represent each of the three questions posed by the player. Two of the three are pretty easy. “Am I swinging and missing?” We can look up a player’s whiff percentage on Statcast. “Am I swinging at strikes?” That information is represented in a player’s chase percentage. “Am I squaring up the ball?” The natural candidates here would be, if we’re using just one number: the average exit velocity, hard hit percentage, and barrel percentage. I decided to go with the average exit velocity, because it takes into account every batted ball put in play by the batter. Let me explain.

The hard hit percentage – defined as the percentage of balls hit with an exit velocity of 95 mph or harder – is binary. If a batter puts 6 balls in play at 96 mph, and 4 balls at 94 mph, his hard hit percentage will be 60%. Similarly, if a batter puts 6 balls in play at 96 mph, and 4 balls leave his bat at 85 mph, his hard hit percentage will be the same 60%, even though the first batter makes more consistently hard contact. Barrel percentage has the same binary issue, and I couldn’t find anywhere on Statcast what that “perfect combination” of exit velocity and launch angle is.
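Here is the two-batter example from the paragraph above as a quick sketch. The function and variable names are mine, and the batted-ball lists are the made-up ones from the example, not real Statcast data.

```python
from statistics import mean

HARD_HIT_THRESHOLD_MPH = 95.0  # hard-hit cutoff as defined above


def hard_hit_pct(exit_velocities):
    """Percentage of batted balls at or above the hard-hit cutoff."""
    hard = sum(1 for ev in exit_velocities if ev >= HARD_HIT_THRESHOLD_MPH)
    return 100.0 * hard / len(exit_velocities)


# The two illustrative batters from the text: same six hard-hit balls,
# but very different "misses".
batter_a = [96.0] * 6 + [94.0] * 4
batter_b = [96.0] * 6 + [85.0] * 4

for name, balls in (("A", batter_a), ("B", batter_b)):
    print(f"Batter {name}: hard hit {hard_hit_pct(balls):.0f}%, average EV {mean(balls):.1f} mph")
```

Both batters land on the same hard hit percentage, while the average exit velocity separates them by a few mph, which is why I went with the average.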

Now that we have our three parameters – average exit velocity, chase percentage, whiff percentage – I normalized all three to vary between 0 and 1, using the formula:

x_normalized = (x_true_value – dataset_min) / (dataset_max – dataset_min)

Here, x_true_value is the actual value of the parameter, dataset_max is the highest value of that parameter in the dataset, and dataset_min is the lowest value in the dataset. Let’s look at an example. Shohei Ohtani’s average exit velocity – as of the morning of April 30th – was 91.1 mph. The highest average exit velocity in the dataset was 98.7 mph, the lowest 81.6 mph. Ohtani’s normalized average exit velocity would then be:

(91.1 – 81.6) / (98.7 – 81.6) = 9.5 / 17.1 ≈ 0.5556
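The same min-max normalization, written as a small Python sketch. The function name `min_max_normalize` is my own; the three input numbers are the ones from the Ohtani example above.

```python
def min_max_normalize(value, dataset_min, dataset_max):
    """Rescale a value to the 0-1 range spanned by the dataset."""
    if dataset_max == dataset_min:
        raise ValueError("dataset_max and dataset_min must differ")
    return (value - dataset_min) / (dataset_max - dataset_min)


# Ohtani's average exit velocity against the dataset's lowest and highest values.
ohtani_norm_ev = min_max_normalize(91.1, dataset_min=81.6, dataset_max=98.7)
print(f"{ohtani_norm_ev:.3f}")  # a bit above the midpoint of the range
```

A value of 0 corresponds to the lowest number in the dataset and 1 to the highest, so every parameter ends up on the same scale regardless of its units.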

Finally, I computed the weighted average with different weights for the normalized parameters. For the normalized average exit velocity, the higher the number the “better,” while for the chase and whiff rates, the higher the normalized value the “worse.” To account for this, I’m actually multiplying the respective weights by (1 – normalized_whiff_rate) and by (1 – normalized_chase_rate). Let’s call the final output “MVP value.”
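The weighted average described above can be sketched as follows. The function name `mvp_value`, the weight ordering, and the example hitter are all hypothetical; the hitter's normalized values are made up purely to show the mechanics.

```python
def mvp_value(norm_exit_velocity, norm_chase_rate, norm_whiff_rate,
              weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three normalized parameters.

    weights is (exit velocity, chase, whiff) and must sum to 1. Chase and
    whiff rates are flipped with (1 - x) so that higher is always better.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_ev, w_chase, w_whiff = weights
    return (w_ev * norm_exit_velocity
            + w_chase * (1.0 - norm_chase_rate)
            + w_whiff * (1.0 - norm_whiff_rate))


# Hypothetical hitter: hits the ball hard, rarely chases, whiffs a fair amount.
example = mvp_value(norm_exit_velocity=0.8, norm_chase_rate=0.2, norm_whiff_rate=0.4)
print(f"{example:.3f}")
```

Because each input is between 0 and 1 and the weights sum to 1, the output also stays between 0 and 1, with 1 being the best possible score. Passing a different `weights` tuple shifts the emphasis between the three skills.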

Base Case

MVP = (1/3) * normalized_exit_velocity + (1/3) * (1 – normalized_chase_rate) + (1/3) * (1-normalized_whiff_rate)

In the first instance, I assigned equal weights to the exit velocity, whiff rates, and chase rates. This is a “baseline” run, if you wish. The top 10 is as follows (excuse some rounding errors):

Ronald Acuna Jr. has been spectacular at the plate so far this year, and he is head and shoulders above everybody else in this edition of the made-up metric. Otherwise, we see a lot of players who have been tearing the cover off the ball early in the season. Besides Acuna, this would include Mike Trout, Aaron Judge, Justin Turner, Jose Ramirez etc.

Where it gets interesting is that we have two guys with sub-70 wRC+ on the list: Tommy Pham and Myles Straw. Let’s take a look at each one in turn.

I’m not going to lie, I had to look up who Myles Straw was. He is making an appearance courtesy of a 15.3% chase rate (MLB average in 2021 is 28.4%) and a 13.4% whiff rate (MLB average 24.4%). Unfortunately, his 86 mph exit velocity with a 6.1-degree launch angle led to a .325 xSLG, in line with his brief major league career. Unless Straw can start making harder contact, his elite contact ability and plate discipline alone won’t keep him in the big leagues.

Tommy Pham, on the other hand, is about to go on a hot streak here soon. The underlying metrics are solid across the board, he’s just been unlucky so far. The average exit velocity is at 91.6 mph, and Pham’s wOBA sits at .256 as opposed to .380 xwOBA. Similarly, Pham is slugging .203, compared to a .483 xSLG, a 280(!!) point difference.
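The gaps quoted above are just expected-minus-actual differences expressed in “points” (thousandths). A minimal sketch, with `points_gap` being a helper name of my own:

```python
def points_gap(expected, actual):
    """Expected-minus-actual difference in points (thousandths)."""
    return round((expected - actual) * 1000)


# Pham's numbers as quoted above.
pham_slg_gap = points_gap(expected=0.483, actual=0.203)
pham_woba_gap = points_gap(expected=0.380, actual=0.256)
print(pham_slg_gap, pham_woba_gap)
```

A large positive gap means the quality of contact suggests better results than the player has actually gotten so far.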

Let’s change the weights on the normalized values a bit. Maybe we’d like to emphasize not chasing, staying within the strike zone. We’ll bump up the weight on the normalized chase rate to 50%, and decrease the weights on exit velocity and whiff rates both to 25%.

No-chase

MVP = (1/4) * normalized_exit_velocity + (1/2) * (1 – normalized_chase_rate) + (1/4) * (1-normalized_whiff_rate)

We see a lot of the same names from the table above. Jed Lowrie has been another recipient of tough luck. His average exit velocity is up over 90mph for the first time in his career, yet his slugging percentage sits at .406 compared to .546 xSLG.

Max Muncy is one of two players to make an appearance in this table after not being in the top 10 in the base case. His average exit velocity of 86.8 mph is at its lowest level since 2016, and his 49.2% ground ball rate is at its highest level since 2016 as well. Yet even he might “heat up” in the near future, as his .365 SLG vs .460 xSLG would suggest. Muncy’s chase rate of 11.9% is the lowest in the majors, and is a major contributor to his 24.1% walk rate and a .422 on base percentage. No need to press, Max.

Let’s do one more iteration, this time emphasizing hard contact; the weights in this iteration will be 50% on the normalized exit velocity, and 25% on the whiff rates and chase rates respectively.

Hit-it-hard-somewhere

MVP = (1/2) * normalized_exit_velocity + (1/4) * (1 – normalized_chase_rate) + (1/4) * (1-normalized_whiff_rate)

That would be 3-for-3 for Ronald Acuna Jr. as the top dog. Besides him, some of baseball’s most powerful sluggers make an appearance on the list. I’d like to touch on two of them, Pete Alonso and Giancarlo Stanton.

Pete Alonso’s average exit velocity jumped to 97.3 mph in 2021 from about 91 mph in 2019 and 2020. He is doing more damage in the zone – his whiff and chase rates are in line with his career averages, but his zone swing rate jumped about 10 percentage points, from the mid-to-high 60s in 2019/2020 to 75.8% so far in 2021. Alonso swings more in the zone than before, and makes more contact in the zone as well; his zone contact rate is at a career high 85%. That’s a recipe for success and Statcast seems to agree; Alonso is sporting a healthy .500 SLG, yet his xSLG is at .597.

Giancarlo Stanton is hitting the ball as hard as anyone, but there might be trouble lurking on the horizon. Stanton currently has the lowest zone contact percentage of his career, along with a 30%+ chase rate for the first time since 2016, and highest whiff percentage since 2015. Moreover, he has been destroying four-seam fastballs this year, having an xSLG of .773 against the pitch. Looking at breaking balls and off-speed stuff, his xSLG against sliders is .438, against curveballs it is .515, and against changeups .224. So far, Stanton has been seeing about 60% fastballs. It will be interesting to see if he starts seeing more breaking balls going forward and ends up closer to a 50-50 split for fastballs vs non-fastballs.

Besides learning that Ronald Acuna Jr. is an alien, looking at the three underlying “MVP” metrics combined seems like a good starting point to dig deeper into a player’s offensive profile. Adjusting the weights on the individual parameters emphasizes different skillsets of the batter, and allows us to identify candidates for regression or improvement in some of the surface stats.

## Jannik Sinner & Lorenzo Musetti: Two Paths, One Destination

As of April 26th 2021, there are two teenagers ranked inside the ATP Top 100: Jannik Sinner and Lorenzo Musetti. They both represent Italy, both are right handed, and both are fantastic movers on the court. Yet there are notable differences between the two as well. Sinner has a relatively flat two handed backhand, while Musetti has an outstanding, spinny one hander. Sinner is calm, almost stoic on the court; Musetti is not afraid to pump his fist and show his emotions. Sinner has already established himself as one of the elite players in our sport – as of this writing he is ranked #18 in the world – while Musetti is just now becoming known to the casual tennis fan.

One other notable difference is the path they took, in terms of the mix of tournaments played, before reaching the Top 100. Just to clarify my terminology for the rest of the article: an “Age X” season refers to the year in which a player turned X years old. For example, Sinner was born in 2001, and so his “Age 17” season would be 2018. Musetti was born in 2002, and his “Age 17” season would therefore be the year 2019.
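In code, the “Age X” convention is nothing more than adding the age to the birth year. A tiny sketch, with `age_season` being my own hypothetical helper:

```python
def age_season(birth_year, age):
    """Calendar year of a player's "Age X" season, as defined above."""
    return birth_year + age


print(age_season(2001, 17))  # Sinner's Age 17 season
print(age_season(2002, 17))  # Musetti's Age 17 season
```

The convention ignores the month of birth on purpose, so two players born in the same year always share the same “Age X” seasons.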

Neither Musetti, nor Sinner, played much junior tennis past January 1st of their age 17 season. Lorenzo Musetti played just three more tournaments, one of which was the 2019 Australian Open junior event that he won. He ended up achieving the #1 world junior ranking following his 2019 French Open appearance. Sinner only competed in one junior event in his age 17 season: the Trofeo Bonfiglio Grade A in Milan, which is traditionally the second strongest junior clay court event, behind only the French Open.

While both players turned their attention to the men’s game before their 17th birthdays, the professional tournaments they had access to – dictated to a large extent by their junior ranking – were different.

Let’s take a quick detour through the structure of men’s professional tennis. There are – roughly speaking – three levels of tournaments. Tournaments on the lowest rung of the ladder are called “Futures;” these offer the lowest prize money, and award the fewest ranking points for winning matches. Tournaments on the second rung are called “Challengers;” these are organized by the ATP, and while they offer higher prize money and more ranking points for winning matches, players need a higher ranking – again, generally speaking – to be entered into a Challenger draw than into a Futures event. Finally, the highest rung of the ladder consists of the major ATP events you see on TV.

Furthermore, there are, for the purposes of this post, three main ways to enter a tournament. Let’s use a Challenger with 32 players competing in the main draw as an example. Out of the 32 players, 24 will be accepted directly into the main draw based on their ranking, 4 will receive a wild card, and 4 will advance from the qualifying draw. The wild cards and qualifying draws are designed to give players who wouldn’t have been accepted into the main draw based on their ATP ranking alone access to higher level tournaments.

A major advantage of a high junior ranking is the attention from sponsors, national tennis governing bodies, and agencies. Once an agency signs a player, or a national tennis federation becomes invested in his/her success, one way in which they can help that player along is securing wild cards into professional events.

This is the breakdown of the first 10 professional tournaments Musetti and Sinner played, and how they entered them.

Before we go any further, I don’t want to claim that one way is “easier” than the other. In the end, the athlete has to perform at a certain level to beat Challenger-level players and make it to the Top 100. I merely want to illustrate how two young stars got to where they are.

As the great philosopher Drake once said, Sinner “started at the bottom, now he’s here.” All 10 of his first professional tournaments were Futures; furthermore, he started in the qualifying draw in 9 of those. Qualifying draws of Futures are where dreams of professional tennis go to die; there is no prize money, and no ATP points are being awarded for winning matches. The goal for any young aspiring professional tennis player is to play themselves out of that level as quickly as possible.

Musetti, on the other hand, played in only one Futures qualifying event in his career. More importantly, 4 of his first 10 tournaments were Challengers, and he entered those by being awarded a wild card into the main draw. Two things are important to note: first, you start earning ATP points right away in the main draws of Futures and Challengers, hopefully reducing your reliance on wild cards in the future. And second, by being exposed to the level of play and environment of Challengers right away, the player in question hopefully feels like “I belong here” and “I can compete with these guys.” Musetti earned those wild cards by being an elite junior prospect and capitalized on his opportunities.

Despite the different tournament mixes in their first 10 professional events, it took both Sinner and Musetti just over 50 tournaments to achieve the coveted double digit ATP ranking. Below are the number of tournaments they competed at – broken down by the rungs of the ladder – before cracking the Top 100.

If I had to pick a path, I would pick Musetti’s, simply based on the fact that he spent very little time playing Futures tournaments. In terms of facilities, practice courts, official hotels etc., the Challengers are way closer to the ATP tournament standards than the Futures are. At the same time, I have a ton of respect for Jannik Sinner’s journey. It is not easy to play qualifying events of Futures as a 16/17 year old. The conditions can be rough, and a player has to win multiple matches in the qualifying draw before even having a chance to compete for ATP points; one needs a certain level of physical and mental maturity to handle the minor leagues of tennis. The fact that Sinner was able to do that as a teenager is impressive.

In the end, I bet that Musetti will soon join Sinner in the Top 20, and we’ll get to enjoy watching them compete at the highest level of our sport for more than a decade. Slightly different routes, same destination.

## Monte Carlo Groundstrokes Spin Rates and Velocities

In tennis, the speed of the serve has been measured for decades. It usually flashes right after the serve either directly on the scoreboard, or on a dedicated display somewhere along the wall of the court. However, with the advent of Hawk-Eye and a more widespread use of ball-tracking technologies, we’ve been able to collect much more in-depth data on strokes other than just the serve. I was therefore happy to see that one of the pieces of data made available to the public during the recently concluded Rolex Monte-Carlo Masters was the average spin rates and velocities of the groundstrokes hit during some of the matches.

Before we get to the data itself, a short disclaimer. First, the data was not available for every match played in the main draw; I’m assuming this is because some of the outside courts are not equipped with the necessary ball-tracking technology. Second, the average spin rates don’t differentiate between strokes hit with topspin and with backspin. This is not much of an issue on the forehand side, as the vast majority of forehands on the men’s tour are hit with topspin. The use of backspin is much more prevalent on the backhand side, but for every match there was just one raw spin number for all the backhands hit by a particular player. With that being said, here are some of the interesting trends from the 33 matches that had the data available (all data courtesy of ATP Tour).

Forehands are hit with more spin and velocity across the board

Below are the comprehensive spin rate and velocity statistics from all 33 matches:

There were 33 matches in the dataset, with 66 total “observations” (two per match: one for each player). In all 66 cases, the forehand spin rate was greater than the backhand spin rate. Furthermore, in 65 cases, the average forehand velocity was greater than the average backhand velocity; the lone exception being Hubert Hurkacz in his Round of 32 match, where his average forehand velocity was 111.5 km/h, while his average backhand velocity was 111.8 km/h.

Let’s combine this with the typical placement of the groundstrokes. Below are the placement breakdowns for Stefanos Tsitsipas and Andrey Rublev from their finals match; they are fairly representative of the overall trend. Forehands are the first gallery, backhands second.

On the forehand side, both Rublev and Tsitsipas preferred to aim their forehand into their opponent’s backhand, but the split is much closer to 50-50 than it is for the backhands. Forehands are hit faster, with more spin, and placed more unpredictably than backhands. This reinforces the traditional view of the forehand as the “sword,” and the backhand as the “shield;” the men’s groundstroke game is really a battle for forehands. At the same time, I think a backhand down the line is a tremendous weapon in today’s men’s game to counter this strategy, but that is for another blog post.

Correlations between spin rates and velocities

What I wanted to look at next is the relationship between the spin rates on strokes and the average velocities. Do groundstrokes hit at faster velocities spin at higher rpms? Are flat strokes faster than “spinnier” strokes? Here is the table of the correlation coefficients:

Starting from the top, the relationship between backhand spin and backhand velocity seems to be completely random; knowing a spin rate or a velocity tells you nothing about the other variable, at least for the matches in the current data set – I wonder if this might be due to the fact that we’re lumping topspin and backspin backhands together. There is a stronger positive relationship between the forehand spin and forehand velocity, even though it is still only a low to moderate relationship. The main takeaway from the first two lines in the table above is that fast groundstrokes can come in different shapes; the corollary being that spinny groundstrokes come at different velocities. One example to illustrate this takeaway on the forehand side:

The forehands of Tsitsipas and Davidovich Fokina in their round of 32 matches came in at about the same velocities, but the forehand of Tsitsipas was spinning about 500 rpm faster than Davidovich Fokina’s. Tsitsipas’ forehand will feel “heavier” to the opponent; we’ll return to this point towards the end of the article.

The second big takeaway from the correlation table is the much stronger positive correlation coefficients between the forehand and backhand spin rates, and the forehand and backhand velocities. In simple terms, when a player hits the ball hard from one side, he tends to hit it hard from the other side as well. Similarly, a player with a spinny forehand will most likely have a relatively spinny backhand as well.
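For anyone who wants to reproduce this kind of table, here is a minimal sketch of a correlation coefficient between two columns. I am assuming the standard Pearson coefficient here, and the velocity lists are hypothetical per-match averages that I made up for illustration; they are not the Monte Carlo data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical average velocities (km/h), one entry per player-match.
forehand_kmh = [118.0, 121.5, 125.0, 119.0, 128.5]
backhand_kmh = [110.0, 112.0, 117.5, 111.0, 116.0]

r = pearson_r(forehand_kmh, backhand_kmh)
print(f"forehand vs backhand velocity: r = {r:.2f}")
```

A value near 1 means the two columns rise and fall together, a value near 0 means knowing one tells you little about the other, and a negative value means they move in opposite directions.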

This, to me, is especially interesting on the spin side. I would think that correlation would be weaker there; i.e. that players with spinny forehands might still have flat backhands. Forehands and backhands are hit with different grips; a grip on the racket plays a big part in the approach angle of the face of the racket as it makes contact with the ball; and finally the approach angle plays a large part in the spin imparted on the ball (the spin and flight of the ball are physics, rocket science really; a rocket scientist I am not, so this is overly simplified). What the data from Monte Carlo would seem to suggest is that players with more extreme forehand grips, for example, are more likely to have extreme backhand grips. Similarly, if a player has a grip closer to continental on the forehand (flatter), he’ll most likely have a flatter grip on the backhand as well. There’s a level of consistency in how he hits the ball from both sides.

Let’s finish up with the fun stuff: leaderboards! Who hit the fastest and spinniest forehands in Monte Carlo? Velocity comes first: below are the players with the forehand velocities in the 90th percentile and better.

And here are the players with forehand spin rates in the 90th percentile and better.

What makes Nadal’s forehand untouchable is its combination of speed, spin, and the fact that it comes from the left side. The only other player appearing in both of the above tables is the eventual Monte Carlo champion Stefanos Tsitsipas. Forehands coming in at high speeds, and high spin rates, tend to bounce way up high, and opponents are often forced to make contact either back behind the baseline, or in uncomfortable positions around shoulder height. It is extremely challenging to return those forehands back with interest. Also, Casper Ruud is already ranked #24 ATP as of this writing; his way into the Top 20 and higher will be paved by his forehand. If he could add a few km/h to the stroke, he would be in the conversation for the heaviest forehand in the game after Nadal hangs up his rackets.
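A leaderboard like the ones above boils down to finding the 90th percentile cutoff and keeping everyone at or above it. Below is a minimal sketch under two assumptions of mine: the cutoff is computed with the inclusive method of Python's `statistics.quantiles`, and the players and velocities are placeholders rather than the actual Monte Carlo numbers.

```python
from statistics import quantiles


def top_decile(values_by_player):
    """Players at or above the 90th percentile cutoff, highest value first."""
    # With n=10, the last cut point is the 90th percentile.
    cutoff = quantiles(values_by_player.values(), n=10, method="inclusive")[-1]
    leaders = [(player, value) for player, value in values_by_player.items()
               if value >= cutoff]
    return sorted(leaders, key=lambda item: item[1], reverse=True)


# Placeholder data: eleven hypothetical players, 100.0 to 110.0 km/h.
forehand_kmh = {f"Player {chr(ord('A') + i)}": 100.0 + i for i in range(11)}

for player, velocity in top_decile(forehand_kmh):
    print(f"{player}: {velocity:.1f} km/h")
```

Different percentile conventions can move the cutoff slightly on a sample this small, so a borderline player might appear or disappear depending on the method used.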

On to the backhands, velocity first. These are the players with backhand velocities in the 90th percentile and higher:

Rafael Nadal in his round of 16 match was absolutely bludgeoning the ball. Also, we can see a lot of the same names from the forehand velocity leaderboard in the table above: Nadal, Ramos-Vinolas, Davidovich Fokina, and Fabio Fognini all make an appearance in both velocity leaderboards. This is a good illustration of the 0.6 correlation coefficient between the forehand and backhand velocities. If you like fast groundstrokes, these are your guys.

Neither Tsitsipas nor Ruud is in the backhand velocity leaderboard, but both are featured in the forehand spin leaderboard, further illustrating the relationship between the spin of the groundstrokes, and a much weaker relationship between the spin and velocity of the individual strokes. Casper Ruud hits a two handed backhand, while Tsitsipas has a one hander.

Rafael Nadal was, unsurprisingly, the king of groundstroke velocity in Monte Carlo, ranking first in both the forehand and backhand velocity leaderboards. Casper Ruud was the unofficial king of topspin, placing first in both spin leaderboards. If the conditions in Paris are fast during the French Open – hot days, no rain, firm clay – watch out for Ruud making an appearance in the second week of the tournament.

## Marcus Semien v. 2019 & 2021?

As we’re nearing the end of spring training and getting ready for the start of the regular season, one question on the minds of Blue Jays fans is: which version of Marcus Semien are we going to get? Will it be the Semien of 2019, who finished 3rd in the AL MVP race, and had a wRC+ of 138? Or will Semien’s 2021 season be more in line with his pre-2019 production, i.e. wRC+ in the 90-100 range?

To answer that question, I wanted to look at some of Semien’s underlying statistics from that 2019 season, and compare them, mostly, to his 2018 statistics. Semien had 700+ plate appearances in both of those years, which gives us comparable sample sizes. I will use the 2020 season statistics a little bit to illustrate a few tendencies, but given the overall limitations and unique challenges of 2020, I won’t rely on that unfortunate year too much.

Semien went from slashing .255/.318/.388 in 2018 up to .285/.369/.522 in 2019. Starting with the OBP, the increase – besides the 30 point jump in batting average – was also driven by Semien’s career-high walk rate of 11.6%, and career-low strikeout rate of 13.7%.

Semien’s increased walk rate wasn’t a result of just taking more pitches – in 2018, he saw 4.098 pitches per plate appearance, while in 2019 it was 3.969/PA. Semien simply chased less out of the zone, and made more contact in the zone in 2019.

As a matter of fact, in 2019, Semien set a career high for zone contact rate, and career lows (up until 2019) for chase and whiff percentages. He was also one of only five qualified batters who had sub-20% chase and whiff rates in 2019 (the others being Alex Bregman, Mookie Betts, Mike Trout, and Joey Votto).

The reason why I included 2020 in the table above is because it highlights two trends that will play a large role in determining what kind of a year Semien has at the plate. First: will he keep chasing less than 20% of the pitches out of the zone? Since 2016, his chase rate has been steadily declining; he didn’t expand the strike zone even after that horrendous start to the 2020 campaign. I would not be surprised to see that particular trend continue.

The second part of the equation, however, will be seeing whether the 2020 decline in zone contact is real or not. The MLB average zone contact percentage is 82.2%, and Semien dipped below that level only in 2016 and 2020. For what it’s worth, Semien’s chase contact percentage decreased only slightly from 61.2% in 2019 to 60.2%* in 2020. Nevertheless, it will be interesting to see if Semien’s zone contact rate can get back up to the mid-80s.

Turning our attention to the damage Semien does with pitches he makes contact with, his slugging percentage jumped from .388 in 2018 to .522 in 2019. The biggest contributor to that .522 slug were batted balls in the lower two-thirds of the zone.

That fact in and of itself was not new in 2019. In both 2018 and 2019, Semien did more damage when he connected with pitches in the lower part of the zone. The 2018 zone breakdown is on the left, 2019 on the right.

What did change from 2018 to 2019 is that Semien seemed to be looking more actively to swing at pitches down in the zone. We can use his swing decisions when ahead in the count as a proxy. Below are his 2019 swing rates at 1-0 and 2-0, with a minimum of five swings per square:

And these are his swing decisions in one-strike counts, when ahead, in 2019:

Where I think the preference for low strikes is illustrated best is in the 2-0 count box, where the hitter is really in the driver’s seat and can zero in on a particular location. In that count, in 2019, Semien was thinking “middle of the plate, down in the zone.”

Let’s contrast these with the same figures for the year 2018. First, the swing decisions with 0 strikes, minimum five swings per square:

And these are the swing decisions when ahead in the count at 2-1 and 3-1 for 2018:

Looking at the 1-0 and 2-1 counts in 2018, it seems that Semien was thinking “middle of the plate,” but was willing to swing at pitches higher in the zone as well. What I also find interesting is the different 2-0 swing profiles; in 2018, Semien was looking “middle-middle,” whereas in 2019 that became “middle-down.”
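For readers who want to build similar grids themselves, here is a minimal sketch of how per-square swing rates for a given count could be tallied, including the minimum-swings filter mentioned above. It is not the code behind the figures in this post: the function name `swing_rates_by_zone`, the dictionary keys (`balls`, `strikes`, `zone`, `swing`), and the sample pitches are all made up for illustration.

```python
from collections import defaultdict


def swing_rates_by_zone(pitches, count, min_swings=5):
    """Return {zone: swing_rate} for a single ball-strike count.

    `pitches` is an iterable of dicts with hypothetical keys:
    'balls', 'strikes', 'zone' (any hashable square label), 'swing' (bool).
    Squares with fewer than `min_swings` swings are left out.
    """
    seen = defaultdict(int)   # pitches seen per square in this count
    swung = defaultdict(int)  # swings per square in this count
    for p in pitches:
        if (p["balls"], p["strikes"]) != tuple(count):
            continue
        seen[p["zone"]] += 1
        if p["swing"]:
            swung[p["zone"]] += 1
    return {z: swung[z] / seen[z] for z in seen if swung[z] >= min_swings}


# Made-up pitches, for illustration only (not Semien's actual data).
sample = (
    [{"balls": 2, "strikes": 0, "zone": 5, "swing": True}] * 6
    + [{"balls": 2, "strikes": 0, "zone": 5, "swing": False}] * 2
    + [{"balls": 2, "strikes": 0, "zone": 2, "swing": True}] * 3
    + [{"balls": 2, "strikes": 0, "zone": 2, "swing": False}] * 1
    + [{"balls": 1, "strikes": 0, "zone": 5, "swing": True}] * 7
)

print(swing_rates_by_zone(sample, count=(2, 0)))  # {5: 0.75}
```

In this toy example, square 2 is dropped at 2-0 because it has only three swings, which is the same kind of filtering applied to the grids in this post.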

Whether or not Semien can keep hammering pitches low in the zone will be the second thing – the first being his chase and zone contact rates – worth monitoring in the 2021 season. He has traditionally had more success with pitches low, and in 2019, his swing decisions, at least when ahead in the count, played into his strength more than in 2018.

As of the evening of March 24th, Semien is slashing .256/.356/.538 in 39 spring training at-bats. An encouraging sign of his continued low-strike hitting prowess came a few days ago, in his first at-bat in a game against the Tigers.

Semien took pitch #1, swung and missed at #2, and took #3. With the count 1-2, he got a sinker in the lower part of the zone, middle-away – right in his wheelhouse – and didn’t miss it. 102.8 mph exit velocity + 24 degree launch angle = 413-foot home run to center field. Shades of 2019, and a good sign for Blue Jays fans.

## College Tennis Alumni at the 2021 Australian Open – Part 4: Women’s Doubles

In this final installment of my mini-series, I will look at college tennis alumni who competed in the women’s doubles draw at the 2021 Australian Open. As was the case with the women’s singles overview, I am a bit out of my depth here: all of my college experience has been on the men’s side. I apologize in advance for any inaccuracies and omissions.

There were thirteen players in the women’s doubles draw with college experience, compared to five in the singles main draw. Three players competed in the main draw of both singles and doubles: Aliona Bolsova, Jennifer Brady, and Astra Sharma. For every player, I will point out one or two college achievements, followed by a couple of professional highlights. The list is once again ordered alphabetically by last name.

Aliona Bolsova, Oklahoma State University/Florida Atlantic University

Hayley Carter, University of North Carolina

• College: 7-time All-American, ACC all-time leader in women’s tennis singles victories (168), 2014 NCAA Team Finalists
• Professional: Career high #31 WTA doubles, 2020 US Open Doubles Quarterfinalist

Carter had an impressive career as a Tar Heel, winning 294(!) matches in her four years in Chapel Hill (singles and doubles combined). What is equally impressive to me are her academic achievements: she was the ACC’s Scholar Athlete of the Year in back-to-back years, as well as a recipient of the Patterson Medal, which is the most prestigious athletic honor awarded at the University of North Carolina.

Kaitlyn Christian, University of Southern California

• College: 2012/2013 doubles “triple crown” winner: ITA All-American Doubles Champion, ITA National Indoor Doubles Champion, NCAA Doubles Champion
• Professional: Career high #38 WTA doubles

Christian’s junior season at USC in 2012/13 was remarkable. She and her teammate Sabrina Santamaria not only won the 2013 NCAA doubles championship, becoming the first pair in USC women’s tennis history to do so; they also won the two biggest individual events contested during the fall – the ITA All-American championships and the ITA National Indoor championships. Not surprisingly, the pair finished the year ranked #1 in the ITA doubles rankings.

Alexa Guarachi, University of Alabama

• College: 2013 NCAA Singles Semifinalist, 2013 NCAA Doubles Semifinalist, All-time leader in career singles wins at University of Alabama (109)
• Professional: Career high #24 WTA doubles, 2020 French Open Doubles Finalist

Desirae Krawczyk, Arizona State University

• College: 2016 Singles All-American
• Professional: Career high #22 WTA doubles, 2020 French Open Doubles Finalist

Krawczyk was Guarachi’s partner in their run to the 2020 French Open doubles final, where they lost to the #2 overall seeds Timea Babos and Kristina Mladenovic. Krawczyk and Guarachi were the #9 seeds at the 2021 Australian Open, and lost in the third round to the pair of Coco Gauff and Caty McNally.

Giuliana Olmos, University of Southern California

• College: 2016 Singles All-American, Career high #11 ITA Singles, Career high #4 ITA Doubles
• Professional: Career high #53 WTA doubles, 2021 Australian Open Doubles Quarterfinalist

Olmos is the second USC Trojan on this list. During her sophomore season at USC, Olmos actually paired with Kaitlyn Christian to claim the Pac-12 Doubles Championship crown. They didn’t compete together at the 2021 Australian Open though; Olmos’ partner was the Canadian Sharon Fichman, and the two lost in the quarterfinals to the eventual finalists Barbora Krejcikova and Katerina Siniakova of the Czech Republic.

Ellen Perez, University of Georgia

• College: 2-time Singles All-American, 3-time Doubles All-American
• Professional: Career high #40 WTA doubles, 2019 US Open Doubles 3rd round

Perez played three years at the University of Georgia, earning the doubles All-American distinction all three years. She was also a singles All-American following her sophomore and junior campaigns. Following her sophomore year, Perez earned a wild card into the 2016 US Open by defeating Ashleigh Barty in the final of a wild card tournament organized by Tennis Australia. Barty was ranked around #300 WTA at the time, but is the #1 ranked singles player in the world as of this writing.

Sabrina Santamaria, University of Southern California

• College: 5-time All-American, 2013 NCAA Doubles Champion, 2013 Pac-12 Player of the Year
• Professional: Career high #53 WTA doubles

Santamaria is the third and final USC Trojan on the list. During her career in Los Angeles, Santamaria won the 2013 NCAA doubles championship with her teammate Kaitlyn Christian. In the spring of her junior season, Santamaria suffered a serious ACL injury; yet despite that setback, she still won 96 singles matches for the Trojans in less than four years of competition.

Astra Sharma, Vanderbilt

Ena Shibahara, UCLA

• College: 2-time Singles All-American, Career high #1 ITA Singles
• Professional: Career high #15 WTA doubles, 2020 French Open & 2021 Australian Open Doubles Quarterfinalist

Shibahara spent two years at UCLA before turning pro, winning 67 singles matches for the Bruins in the process. In both her seasons at UCLA, she was named the Pac-12 Singles Player of the Year – only the second women’s tennis player in UCLA’s history to be a repeat winner of the award.

Luisa Stefani, Pepperdine

• College: 3-time Singles All-American, 2016 NCAA Singles Semifinalist
• Professional: Career high #30 WTA doubles, 2020 US Open Doubles Quarterfinalist

Stefani attended Pepperdine for three years and ranks #1 on the Waves’ all-time career winning percentage list at .847. During her amateur career, Stefani was ranked as high as #2 in the ITA singles rankings, and #8 in the ITA doubles rankings. At the 2020 US Open, Stefani became the first Brazilian player to reach the quarterfinals in a women’s doubles Grand Slam event; her partner in New York was ex-UNC standout Hayley Carter, who is also on this list.

Belinda Woolcock, University of Florida

• College: 2-time Singles All-American, 2017 NCAA Singles Finalist, 2017 NCAA Team National Champions
• Professional: Career high #207 WTA doubles

Woolcock’s collegiate career had just about a storybook ending. During her senior year, she played line #1 singles on a team that won the NCAA team national championship. Woolcock was also named the Most Outstanding Player of the tournament, and a few days later played for the NCAA singles title. Unfortunately, in her last match as a Gator, she lost to Michigan’s Brienne Minor. Woolcock and her partner Olivia Gadecki were granted a wild card into the 2021 Australian Open, and they won a round before losing their second match to the team of Leylah Fernandez and Heather Watson.

Is college tennis a pathway to the pro game? No doubt about it. We’ve seen college tennis alumni in both the men’s and women’s main draws at the Australian Open, and an even stronger representation in the doubles draws. At the same time, the players who have made a successful transition to the highest level of the pro game have been some of the best players in college tennis during their time on campus. With the exception of maybe one or two players, all have been All-Americans, and most of them played on teams with legitimate national championship aspirations. That would be my takeaway from this exercise: if you are an aspiring junior player and choose to go to college before turning pro, go to a nationally competitive program, and try to distinguish yourself by setting a goal of becoming an All-American. If you can achieve that, then not only will you leave a lasting legacy at your alma mater; you will also be well prepared to face the best players in our sport on the professional tour.