Using Data Analytics to “Moneyball” Overwatch Roster Building

Esports player scouting driven by data analytics

Ryan Friedman
10 min read · Mar 19, 2021

In 2016, I got on a plane to Los Angeles, ready for my first job in esports. A few days prior, I had pitched the then-CEO of Immortals, Noah Whinston, on a statistical model I had been building to analyze Overwatch player performance. After hundreds of hours poured into model building, I was a nervous wreck for our call. I worried that Noah wouldn't be interested in data analytics, but one sweaty phone call was all it took to assuage my fears. Noah loved the idea, and a few days later I was in Los Angeles working side-by-side with him to build Immortals' first Overwatch roster.

Noah, I think, was inspired by genius roster-builders like Billy Beane, who was depicted in the book-turned-film Moneyball. Moneyball is a biographical baseball story focusing on Beane, then the general manager of the Oakland A's. In it, Beane circumvents the A's small budget to find competitive success through sophisticated, analytics-focused player scouting. My sense from working alongside Noah was that he believed data and analytics could drive better roster-building decisions in the long run. I too believe in the ability of sophisticated data analytics to drive better roster decisions in esports.

Background of the model

I began building my player analysis model while I was working in investment management. Although I found investment management boring, it sharpened my Microsoft Excel skills. Working in esports was my dream, so I began thinking through ways I could use those Excel modeling abilities to break into the industry.

Overwatch had recently fully launched, and I had been hooked on the game since the public beta in early 2016. I saw an opportunity to be one of the first people creating data-focused Overwatch analysis. The game was brand new, so established statistics products didn’t exist. Furthermore, I was confident that Activision Blizzard would foster an esports ecosystem around Overwatch.

Before jumping into the Overwatch model, I first dug around traditional sports analytics frameworks. Growing up in a baseball-crazed household, I was familiar with baseball analytics, also known as sabermetrics. I dove deep into sabermetrics to find the most useful and adaptable metric I could.

Pitching VORP Formula

I landed on Value Over Replacement Player (VORP). VORP tries to capture how much a baseball player contributes to their team over a season, measured in runs, compared to a player who could be quickly and easily slotted into that position. For example, a player with a VORP of +3 should generate 3 more runs for their team over the season than an easily replaced player would.

The formulas for VORP are baseball-focused but can be adapted to Overwatch. Consider the different statistics that pitching VORP accounts for: runs allowed and innings pitched, standardized to a game (9 innings), and the expectation for a replacement-level player. That's it.
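In its simplest form (a paraphrase rather than the exact formula pictured above), pitching VORP boils down to innings pitched times the gap in runs allowed per nine innings between a replacement-level pitcher and the pitcher in question:

$$\text{VORP} \approx \frac{IP}{9} \times \left( RA9_{\text{replacement}} - RA9_{\text{pitcher}} \right)$$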

In other words, if I could determine what a "replacement level" Overwatch player looked like, statistically, then I could determine how much a given player contributes to winning the game.

The hard part was determining what actually drives a win. In baseball, if a pitcher gives up more runs, that objectively pushes the team toward a loss. In Overwatch, KDA doesn't translate directly to victory. It's one aspect of the game, but no single stat strictly determines winning or losing. Even a statistic like "payload distance pushed", a direct measure of progress toward winning a payload map, says more about the team than about the individual player.

Math behind the model

Rather than guess at which stats determined win rates, I decided to use data. I asked a friend (thanks Zack) to run a scraper to gather all available data on the top Overwatch players. I only included solo-queue players at the equivalent of Master rank and above, so the model would account only for professional and closely-skilled players. I also required that they had at least 5 games on a hero, so one-off outliers wouldn't skew the data. For each of the heroes at the time, we ended up with around 200 players' worth of data.
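As a rough illustration, the filtering step might look something like this in pandas; the file and column names here are hypothetical, not the actual scraper output:

```python
import pandas as pd

# Hypothetical scraper output: one row per (player, hero) with rank and per-hero stats.
df = pd.read_csv("scraped_solo_queue_stats.csv")

# Keep only players at the equivalent of Master rank and above...
high_skill = df[df["rank"].isin(["Master", "Grandmaster", "Top 500"])]

# ...with at least 5 games on the hero, so one-off outliers don't skew the data.
filtered = high_skill[high_skill["games_played"] >= 5]
```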

I used solo-queue data because of a lack of professional data. Any pro will tell you that the solo-queue experience is not representative of professional gameplay, but I did not have access at the time to extensive data from professional matches. Instead, I made do with solo-queue data under the assumption that it would still tell a story about a player's mechanics and talent.

The data gathered included win rate on the hero, kills, deaths, assists, objective kills, damage blocked, and healing done (if you were to do this today, you could probably get MUCH better stats). I then created additional stats, such as non-objective kills (kills − objective kills = non-objective kills). Since I was intentionally not guessing which stats determined winning, I tried to create as many additional stats as I could, regardless of how mundane or irrelevant they seemed.
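Continuing the sketch above, deriving the extra stats is just arithmetic on the raw columns (again, the column names are hypothetical):

```python
import pandas as pd

def add_derived_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Add derived columns such as non-objective kills to the raw per-hero stats."""
    return df.assign(
        non_objective_kills=df["kills"] - df["objective_kills"],
        kda=(df["kills"] + df["assists"]) / df["deaths"].clip(lower=1),
        healing_per_death=df["healing_done"] / df["deaths"].clip(lower=1),
    )
```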

There were not many raw stats I had access to, but after creating the derived ones there were 200+ columns of stats per hero.

I then needed to calculate two things: the standard deviation of each stat across all players, and how each stat co-varied with players' win rates. Neither is particularly helpful on its own, but together they allow you to calculate a correlation coefficient. The correlation coefficient, R, tells you a specific stat's impact on win rate.

Correlation coefficient, “R”, formula
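For reference, this is presumably the standard Pearson correlation coefficient, which combines exactly those two ingredients, the covariance of a stat with win rate and their standard deviations:

$$R = \frac{\operatorname{cov}(x, w)}{\sigma_x \, \sigma_w}$$

where $x$ is a given stat across all players and $w$ is win rate.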

Stats with larger correlation coefficients (in absolute value) have a bigger impact on whether the team wins or loses when playing that hero. Here are a couple of the correlations we found at the time.

You might notice that deaths are always negatively correlated. Your character dying actively detracts from winning the game. But the degree to which that is true varies. For example, Tracer's death correlation coefficient is -.140, while Zenyatta's is -.199. Dying once as Zenyatta has a larger impact on the team's ability to win the game. This is probably due to a combination of Zenyatta's ability to amplify ally damage while also healing the team: when he dies, the team loses value that extends beyond his own presence in the fight.

These stats are not necessarily useful for specific instances. For example, Zenyatta dying after the entire team has already died won’t impact his ability to heal, and he probably wants to reset with his team anyway. Tracer dying after killing most of the enemy team is fine, too. Instead, these coefficients shed light on averages for these statistics. The more games a player has under their belt, the more accurately these stats should reflect their impact on the game.

In theory, the model should let you take a player's stats and multiply each one by its correlation coefficient. Adding those products together gives you a number that accounts for the player's impact on win rate: the higher the number, the more the player contributes to their team's wins.
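As a sketch, that naive version of the scoring would look something like this (standardizing the stats first is my own assumption, to keep units like kills and healing comparable; column names are hypothetical):

```python
import pandas as pd

def naive_score(players: pd.DataFrame, stat_cols: list[str]) -> pd.Series:
    """Naive score: sum of each (standardized) stat times its correlation with win rate."""
    correlations = players[stat_cols].corrwith(players["win_rate"])
    z = (players[stat_cols] - players[stat_cols].mean()) / players[stat_cols].std()
    return (z * correlations).sum(axis=1)
```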

That naive sum only works in theory, however, because Overwatch's game stats are not independent of one another. For example, we counted both eliminations and accuracy, and the two are certainly correlated: higher accuracy should lead to more eliminations. So, rather than simply summing every statistic multiplied by its correlation coefficient, we had to create a formula for win rate.

Baseball’s pitching VORP was simple because there were few stats it needed to track to judge a player’s impact on scored runs. The model we created for Overwatch was more complex.

We grouped related stats like kills and KDA so as not to double count them. Then, we weighted each group as a function of all of their correlation coefficients. We repeated this process until each hero had all of their individual stats covered.

The "…" means continue with all stats until all are included. I can't be too specific: trade secrets!

Not all heroes had the same groupings. For example, Mercy cares a lot about her healing statistics, while McCree literally doesn’t have one.
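Here is a rough sketch of the grouping idea, not the actual formula (the real groupings were per hero and are not public; the group names and members below are hypothetical):

```python
import pandas as pd

# Hypothetical groupings of related stats, so kills, KDA, etc. aren't double counted.
GROUPS = {
    "fragging": ["kills", "non_objective_kills", "kda"],
    "survival": ["deaths"],
    "support": ["healing_done", "healing_per_death"],
}

def grouped_score(players: pd.DataFrame, groups: dict) -> pd.Series:
    score = pd.Series(0.0, index=players.index)
    for name, cols in groups.items():
        # Some heroes simply lack certain stats (e.g. McCree has no healing done).
        cols = [c for c in cols if c in players.columns]
        if not cols:
            continue
        z = (players[cols] - players[cols].mean()) / players[cols].std()
        corr = players[cols].corrwith(players["win_rate"])
        # Collapse each group to one value per player, weighted by the group's
        # correlations with win rate, so related stats count only once.
        score += z.mean(axis=1) * corr.mean()
    return score
```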

How the model analyzed players

In baseball, a pitcher with a VORP of +5 is assumed to be better than one with a VORP of +1. When I created the model, I wasn't enough of an Overwatch expert to know what made one player better than another. So the model ranks a player's impact on winning or losing games, per hero, and outputs that rank not as a standalone score but in relation to a "replacement player".

The model ranks players numerically, by hero. I curved everyone's scores against one another, using a benchmark of 1000. If a player scored 1300, you would expect them to be 30% better than a benchmark 1000-score player. This didn't tell me how many additional games they'd win, but that didn't matter. What was important was relative skill.
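One plausible way to put scores on that curve, assuming the 1000 benchmark corresponds to the replacement-level player and that raw scores are positive, is a simple ratio:

```python
import pandas as pd

def curve_scores(raw_scores: pd.Series, replacement_score: float) -> pd.Series:
    """Scale raw scores so the replacement-level player sits at 1000.

    A player at 1300 then reads as roughly 30% better than replacement level.
    """
    return 1000.0 * raw_scores / replacement_score
```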

Relative skill was the only relevant benchmark because roster building is a zero-sum game. If I sign a player, it means that another team cannot sign that player. As such, we only needed to identify the best players relative to their peers. By including existing players on top professional teams in our analysis, we could estimate players' skill levels relative to one another. For example, Taimou was widely considered the best Widowmaker in the world at the time. Taking Taimou's Widowmaker score, which was the highest in the model, and comparing it to MendoKusaii's (Mendo's was ~14% lower than Taimou's) gives us a relative measure of skill. Apply this to players that you want to sign, and you can estimate your team's performance.

One expectation I had when building the model, which proved to be true, was that there were (and probably still are) no players who are the best at all of the heroes. There was no DPS player who was statistically the best at every DPS character.

Our goal was to find two different types of players: players who were better than the competition, ideally best in class, at one given hero, and players who were close to the best at many heroes. Players who were close to the best at multiple heroes, such as Taimou, could form strong foundational pieces for the roster. Players who excelled at a given hero, but not necessarily at their role, could be slotted in and out as the meta shifted. In theory, this strategy should produce a top team.

In practice

We proceeded with the roster building process based on the model’s output. Immortals brought together a group of players from across several rosters to participate in a one-off tournament under an anonymous banner to see how they performed. With no practice, they still won the tournament.

The roster of Zombs, Shadowburn, Forsak3n, Two Easy, FCTFCTN, and Rawkus ended up signing with FaZe instead of Immortals, though.

FCTFCTN is a great example of the model’s output. Prior to signing with FaZe, FCTFCTN was on a team called Ohno, which consistently placed 5th/6th in North America. However, FCTFCTN’s Winston data was very strong — he was ~16% better than average and the 3rd best Winston on our list of pros. His Reinhardt was among the best, too.

In other words, the model was saying that FCTFCTN was an outlier, and probably Ohno's best player. Since Winston was meta, given the right teammates he should be able to help a team place at the top.

Of course, several other players were also highlighted in the model. Shadowburn’s Genji, Zombs’ Zarya, and Rawkus’ Lucio were all statistical outliers. So long as some, or most, of these heroes were meta, the team should be able to perform well.

FaZe placed in the top 4 of their first 9 tournaments together, winning 4 of them. They continued to place well through most of 2016. Toward the end of 2016 and into 2017, the meta began to shift, and FaZe went from being a top-3 North American roster to a ~5th place team.

Closing Thoughts

My time putting together Immortals-turned-FaZe’s first Overwatch roster was an incredible foray into the industry that has guided how I think about roster building since. I don’t think my model was perfect, but I think that the driving philosophy behind it — that data can drive better roster decisions — is still correct.

Today, if I were to build a roster, I'd probably first build something like Oracle's Elixir's Early Game Win Rating, then try to break out individual impact on win rate. Now, teams have much greater access to professional match data, so building better models should be easier than ever.

More recently, I had the opportunity to use data with League of Legends player analysis while working with Clutch Gaming. Some notable names that came out of that were Vulcan, Spica, and Neo. But that’s an article for another time.

I think data analytics is still a hugely underexplored facet of roster building and player scouting in esports. This is true across all games, in my opinion. I hope to see teams implement more of it in the future.


Ryan Friedman

Esports, data analytics, and startups. Formerly Immortals, Clutch Gaming, and Dignitas. Now, working on a startup in esports!