Ranking Counter-Strike 2 teams using Bradley-Terry model

Dec 29, 2024 · Gustavo De Mari Pereira · 5 min read

Introduction

When I was younger, I enjoyed playing Counter-Strike (CS) with my friends. I started with version 1.5, then moved on to 1.6, and later played CS:GO. Nowadays, I’m more of a spectator, though I occasionally analyze team performance data for fun.

I’m currently researching Reinforcement Learning (RL) and recently explored Reinforcement Learning from Human Feedback (RLHF). While reading about reward models used in the ‘post-training’ phase of large language models (LLMs), I discovered that one key approach is the Bradley-Terry model. This model is commonly used in sports like basketball, soccer, tennis, and even chess.

To better understand how reward models work, I delved deeper into the Bradley-Terry model and considered applying it to real-world data from e-sports, like CS2.

The Bradley-Terry model works by making pairwise comparisons between items and assigning a score that reflects the preference of one item over another ($i \succ j$). For example, it could represent the preference between Team 1 and Team 2, or, in the case of LLMs, between two generated responses.
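Formally, the model assigns each item $i$ a positive strength score $p_i$ and models the probability that $i$ is preferred over $j$ as:

$Pr(i \succ j) = \frac{p_i}{p_i + p_j}$

The stronger item $i$ is relative to $j$, the closer this probability gets to 1; two items of equal strength give $1/2$.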

For LLMs, the typical approach is to generate two responses based on a prompt and then ask a human to choose their preferred one. This process helps fine-tune the LLMs to produce responses that better align with user expectations, which is valuable since it’s difficult to define a function that evaluates response quality.

In contrast, sports have objective outcomes, like the number of wins and losses, to determine preferences between teams. This fits the task of evaluating how CS teams performed against each other during 2024.

The general steps involved in using the Bradley-Terry model are the following:

1. gathering win/loss data about the teams,
2. fitting the Bradley-Terry model to the data,
3. generating the rankings.

Data

The first step is to gather win/loss data between teams. In the case of CS, I collected the number of wins and losses, per map, for each team in the HLTV top 20 ranking for 2024.

HLTV top 20 teams for 2024

| rank | team_name | country_name | stats | kd_diff | hltv_rating |
|------|-----------|--------------|-------|---------|-------------|
| 1 | Spirit | Russia | 136 | +952 | 1.1 |
| 2 | Vitality | Europe | 132 | +777 | 1.1 |
| 3 | Natus Vincere | Europe | 159 | +614 | 1.06 |
| 4 | MOUZ | Europe | 134 | +326 | 1.05 |
| 5 | G2 | Europe | 159 | +404 | 1.05 |
| 6 | The MongolZ | Mongolia | 103 | +103 | 1.04 |
| 7 | Eternal Fire | Turkey | 122 | +183 | 1.03 |
| 8 | FaZe | Europe | 162 | +125 | 1.03 |
| 9 | MIBR | Brazil | 88 | +61 | 1.02 |
| 10 | Liquid | Other | 104 | +249 | 1.02 |
| 11 | Astralis | Denmark | 106 | +1 | 1.01 |
| 12 | HEROIC | Europe | 137 | -117 | 1.01 |
| 13 | Complexity | United States | 102 | -228 | 1.01 |
| 14 | Virtus.pro | Russia | 132 | -87 | 1 |
| 15 | FURIA | Brazil | 101 | -199 | 0.99 |
| 16 | BIG | Germany | 90 | -299 | 0.99 |
| 17 | paiN | Brazil | 115 | -345 | 0.98 |
| 18 | Imperial | Brazil | 84 | -337 | 0.97 |
| 19 | SAW | Portugal | 62 | -340 | 0.97 |
| 20 | Falcons | Denmark | 105 | -593 | 0.95 |

Subset of Win/Loss matrix for HLTV top 20 teams

|               | Vitality | Spirit | G2 | Natus Vincere | MIBR | Liquid | FURIA | paiN |
|---------------|----------|--------|----|---------------|------|--------|-------|------|
| Vitality      | 0 | 3  | 7  | 3  | 2  | 6 | 5 | 0  |
| Spirit        | 4 | 0  | 4  | 10 | 1  | 3 | 4 | 0  |
| G2            | 7 | 14 | 0  | 7  | 2  | 7 | 1 | 2  |
| Natus Vincere | 1 | 5  | 14 | 0  | 0  | 5 | 2 | 4  |
| MIBR          | 0 | 2  | 0  | 1  | 0  | 1 | 2 | 12 |
| Liquid        | 2 | 2  | 5  | 5  | 2  | 0 | 9 | 2  |
| FURIA         | 0 | 1  | 0  | 2  | 0  | 3 | 0 | 2  |
| paiN          | 0 | 0  | 0  | 0  | 11 | 0 | 1 | 0  |
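To make this concrete, the subset above can be loaded as a matrix. This is a sketch assuming the convention that entry $w_{ij}$ counts the maps row-team $i$ won against column-team $j$, with the counts as I read them from the table:

```python
import numpy as np

# Win counts for the 8-team subset (assumption: W[i, j] = maps team i won
# against team j, values transcribed from the table above).
teams = ["Vitality", "Spirit", "G2", "Natus Vincere", "MIBR", "Liquid", "FURIA", "paiN"]
W = np.array([
    [0,  3,  7,  3,  2, 6, 5,  0],  # Vitality
    [4,  0,  4, 10,  1, 3, 4,  0],  # Spirit
    [7, 14,  0,  7,  2, 7, 1,  2],  # G2
    [1,  5, 14,  0,  0, 5, 2,  4],  # Natus Vincere
    [0,  2,  0,  1,  0, 1, 2, 12],  # MIBR
    [2,  2,  5,  5,  2, 0, 9,  2],  # Liquid
    [0,  1,  0,  2,  0, 3, 0,  2],  # FURIA
    [0,  0,  0,  0, 11, 0, 1,  0],  # paiN
], dtype=float)

# Sanity check: a team plays no maps against itself.
assert (np.diag(W) == 0).all()
```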

Bradley-Terry model

Using the win/loss data, we can fit the parameters of the Bradley-Terry model using maximum likelihood estimation (MLE).

There is an iterative formula to do that:

$p_i \leftarrow \frac{\sum_j w_{ij}}{\sum_j (w_{ij} + w_{ji})/(p_i + p_j)}$

where $w_{ij}$ is the number of wins of team $i$ over team $j$.

We start with an initial guess such as $p_i = 1/N, \forall i \in \{1, 2, \ldots, N\}$ and apply the iterative formula.

At each iteration, we normalize the scores to satisfy $\sum_i p_i = 1$:

$p_i = \frac{p_i}{\sum_i p_i}$

After a number of iterations, the scores converge and we can obtain a ranking.
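The fitting procedure above can be sketched in a few lines of NumPy. The 3-team win matrix below is purely illustrative (made-up counts, not real match data):

```python
import numpy as np

def bradley_terry(W, n_iter=100):
    """Fit Bradley-Terry scores by the iterative MLE update,
    normalizing the scores to sum to 1 at each step."""
    n = W.shape[0]
    p = np.full(n, 1.0 / n)                    # initial guess: p_i = 1/N
    for _ in range(n_iter):
        pair_sums = p[:, None] + p[None, :]    # p_i + p_j for every pair
        denom = (W + W.T) / pair_sums          # (w_ij + w_ji) / (p_i + p_j)
        np.fill_diagonal(denom, 0.0)           # ignore self-comparisons
        p = W.sum(axis=1) / denom.sum(axis=1)  # iterative MLE update
        p /= p.sum()                           # normalize: sum_i p_i = 1
    return p

# Illustrative win matrix: W[i, j] = wins of team i over team j.
teams = ["A", "B", "C"]
W = np.array([
    [0, 7, 5],
    [3, 0, 6],
    [1, 2, 0],
], dtype=float)

p = bradley_terry(W)
ranking = [teams[i] for i in np.argsort(-p)]   # strongest first
print(ranking)  # ['A', 'B', 'C']
```

Note that the update only converges when the comparison graph is connected, i.e. every team can be linked to every other through chains of played matches, which holds for the HLTV top 20.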

Ranking

These are the final scores and the ranking for the HLTV top 20 teams of 2024 using the Bradley-Terry model.

Interestingly, it puts the Major winner Spirit in the 1st position.

| rank | team_name | score |
|------|-----------|-------|
| 1 | Spirit | 0.235284 |
| 2 | Vitality | 0.166654 |
| 3 | Natus Vincere | 0.130508 |
| 4 | G2 | 0.0786501 |
| 5 | MOUZ | 0.0673365 |
| 6 | Liquid | 0.0532813 |
| 7 | FaZe | 0.0430689 |
| 8 | Virtus.pro | 0.0262954 |
| 9 | MIBR | 0.0248231 |
| 10 | paiN | 0.0240706 |
| 11 | The MongolZ | 0.0239945 |
| 12 | Astralis | 0.0228869 |
| 13 | Eternal Fire | 0.0211036 |
| 14 | HEROIC | 0.0178517 |
| 15 | FURIA | 0.0159465 |
| 16 | Complexity | 0.0140515 |
| 17 | SAW | 0.0134657 |
| 18 | Falcons | 0.00873056 |
| 19 | Imperial | 0.00680508 |
| 20 | BIG | 0.00519306 |

To calculate the probability of team $i$ beating team $j$, we can use the following formula: $Pr(i \succ j) = \frac{p_i}{p_i + p_j}$. For example, $Pr(\text{Spirit} \succ \text{G2}) = \frac{0.235284}{0.235284 + 0.0786501} \approx 0.749$.
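Expressed in code, the win-probability formula is a one-liner; the scores here are taken from the ranking table above:

```python
def win_prob(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that the team with score p_i
    beats the team with score p_j."""
    return p_i / (p_i + p_j)

# Spirit vs G2, using the fitted scores from the ranking table.
print(round(win_prob(0.235284, 0.0786501), 3))  # 0.749
```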

Conclusion

The Bradley-Terry model is very versatile: it can be used in traditional sports, in e-sports, and even for LLMs. Furthermore, it is simple to understand and can be a valuable tool to assess team performance in e-sports like CS.

References

[1] M. E. J. Newman, “Efficient Computation of Rankings from Pairwise Comparisons,” Journal of Machine Learning Research, vol. 24, no. 238, pp. 1–25, 2023.

[2] R. A. Bradley, “Paired comparisons: Some basic procedures and examples,” in Handbook of Statistics, vol. 4: Nonparametric Methods, Elsevier, 1984, pp. 299–326. doi: 10.1016/S0169-7161(84)04016-5.

[3] L. B. Anderson, “Chapter 17 Paired comparisons,” in Handbooks in Operations Research and Management Science, vol. 6, Elsevier, 1994, pp. 585–620. doi: 10.1016/S0927-0507(05)80098-2.

[4] H. Turner and D. Firth, “Bradley-Terry Models in R : The BradleyTerry2 Package,” J. Stat. Soft., vol. 48, no. 9, 2012, doi: 10.18637/jss.v048.i09.

[5] C. Huyen, “RLHF: Reinforcement Learning from Human Feedback,” Chip Huyen. Available: https://huyenchip.com/2023/05/02/rlhf.html

Gustavo De Mari Pereira
Data Scientist & Machine Learning Engineer
M.S. in Computer Science from IME-USP, focused on Reinforcement Learning. Founder of 2 companies, 10+ years of experience working with large-scale databases and building end-to-end ML pipelines. Kaggle competitor and Scikit-learn contributor.