[et:qw]: idea: ranks based on XP divided by playtime


(nUllSkillZ) #1

Sorry for opening another XP / stats related thread.
Also not sure if it’s mentioned already somewhere.

If I got it right, there will be (persistent) ranks.

Idea:
The ranks could be based on the total earned experience points divided by the total playtime.
And it should be fixed at the best result you have ever had (so your rank can't decrease).

That way there's no disadvantage for people who have less time to play.
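
A minimal sketch of how the proposed metric could work, assuming XP and playtime are tracked as running totals (the names here are illustrative, not anything from the game):

```python
# Sketch of the proposed rank metric: XP per hour of playtime,
# ratcheted so only your best-ever result is kept.
def updated_rank(best_so_far, total_xp, total_playtime_hours):
    """Return the new persistent rank value."""
    if total_playtime_hours <= 0:
        return best_so_far  # avoid division by zero for brand-new players
    current = total_xp / total_playtime_hours
    return max(best_so_far, current)  # the rank can never decrease

# Example: 12,000 XP over 40 hours gives 300 XP/h; a later slump can't lower it.
print(updated_rank(0.0, 12_000, 40))    # 300.0
print(updated_rank(300.0, 15_000, 60))  # stays 300.0 (250 XP/h is below the best)
```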


(extraordinary) #2

You cannot have a reward system in a game such as a first-person shooter without there being a huge imbalance. If the reward system doesn't actually give you anything besides a title, or some other aesthetically pleasing word or picture, then there's no point in having it, because it won't actually matter. In that case, people could do the same thing they've done in Counter-Strike and just put ranking on their server.

If it does give you access to things that lower ranks do not have, then it’s imbalanced. Good players get rewarded, bad players get the shaft; all that would do is turn players away from the game.

Edit: And on the note of the whole experience/ranking system, I hope you guys realize how difficult it really is to make accurate calculations of a player's skill level in a 1-on-1 situation, let alone in a 2v2 or higher situation.

Even if you use Elo's system, where the expected score is E_a = 1 / (1 + 10^((R_b - R_a) / 400)) and the rating update is R'_a = R_a + K * (S_a - E_a) (and symmetrically for player b), it still only accounts for an individual's win or loss, and not kill stealing or goal stealing, friendly fire, helping fire, damage dealt, damage received, objectives accomplished, objectives failed, vehicles destroyed, enemies thwarted, and a plethora of other things that could happen during a single map. There are far too many things to calculate to determine what an individual player should be rewarded.
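
For reference, here is a short runnable sketch of those standard Elo formulas (K = 32 is just a common choice, nothing game-specific):

```python
# Elo expected score and rating update for two players a and b.
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score (probability-like) for player a against player b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    e_a = elo_expected(r_a, r_b)
    e_b = 1.0 - e_a
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - e_b)

# Example: a 1600-rated player beats a 1500-rated player.
print(elo_update(1600, 1500, 1.0))  # roughly (1611.5, 1488.5)
```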

And using a simple formula of setting certain things to a fixed value and adding that value upon completion of whatever it may be has so many loopholes and flaws that it's not even funny. That's why they put "scores" in games. A player's score is essentially their "rank" for that map, and your reward is knowing how well you did in relation to everyone else.


(ParanoiD) #3

How about saving servers and energy by just resetting the XP after a campaign, as in ET? ET:QW will do exactly the same, unless you want to keep all the data and make a mod for it…


(kamikazee) #4

nUllSkillZ is imo just starting a thread on the math to calculate the persistent stats.
Please keep the XP confusion for the other thread.


(Flesh) #5

I think the ranking system is a done deal for SD, so there's no point in discussing that anymore. From what I've heard, it seems it will be as fair as it can get, and I trust SD not to make a system that encourages stat-whoring but one that instead estimates your real abilities.


(Dazzamac) #6

The only reason for the persistent rank is so people can measure the size of their eDick. No rewards will be given other than bragging rights. I agree that rating the rank over time played is a good idea, because it will give you a better impression of how good the player is, and not how unemployed he is.


(BondyBoy007) #7

lol so true :)


(Zaedyn) #8

Just FYI I’ve used XP/time-based metrics in the past and while they are more interesting than straight XP-based metrics, they still have problems.

Mostly these problems come from the fact that getting XP isn't always positively correlated with helping your team win. People do weird things in ET to get XP quickly; we've all seen that.

It's one of the reasons I moved to a system that is purely based on wins. The stats system in etpub is purely a "how much will that guy help my team win?" measure. It doesn't care about XP, K/D, or anything like that.

The Xbox 360 version of QW will, of course, get the Trueskill system, which is very similar to the one I put in etpub. Both systems are fair in my opinion.

I know my system predicts the winning team correctly 73-74% of the time. This is based on almost 250,000 matches and nearly 150,000 players (see http://stats.etpub.org).

Given that decent a prediction rate from so simple a model, I trust my rankings within a good degree of tolerance.


(extraordinary) #9

Using a win/loss model has a ton of flaws. For example, when a good player is on a crap team, his estimated skill level is lowered, not because he's a bad player but because his team is bringing him down. Thus, his statistics are not accurate. Win/loss models are only truly effective in 1v1 situations, or where the team has a consistent roster. In pub play, teams are so dynamic that everyone's stats will eventually average out, unless they consistently end up on a team that's better than the other team, in which case, even if they're a horrible player, they look good because their team wins a lot.

Your system also doesn't take flukes into account. Sometimes good players or teams screw up, allowing a less skilled team to win. Does that mean the winning team is better? No. It simply means that in that particular battle, the winning team won. Just like a very skilled player could take on 15 people, kill 14 of them, and then the last one hits him with a single bullet in the foot; he doesn't have enough health remaining to take the hit and dies. Is that 15th person better than him? No. To accurately rate a player, there are hundreds of variables to take into account, and that only covers the minimum. I'm sure if you wanted to be as accurate as possible, you'd have to take into account a couple thousand variables.


(Zaedyn) #10

I understand what you are saying, but actually both my system and Trueskill deal with these problems.

The basic problem you are mentioning is the same: a player may lose and there is nothing he really could have done about it.

That’s fine because the system isn’t going to bring your rating down to the dust just because of one loss.

It will see how often on average you win in difficult situations. The harder your wins, the higher your rating.

It's interesting that you said a team's ability is averaged out, because that's how both of our models treat overall team ratings: a team's rating is the sum of all the individual player ratings, which (for a fixed team size) is proportional to the average.

So a good player on a crap team is safe not only because his skill will average out over a few games, but also because the system knows he is on a crap team, and will not penalize him as hard.

The system is able to identify players who win more often than would happen randomly. If the teams you are on win more often than not, that’s no longer random. Especially if the teams are pretty even.

In conclusion, no rating system is perfect, but my system and the one that Ralf Herbrich and Thore Graepel developed at Microsoft are the fairest I've seen. In addition, the performance of the predictions speaks for itself. If my system were super-flawed, I wouldn't be able to predict the winners as well as I do.



(extraordinary) #11

I disagree. Predicting winners is easy; it's calculating the % chance of winning a player or team has that's difficult. Once you can accurately predict that, then you can truly rate players and teams.

I don't play W:ET, so I couldn't do the following example in that game, but I can go into a Tribes 2 server, take a quick look at the teams, and tell you, with a very good success rate, who's going to win. I don't need any calculations or formulas to determine it. A good rating system will take two players, or two teams, or possibly even more, tell you (accurately) what each team's % chance of winning is, and then adjust based on those calculations after the results are known, using past data in combination with potential future data to determine how much a player's rating should increase or decrease to show their "true" skill level.

Using an "average" system, or a system that takes team averages and then rewards a player more or less based on his individual rating relative to his team's average, is flawed in itself. You're not calculating how much of an effect that player is having on his team. A good player on a crap team should still perform the way the player is predicted to perform: he should maintain a slightly lower K/D ratio, should be able to boost his team's performance, and should, to some degree, influence the outcome of the game in more than just wins or losses. Saying "oh, a good player on a crap team, let's not penalize him" is, in my opinion, silly. He should be penalized, not for losing because he has a crap team, but because he doesn't or didn't perform as well as he should have.

Rating systems, especially in team games, have to take into account BOTH player skill and team skill. You have to take into account the objectives of each map, how each player contributed to their team's victory or loss, and a plethora of other things.

I do not think your system is accurately deducing a player’s skill, but rather a team’s average. That’s easy to do and doesn’t really prove or show anything except that statistics are real.


(Zaedyn) #12

UPDATED after the splashdamage.com outage. I hope our discussion is bringing SD down!

Before going too much further, I recommend you read both my paper and the one on Trueskill.

In addition, I can tell you I have measured the probability calibration of my system (which is what the measure you are talking about is usually called), and it is usually within 5-10% of the true probability. So yes, not only does it predict who will win, but when it gives a team an 80% chance of winning, that team wins 75-85% of the time.
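
A sketch of how such a calibration check can be done in general, assuming only a list of (predicted probability, outcome) pairs; this is not the actual etpub code:

```python
# Bucket predicted win probabilities, then compare each bucket's average
# prediction to the fraction of games actually won. A well-calibrated
# system has mean_pred ~ win_rate in every row.
from collections import defaultdict

def calibration_table(matches, n_buckets=10):
    buckets = defaultdict(list)
    for prob, won in matches:
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append((prob, 1.0 if won else 0.0))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(w for _, w in pairs) / len(pairs)
        rows.append((mean_pred, win_rate, len(pairs)))
    return rows

# Toy demo data; a row like (0.80, 0.78, 1200) would say teams given an
# 80% chance actually won 78% of the time over 1200 matches.
demo = [(0.80, True), (0.80, True), (0.80, False), (0.55, True), (0.55, False)]
for mean_pred, win_rate, n in calibration_table(demo):
    print(f"predicted {mean_pred:.2f} -> observed {win_rate:.2f} over {n} matches")
```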

I’m sure Trueskill is roughly within that area as well.

Again, the system we use doesn't take team averages; it makes a team's rating the sum of the player ratings, and in my system's case it also takes into account map, side (Allies/Axis), and server information.

So the system does INDEED calculate the effect a player is having on his team. A good player on a crap team DOES boost the team’s performance in exactly the way the model should predict.

As far as influencing the game beyond wins/losses, well, I don’t really care about that. All I care about is whether or not this player will help me win.

If I wanted kill measures, I'd use the kill/death system I designed for the same thing. It predicts your Bayesian-adjusted K/D ratio, i.e. your K/D against an average player instead of any player.
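
The post doesn't spell out the K/D model, so the following is just one common Bayesian-smoothing approach (the prior strength and average K/D are assumed numbers), shrinking a raw K/D toward the population average:

```python
# Bayesian-style smoothed K/D: a player with little evidence stays near
# the population average; the estimate approaches raw kills/deaths as
# evidence accumulates. avg_kd and prior_weight are illustrative choices.
def bayes_kd(kills: int, deaths: int,
             avg_kd: float = 1.0, prior_weight: float = 50.0) -> float:
    prior_kills = avg_kd * prior_weight   # pseudo-kills worth of prior
    prior_deaths = prior_weight           # pseudo-deaths worth of prior
    return (kills + prior_kills) / (deaths + prior_deaths)

print(bayes_kd(5, 1))      # ~1.08: too little evidence to claim a 5.0 K/D
print(bayes_kd(500, 100))  # ~3.67: strong evidence of a genuinely high K/D
```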

A good player on a crap team can only do so much. He may raise the team’s probability of winning from 20% to 40%, but he isn’t going to guarantee a win no matter how good he is if the rest of his team isn’t very good. Especially in large team games where his great abilities WILL get washed out. But that’s OK, my system handles that. It will measure the amount a good player should improve the team directly.

I’ve played enough W:ET to see this, and I think anyone will agree. Sure, there are times when that great player sneaks through and wins it, and in those cases the probability goes up to reflect that. If a player does that often enough, his rating gets very high.

If you look at the rankings at http://stats.etpub.org, you'll see I give them in terms of PRW, or the probability that your team will win if it's an average team playing against another average team and you're the only potentially non-average player. This gives exactly a measure of that "boost" you were referring to: it shows how much the top players are expected to boost their team's winning chance.

This is exactly an estimate of how much more having you on the team will contribute to your team winning.
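
As a purely hypothetical illustration of a PRW-style number, here is a sum-of-skills model with a normal (Thurstone-style) performance link; the skill scale, team size, and noise values are invented, and the real etpub model also conditions on map, side, and server:

```python
# P(win) for a team of average players plus one focal player, against an
# all-average team. Since team skills are sums, only the focal player's
# offset from average survives in the difference.
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prw(player_skill: float, avg_skill: float = 25.0,
        team_size: int = 6, perf_noise: float = 8.0) -> float:
    skill_gap = player_skill - avg_skill
    # Performance noise from 2 * team_size players adds in variance.
    scale = perf_noise * sqrt(2 * team_size)
    return normal_cdf(skill_gap / scale)

print(prw(25.0))  # 0.5: an average player doesn't move the needle
print(prw(45.0))  # ~0.76: a strong player boosts an average team's chances
```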

Again, the probabilities are calibrated very well, often within 5%. The reason 5% doesn't give me a higher overall accuracy is that so many interesting matches happen near 50%, where +/- 5% easily leads to misclassification. This happens even more when brand new servers subscribe to the system and bring the numbers down.

So my system does take into account player skill, it does take into account the map, and it takes into account expected contribution PER player.

Can you show how it doesn’t? I would read the papers before trying to argue that. Then we can talk more.


(ouroboro) #13

Ack! A top-poster! ;)

Stats are evil, period. The only thing that matters is whose flag is flying at the end of the round. All other forms of player comparison degrade teamplay. Saying "Player1 is better at XYZ than Player2" encourages both players to become more selfish: Player1 feels he has a rank to maintain, so he will be reluctant to help his teammates, who might dethrone him; Player2 is being told that he is inferior, so he develops a single-minded goal of becoming #1, and his teammates become adversaries in that pursuit.

That said, I do think the etpub system handles the e-penis problem as well as can be done, and I respect it, although I still prefer no stats whatsoever.

Anyway, did I miss something? I was under the impression that ET:QW was going to be exactly like etmain, with XP persistent only during the campaign. We all know someone will make an “eternal XP-save” mod, but surely this won’t exist out of the box, correct?


(extraordinary) #14

I'll go into more detail when I have time, but the system isn't actually accounting for player skill; it's attempting to account for a team's skill. The model has errors that would prevent it from working in a competitive scene, such as a ladder. The only reason it works in a pub setting, and barely at that, is that it averages.


(Zaedyn) #15

Sorry for hijacking this thread. I love rating and ranking systems; they are a passion of mine. A large part of my PhD involved researching models for building team ratings from player ratings, and in turn inferring the player ratings from team performances. I have recently accepted a job at a game company doing research in this very thing. In addition, I was recently contacted by a senior recruiter from Blizzard for help on their own multiplayer designs. I can't go into further details about either of those positions at this point, however.

Yeah, ouroboro, you are addressing an almost separate issue: if you have rankings/ratings, should you let people see them? Because then people will start comparing themselves and being silly.

The nice thing about our stats systems (mine, Trueskill, Glicko, etc.) is that they focus on winning. So instead of trying to out-do another player's kill record, you try to out-win them, which is good for the team and everyone else.

But yeah, it still leads to comparisons. There are large groups of players whose ratings are so close that even though one is ranked above the other, they really are about the same.

Did you read the papers yet? First, again, it doesn’t average. It sums. In the model, the team skill is built from individual player skills by summing them up. I don’t see how this isn’t accounting for player skill. Can you show me your derivations? In my model:

team_skill = sum_over_players(player_skill)

From there you can either use MCMC integration or gradient descent to infer the player skills given the team skills. It's called Bayesian inference, and it's mathematically sound. I don't see any "errors" in that; that's Bayesian Inference 101, or Statistical Inference 101 for that matter. Since my model includes player ratings, if I know team performance outcomes, I can use Bayesian inference to find the posterior probability distributions of the player ratings. I could even use maximum likelihood to find the most likely ratings, which is what most ladder systems do, although most ladder systems are still using the same base model as I am.
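
A minimal sketch of that inference step: fitting player skills to team outcomes by gradient ascent on a Bradley-Terry likelihood where team skill is the sum of player skills. The match data, learning rate, and scale are illustrative, and the real systems are Bayesian (with priors and uncertainty) rather than this plain maximum-likelihood fit:

```python
# team_skill = sum_over_players(player_skill), with a logistic (Elo-like)
# win probability in the difference of the two team sums.
def win_prob(skills, team_a, team_b, scale=400.0):
    diff = sum(skills[p] for p in team_a) - sum(skills[p] for p in team_b)
    return 1.0 / (1.0 + 10 ** (-diff / scale))

def fit_skills(matches, players, lr=20.0, epochs=200):
    """matches: list of (team_a, team_b, a_won) tuples."""
    skills = {p: 0.0 for p in players}
    for _ in range(epochs):
        for team_a, team_b, a_won in matches:
            # Gradient of the log-likelihood is proportional to
            # (outcome - prediction); push winners up, losers down.
            g = (1.0 if a_won else 0.0) - win_prob(skills, team_a, team_b)
            for p in team_a:
                skills[p] += lr * g
            for p in team_b:
                skills[p] -= lr * g
    return skills

matches = [
    (("ann", "bob"), ("cat", "dan"), True),
    (("ann", "cat"), ("bob", "dan"), True),
    (("ann", "dan"), ("bob", "cat"), False),
]
print(fit_skills(matches, ["ann", "bob", "cat", "dan"]))
```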

If you have more detail on why this system isn't accounting for player skill, I would love to hear it. Like I said, I love talking ratings models. In addition, my colleagues at Cambridge (Herbrich and Graepel, Trueskill) and Harvard (Glickman, Glicko, Glicko-2) would be interested in the same discussions. Where can I read your work? Do you have a paper or two published? A blog?

If you’d like some more reading material, there were three papers published in major machine learning venues, including JMLR (top machine learning journal), NIPS, and ICML (top machine learning conferences). If you have something to add we’d love to hear it.

As for competitive scenes, every respectable ladder I've seen uses some form of the same model I use. In fact, the United States Chess Federation uses Mark Glickman's work. My model and method for approximate inference are both generalizations of Mark's.

If you’re saying that competitions and ladders would rather rate at the team level, I will agree with that as long as the teams are always the same. Otherwise you need to account for players moving around.

Even that is debatable because, come on, all of those systems are based on two variations of the same basic model: either the Bradley-Terry model, which assumes a logistic distribution on player strength differences, or the Thurstone Case V model, which assumes a normal distribution.
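
To make that comparison concrete, here are the two link functions side by side; with the usual rescaling of the normal's standard deviation by about 1.7, they produce nearly identical win probabilities:

```python
# Bradley-Terry: logistic CDF of the skill difference.
# Thurstone Case V: normal CDF of the skill difference.
from math import erf, exp, sqrt

def bradley_terry(diff: float, s: float = 1.0) -> float:
    return 1.0 / (1.0 + exp(-diff / s))

def thurstone(diff: float, sigma: float = 1.0) -> float:
    return 0.5 * (1.0 + erf(diff / (sigma * sqrt(2.0))))

for d in (0.0, 0.5, 1.0, 2.0):
    # sigma ~ 1.7 makes the two curves nearly coincide.
    print(d, round(bradley_terry(d), 3), round(thurstone(d, sigma=1.7), 3))
```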

So please, we’d all be very interested in your arguments.


(Nail) #16

Wouldn't really call it a hijack; interesting read for sure, but not a hijack.


(SCi-Fi) #17

I'm not one for XP, but for all it's worth, it does keep people playing the game. How long would the appeal of BF2 have lasted without it?

Not as long as it did. It gives that edge that games need to keep people playing and keeps the interest, as people want that weapon and those special little items like in 2142.

Many would say I'm contradicting myself by agreeing with XP, given what I've said in other posts, but after much thought, if it keeps the interest in the game then sure, I would welcome it.

You only have to look at BF2/2142 to see how XP works and the pros and cons of it. OK, from a developer's view it gets thousands of players playing for many hours to get certain weapons/gadgets and keeps the interest, as there are thousands who love stats, so it is hard to find a balance between loyal ET fans and bringing in others from the Battlefield series who love stats.

2142, however, had a crafty way of making you work your way through to get the weapons by not giving you the guns straight off, making people work harder and play longer.

In the end I just hope they get the balance right.

Away from XP: in a video interview, Paul said something I loved to hear, that "we won't be noobing the game as we want to make you learn". I will back that 100%, as that is all DICE did with BF2: they noobed it big time. I could go on and on about how they killed the gameplay. I just hope he keeps his word about that…


(ayatollah) #18

Why do people keep getting mixed up between XP and persistent rank? XP does not necessarily equal persistent rank. nUllSkillZ is proposing a way to calculate persistent rank from XP.


(Zaedyn) #19

Yeah, it all depends on the goal of the scores.

If the goal is to keep people playing by giving them goals to shoot for, then you can do almost anything you want I suppose. Kind of like Xbox Live’s rewards system.

I actually kind of like that, because then you're just trying to do things that you haven't done yet. It reminds me of Star Ocean 3, or maybe Final Fantasy XII, where you get little awards for things like your 100th kill, etc.

These wouldn’t have to be competitive and they would be obviously “how long I’ve played”-based instead of skill-based.

I don't mind that as long as they aren't used to rank the players by skill. That would be silly. It's a huge problem with ranking systems like, say, Gunbound's, where pretty much as long as you play for years you'll rank up, and then that rank is used for match-making. That's just wrong.

Anywho.


(extraordinary) #20

Here’s why I believe your model isn’t calculating a player’s skill:

1.) The brief read I made of your paper showed a severe lack of variables in the equations. You made no attempt to actually deduce a player's individual ability, regardless of the team situation. From what I read, your model is trying to infer a player's skill based on the overall objectives completed by their team.

Things I believe you would have to take into account in order to determine a player's skill: How good is the player's accuracy with each weapon versus the difficulty of use of each weapon? Where do their bullets hit when they do hit, and how effective are those shots? When being shot at, is the enemy missing because they can't aim, or is the player forcing them to miss by outmaneuvering them? What strategies are THEY employing to defeat their enemy? Are they really helping complete objectives, or are they being selfish? What is their ping, and how is it affecting their ability to play at a consistent rate? What about their frame rate? Is the player better in a solo situation or in a team situation? Do they employ tactics such as flanks, pincers, and ambushes?

2.) From your model, I deduced that you calculate and assign a numeric value to a player which represents their skill level relative to everyone else's numeric value, but the basis for that value is decided by a team's performance and not their individual one. If two players on a team in a ladder showed up to every match and played every map from beginning to end, then by your model their skill levels would be identical, and this shouldn't be the case. Your model's rating of a player averages out in a pub situation because they will consistently be on teams that make them look good and teams that make them look bad. Your model seems to focus more on finding balance for teams based on the players within them than on individual skill.

3.) In a competitive scene, the best formula to use is Elo's with a dynamic constant, and that's only for rating TEAMS, not individuals. The number of variables you'd have to take into account in order to accurately deduce a player's skill level and assign it a numeric value is astronomical, and they are not even close to being covered in your model.

Does that mean your model is bad? No. It serves its purpose, but, in my opinion, it doesn't even come close to calculating a player's individual skill. I believe wholeheartedly that you're looking in the wrong direction and omitting several key values.

I certainly do not have the credentials that you have, especially since my major is in a completely different area of expertise, but if you can show me how omitting the things mentioned and implied here is beneficial to determining a player's skill, then please do so.