Expected score is too dependent on opponent's rating
joby.d wrote
at 6:33 AM, Wednesday December 20, 2006 EST
I assume a true 1600 player who plays 30 games against true 2000 players would finish with a significantly improved rating. I also assume a true 2000 player who plays 30 games against true 1600 players would finish with a badly damaged rating. In my opinion this means the penalties and rewards are too large when your opponents' ratings differ significantly from yours. The obvious catch with reducing the penalty for losing to low-rated opponents is that people could inflate their ratings by beating up on slightly weaker opponents. Even so, I recommend at least trying an expected score formula of:
$$EP_1 = \sum_{n=2}^{7} \frac{1}{1 + 4^{(R_n - R_1)/400}}$$

Reasoning: if I remember correctly, my current expected score is

$$EP_1 = \sum_{n=2}^{7} \frac{1}{1 + 10^{(R_n - R_1)/400}} = \frac{1}{1 + 10^{(R_2 - R_1)/400}} + \frac{1}{1 + 10^{(R_3 - R_1)/400}} + \cdots$$

and I believe the current actual scores are 1st: 6, 2nd: 5, ..., 7th: 0. (EP1 is player 1's expected score; AP1 is his average actual score.)

Case A: 2000 vs 6 × 1600
Let's assume this player finishes 1st 30% of the time, 2nd 30%, and each other place 8%. In my opinion that is a generous record; correct me if I'm wrong.
(Rn - R1)/400 = -1 for every opponent
Current EP1 = 0.909 × 6 = 5.4545
AP1 = 6(0.3) + 5(0.3) + 0.08(4 + 3 + 2 + 1 + 0) = 4.1
EP1 >>> AP1 :(
Suggested EP1 = 0.8 × 6 = 4.8, which is closer to the assumed AP1 of 4.1.

Case B: 1600 vs 6 × 2000
Let's assume this player finishes 1st 5% of the time, 2nd 5%, 3rd 10%, 4th 10%, 5th 20%, 6th 25% and 7th 25%. I hope this understates the truth, so again correct me if you think I'm wrong.
Current EP1 = 0.0909 × 6 = 0.5454
AP1 = 11(0.05) + 7(0.1) + 2(0.2) + 1(0.25) = 1.9
EP1 <<< AP1 :(
Suggested EP1 = 0.2 × 6 = 1.2, which is closer to the assumed AP1 of 1.9.

Other cases:

2000 vs 6 × 1950
Let's assume a lucky P1 finishes 1st 20% of the time, 2nd 20%, and 3rd to 7th 12% each.
Current EP1 = 3.43
Suggested EP1 = 3.26
Assumed AP1 = 3.4
So the AP1 of a player rated 50 points above his opponents may be greater than 3.26, which means there is a risk of people inflating their ratings by playing slightly lower-rated opponents. However:

2000 vs 6 × 1200
Current EP1 = 5.94
With k = 32, if you get 1st in 95% of these games and 2nd in the other 5%, you would gain an average of 0.32 points per game. If you get 1st in 90% of these games and 2nd in the other 10%, you would lose 1.28 points per game. I think this is pretty crazy!!
Suggested EP1 = 5.65

I'm not going to look at 1200 vs 6 × 2000 et cetera, because I think those games occur too rarely to be significant.

* Note: Unless I'm mistaken, only the difference in ratings matters. Therefore, was the only balancing effect of adding 1500 to everyone's rating that there are now ratings below 1500?
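A minimal Python sketch of the arithmetic above. It assumes 7-player games scored 6 points for 1st down to 0 for 7th, EP1 computed as a sum over the six opponents, and k = 32 (the value Ryan gives later in the thread). The function names are hypothetical, and the finishing percentages are joby.d's guesses, not measured data.

```python
# Sketch of the expected-score arithmetic in the post above.
# Assumptions (not from any official implementation): 7-player games,
# 1st place scores 6 points and 7th scores 0, and K = 32.

K = 32  # current k value, as stated by Ryan later in the thread


def expected_score(r1, opponents, base=10.0):
    """EP1: sum of Elo-style expectations of player 1 against each opponent."""
    return sum(1.0 / (1.0 + base ** ((rn - r1) / 400.0)) for rn in opponents)


def actual_score(place_probs):
    """AP1: average points for a finishing distribution (index 0 = 1st place)."""
    top_points = len(place_probs) - 1  # 6 points for 1st in a 7-player game
    return sum(p * (top_points - i) for i, p in enumerate(place_probs))


# joby.d's assumed finishing records (guesses, not measured data).
cases = {
    "2000 vs 6x1600": (2000, [1600] * 6, [0.30, 0.30, 0.08, 0.08, 0.08, 0.08, 0.08]),
    "1600 vs 6x2000": (1600, [2000] * 6, [0.05, 0.05, 0.10, 0.10, 0.20, 0.25, 0.25]),
    "2000 vs 6x1950": (2000, [1950] * 6, [0.20, 0.20, 0.12, 0.12, 0.12, 0.12, 0.12]),
}

for name, (r1, opponents, probs) in cases.items():
    print(
        f"{name}: current EP1 = {expected_score(r1, opponents):.2f}, "
        f"suggested EP1 = {expected_score(r1, opponents, base=4.0):.2f}, "
        f"assumed AP1 = {actual_score(probs):.2f}"
    )

# 2000 vs 6x1200: average rating change per game under the current formula.
ep_1200 = expected_score(2000, [1200] * 6)
for p_first in (0.95, 0.90):
    ap = actual_score([p_first, 1 - p_first, 0, 0, 0, 0, 0])
    print(f"1st {p_first:.0%}, 2nd otherwise: {K * (ap - ep_1200):+.2f} per game")
```

For 2000 vs 6 × 1200 the unrounded current EP1 is about 5.9406, so the two records come out at roughly +0.30 and -1.30 per game; the 0.32 and 1.28 quoted above come from rounding EP1 to 5.94 first.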
Rven wrote
at 7:13 AM, Wednesday December 20, 2006 EST
...
BlackHawk wrote
at 9:09 AM, Wednesday December 20, 2006 EST
I second that, Rven.
Lindsay wrote
at 10:53 AM, Wednesday December 20, 2006 EST
That's why Ryan set up boards for different score levels.
joby.d wrote
at 11:28 AM, Wednesday December 20, 2006 EST
What I mean is: in chess, if someone with a high true Elo rating plays someone with a low true Elo rating, their wins and losses should balance out so that their ratings don't change. Here the lower-rated player's rating would skyrocket while the higher-rated player's would plummet, so I think that could be improved.
Ryan wrote
at 12:12 PM, Wednesday December 20, 2006 EST
I think joby.d is right, although I'm not sure which number should actually change. Joby's suggestion means that opponents' ratings matter less when adjusting your rating, which makes sense because of the luck factor. Alternatively, the k value could be lowered. That change would mean ratings matter the same amount, but it takes longer to change your rating. For example, if your expected score is 4/7 but you come first, your rating change is currently (7 - 4) * k, and k is 32, so the change is +96. In chess you only play one opponent per game, so your change is never more than 1 * 32. By changing k we can make the change per game closer to chess: with k = 8, the same game gives +24. I might prefer this solution, since ratings still matter; it just makes it harder to build up a large rating difference. What do you think?
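The update Ryan describes is the usual Elo-style step, k × (actual score - expected score). A minimal Python sketch using his example figures (expected score 4, actual score 7); the function name is hypothetical.

```python
def rating_change(actual, expected, k=32):
    """Elo-style update: k times (actual score - expected score)."""
    return k * (actual - expected)


# Ryan's example: expected score 4, came first for an actual score of 7.
print(rating_change(7, 4, k=32))  # 96
print(rating_change(7, 4, k=8))   # 24

# Chess: one opponent per game, so the actual score is at most 1 and the
# change can never exceed 1 * k (here an even match, expected score 0.5).
print(rating_change(1, 0.5, k=32))  # 16.0
```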
Pegasus wrote
at 1:38 PM, Wednesday December 20, 2006 EST
I would support joby.d's suggestion of (if I understood it correctly) changing the 10 to a 4.
I think this is better than changing the k value, which would just mean slower convergence to true ratings. Changing the 10 to a 4 says that ratings have lower predictive power, which is in fact true. Change k, and the 2000 player facing six 1600s will still lose points on average, just more slowly. Change the 10 to a 4, and the implication is that the 2000 player isn't as much better than the 1600s as the original formula assumed.
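To make this contrast concrete, here is a small Python sketch reusing joby.d's Case A numbers (a 2000 player against six 1600s with an assumed average of 4.1 points per game). It compares lowering k with changing the base from 10 to 4; the helper names are hypothetical, and 4.1 is joby.d's guess rather than measured data.

```python
def expected_score(r1, opponents, base=10.0):
    """EP1: sum of Elo-style expectations of player 1 against each opponent."""
    return sum(1.0 / (1.0 + base ** ((rn - r1) / 400.0)) for rn in opponents)


ASSUMED_AP1 = 4.1  # joby.d's assumed average score for 2000 vs 6x1600


def average_change(k, base):
    """Average rating change per game for the 2000 player in Case A."""
    return k * (ASSUMED_AP1 - expected_score(2000, [1600] * 6, base))


for base in (10.0, 4.0):
    for k in (32, 8):
        print(f"base {base:g}, k = {k}: {average_change(k, base):+.1f} per game")
```

Lowering k only rescales the change (k = 8 gives exactly a quarter of the k = 32 figure), while the base moves EP1 itself toward the assumed AP1. With these particular guesses the 2000 player still loses points on average under base 4, because 4.8 is still above 4.1; the loss is just smaller.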
Ryan wrote
at 1:53 PM, Wednesday December 20, 2006 EST
I can agree that, with the current k, the predictive power of the ratings should be lower.
The reason I'm a little resistant to lowering the power of the ratings is that, although there is a lot of randomness, I think there can still be a lot of skill involved, and a real difference between good and bad players, or new and old ones. I want to believe the ratings can be a more accurate measurement of skill. So before giving up on or softening this idea, I'd rather try to make the ratings represent skill more accurately, and I think reducing k is better at achieving this. If k is 1/4 of what it is now, then a 2000 player now is actually a 1625 player, and that player's expected result against six 1600 players is more accurate. If, with this reduced k, a player still makes it up to 2000, then perhaps he really is substantially better than a 1600 player, and the Elo estimate would be more accurate. I'd also like to add a win/lose adjustment alongside the current Elo estimate. For example, if you win you get an extra +24, and if you don't win you get an extra -4, regardless of position. I think this may be a better first step before reducing the predictive power of the rating.
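Ryan's win/lose bonus can be sanity-checked quickly. A Python sketch, assuming 7-player games as in joby.d's examples and using Ryan's example figures of +24 for the winner and -4 for everyone else; the names are hypothetical. With one winner and six non-winners the bonus nets to zero per game, and a player breaks even at a 1-in-7 win rate.

```python
from fractions import Fraction

WIN_BONUS = 24        # Ryan's example figure for the winner
NON_WIN_BONUS = -4    # Ryan's example figure for every other player
PLAYERS = 7           # assumption: 7-player games, as in joby.d's examples


def bonus_total_per_game(players=PLAYERS):
    """Net bonus handed out in one game: one winner, players - 1 non-winners."""
    return WIN_BONUS + (players - 1) * NON_WIN_BONUS


def expected_bonus(p_win):
    """Average bonus per game for a player who wins with probability p_win."""
    return p_win * WIN_BONUS + (1 - p_win) * NON_WIN_BONUS


print(bonus_total_per_game())            # 0: zero-sum in a 7-player game
print(expected_bonus(Fraction(1, 7)))    # 0: break-even at a 1-in-7 win rate
print(round(expected_bonus(0.30), 2))    # 4.4: e.g. joby.d's assumed 30% wins
```

Using `Fraction` for the break-even case keeps the result exactly zero instead of a tiny floating-point residue.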
Ryan wrote
at 1:57 PM, Wednesday December 20, 2006 EST
Also, a high k (currently 32) puts more weight on single games, whereas a lower k spreads the weight over more games.
The formula says that 800 points is the rating difference it takes to be nearly certain of a game's outcome. With a k of 32 that is about 8 games; with a k of 8 it is about 32 games.
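A Python sketch of the two figures in this post. The first part computes the standard base-10 expectation for an 800-point gap. The second part is only one possible reading of "about 8 games" and "about 32 games": 800 points divided by the per-game change from Ryan's earlier example (a 3-point surprise, giving +96 at k = 32 and +24 at k = 8). The function names are hypothetical.

```python
def win_probability(rating_gap, base=10.0):
    """Elo-style expectation for the higher-rated player in a single pairing."""
    return 1.0 / (1.0 + base ** (-rating_gap / 400.0))


def games_to_move(points, k, score_surprise=3):
    """Games needed to move `points` if every game brings the same surprise."""
    return points / (k * score_surprise)


print(f"{win_probability(800):.3f}")  # 0.990
for k in (32, 8):
    print(f"k = {k}: about {games_to_move(800, k):.1f} games")
```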
Pegasus wrote
at 4:41 PM, Wednesday December 20, 2006 EST
Ryan, you said
"If k is 1/4 what it is now then a 2000 player now is actually a 1625 player." and I think this is wrong. Changing k doesn't change the results that the formula predicts (the EPn values); it only reduces the weight a single match has in how far a rating moves. What joby.d (I think) and I are saying is that it is not only the scale of the movement that is wrong, but sometimes its direction. Changing k won't fix that, but a formula with a 4 instead of a 10 will, because it recognises the greater degree of luck that actually exists in the game. It would give more credit for good finishes by strong players against weaker opposition, and I think we all recognise that this extra credit is justified by the general reluctance to play on tables weaker than we qualify for.
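The point about direction can be illustrated with joby.d's Case A (a 2000 player against six 1600s) and k = 32. The sketch below, with hypothetical helper names and the 6-to-0 place scoring from the opening post, computes the rating change for a single 1st- or 2nd-place finish under each base.

```python
K = 32  # current k value


def expected_score(r1, opponents, base=10.0):
    """EP1: sum of Elo-style expectations of player 1 against each opponent."""
    return sum(1.0 / (1.0 + base ** ((rn - r1) / 400.0)) for rn in opponents)


def place_change(place, r1, opponents, base, k=K):
    """Rating change for finishing at `place` (1 = first) in one game."""
    points = len(opponents) + 1 - place  # 1st -> 6 points with six opponents
    return k * (points - expected_score(r1, opponents, base))


for base in (10.0, 4.0):
    for place in (1, 2):
        change = place_change(place, 2000, [1600] * 6, base)
        print(f"base {base:g}, place {place}: {change:+.1f}")
```

Under base 10 a 2nd-place finish in this line-up costs the 2000 player points; under base 4 it earns a few, which is the change in direction described above. Lowering k would shrink both numbers without changing their signs.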
algios wrote
at 4:51 PM, Wednesday December 20, 2006 EST
Lowering the power of win/loss is important, because it is annoying to lose 200 points in two games through bad luck. A gain in points is always motivating, no matter how small; with a loss, it also depends on how big the loss is.