Tuesday, August 2, 2011

Distortion Of Facts

The Elo system is one of the most accurate systems for ranking players in any kind of sport in the world. However, just like ANY other system, it is not good when used at only one particular point in time. Looking at someone's rating today while forgetting the past or closing our eyes to the future will never show us the true picture of his playing strength. To accurately gauge someone's playing strength, one has to observe his rating changes over a considerable number of games.

This is just like trying to gauge a player's strength based on ONE tournament alone. Winning the National Junior does not make you the strongest junior. Likewise, finishing second in the National Junior does not make you the second strongest junior in Malaysia. This is precisely because it is only one tournament.

This is like saying that Greece winning Euro 2004 shows that Greece is the best team in Europe.

Imagine looking at the performance of a company. Let's say the company had declining profits for 2 consecutive years. Would you say that the company is lousy without doing proper research? What if the profits were declining because the company was investing more in R&D? That would increase its expenses, which would in turn lower its profits. The R&D could pay off handsomely in 5 years' time. Looking only at results and not at the details shows that you are a coffee shop speculator rather than a real investor.

This is the same in chess. Just looking at results to judge playing strength is for "coffee shop speculators". When you don't know how to play chess, this is the best you can do. Any strong chess observer knows that while IM Mok may not have been at his peak in the last two years, he still has one of the deepest understandings of the game in Malaysia. Jimmy may often criticize Mok for his opening choices, but I think he will agree with what I said. Mok's playing strength comes from his positional understanding.

But anyhow, let us just verify some of the "facts" that were presented:
In the SEA games selection we saw an IM fall to almost all if not all the juniors. So FIDE ratings don't mean much.
What do the results show?


Granted, Mok had one of the worst tournaments since he was 17 (maybe). But did he fall to almost all the juniors? He did lose to Yit San and Sumant. The other juniors who were playing were Zhuo Ren, Jun Feng, and Yit Ho. Well, Mok did not beat any of them, but he most certainly did not fall to MOST of them. Losing to two out of five is not MOST by any standard.

But this is not the point. The point is, one tournament does not show anything. Strong, consistent performance across many major tournaments will show playing strength. In due time, it will be recognized in the form of a FIDE rating. But there is a time lag, and a sufficient number of games is needed, before a player reaches his "true rating". Imagine a player who can play at 2400-2500 strength but currently has a rating of 2100. It will probably take about one to two years before that player can achieve his "true rating". However, in the meantime, you can probably start seeing his rating performance showing 2400+ in most of his tournaments. That is why I say consistent results will reflect a "truer" picture of playing strength. Over the longer term, the rating performance will eventually translate into actual ratings. It's just something we should all bear in mind, so that we do not become like some "coffee shop uncle" speculating on chess players' strengths the way he would on the stock market.
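The time lag can be sketched with the standard Elo update, new rating = old rating + K x (actual score - expected score). This is only a rough illustration, not FIDE's actual calculation: it assumes every opponent is rated 2300, that the underrated player scores exactly what a 2450-strength player is expected to score in every game, and the K-factors and the 50-point tolerance are picked purely for illustration.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for a single game."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def games_to_catch_up(true_strength, start_rating, opponent_rating, k,
                      tolerance=50.0, max_games=10_000):
    """Count games until the rating is within `tolerance` of true strength.

    Assumption: the player scores exactly the expected score of his true
    strength in every game (no luck, no bad days).
    """
    rating = start_rating
    score_per_game = expected_score(true_strength, opponent_rating)
    for games in range(max_games):
        if true_strength - rating <= tolerance:
            return games
        # Elo update: gain K times (actual score - expected score).
        rating += k * (score_per_game - expected_score(rating, opponent_rating))
    return max_games


if __name__ == "__main__":
    for k in (10, 20, 40):  # hypothetical K-factors, for illustration only
        n = games_to_catch_up(2450, 2100, 2300, k)
        print(f"K={k}: about {n} games to get within 50 points of 2450")
```

The exact counts depend entirely on those assumptions. What the sketch does show is the direction: the smaller the K-factor and the fewer games played per year, the longer a rating trails behind actual playing strength.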

P/S: Nobody has said that ratings should be used for selection. At least not this time around. Not yet. Don't let anyone Jedi Ass-Trick you into thinking so.

PP/S: I believe Mark has not beaten me before as well. That is a game I would like to see.

2 comments:

  1. For once, I am on Raymond's side. His view on ratings as a measure of strength is more reasonable than Jimmy's or Chess Ninja's. A FIDE rating is reflective of current playing strength only if the rating is current. However, what counts as "current" is debatable. FIDE uses "at least 30 rated games" in the last 12 months for qualification to the World Cup.

    Under the Elo assumptions, a 'statistically true' (reflective) rating is obtained (for K=10) if there are 72 or so rated games in the relevant rating period, and 50 or so games if K=15.

    In the case of Malaysian players (IM or otherwise) who play fewer games, the average performance rating in events over the last 12 months may give a reasonable reflection of current (comparative) playing strength. Maybe MCF should consider "recent" performance ratings as qualification criteria, e.g. instead of absolute ratings (which may be out of date or even inactive).

    As an aside, an active rating may not be reflective of current playing strength, as you only need to play ONE FIDE-rated game in the preceding 12 months to keep your rating active.
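An average performance rating of this kind can be sketched as follows. Note the assumptions: the sketch uses the common linear approximation (average opponent rating + 400 x (wins - losses) / games), not FIDE's table-based calculation, and it weights each event by the number of games played, which is one possible choice among several.

```python
def performance_rating(opponent_ratings, score):
    """Linear approximation: average opponent rating + 400 * (W - L) / N.

    `score` is total points (win = 1, draw = 0.5), so W - L = 2 * score - N.
    """
    n = len(opponent_ratings)
    if n == 0:
        raise ValueError("need at least one game")
    average_opponent = sum(opponent_ratings) / n
    return average_opponent + 400.0 * (2.0 * score - n) / n


def average_performance(events):
    """Games-weighted mean of per-event performance ratings.

    `events` is a list of (opponent_ratings, score) pairs, one per event.
    """
    total_games = sum(len(opps) for opps, _ in events)
    weighted = sum(performance_rating(opps, score) * len(opps)
                   for opps, score in events)
    return weighted / total_games


if __name__ == "__main__":
    # Hypothetical results for two events in the last 12 months.
    events = [([2200, 2300, 2400], 1.5), ([2000], 1.0)]
    print(average_performance(events))
```

The linear approximation breaks down for very lopsided scores (a perfect score always gives average opponent rating + 400), so it is best read as a rough comparison tool rather than an exact figure.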

  2. I agree with you, Eddy. You have definitely explained it more clearly than I have. That is why I was talking about "true rating".

    Well, for selection purposes, my suggestion, which I have made many times before, is to use some kind of aggregated result over a pre-determined list of notable tournaments. See below:

    "I have and will always be against using one tournament as a basis for selection. In the past, I have already suggested the use of the National Closed, KL Open, Selangor Open, Penang Open, Malaysia Open etc as tournaments for selection. This would be an equivalent of the badminton's Super Series. The finer details would be the criteria to use. One suggestion would be to take the average score of players who play in at least 3 of the listed tournaments. Anyone who plays less than 3 of those tournaments automatically disqualifies himself from selection."

    For the full version see here:
    http://thechessninja.blogspot.com/2011/01/full-almost-analysis-of-senior-vs.html

    This is more or less reflective of "rating performance". But rating performance still runs into your initial problem: it assumes that the ratings of your opponents are "current", as you said. That is why I proposed using a list of results, rather than just one tournament.

    Granted, no one has the time to play so many tournaments these days, but shouldn't we also reward commitment? People who are interested enough to commit the time to play that many tournaments and perform?

    Using recent tournaments also has the advantage of measuring "playing form". "Weaker" players who are in form may sometimes outperform stronger players. In such cases, we may prefer to select the "weaker" players instead.
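The aggregated selection idea quoted above (average score over a list of notable tournaments, with a minimum of 3 played) could be sketched like this. The finer details were left open in the original suggestion, so everything specific here is an assumption: the score is taken as points divided by rounds so that events of different lengths are comparable, and the player and tournament data in the example are made up.

```python
def rank_for_selection(results, min_events=3):
    """Rank players by average percentage score over the listed tournaments.

    `results` maps player name -> {tournament name: (points, rounds)}.
    Players with fewer than `min_events` listed tournaments are excluded.
    """
    averages = {}
    for player, events in results.items():
        if len(events) < min_events:
            continue  # played too few listed tournaments: not eligible
        fractions = [points / rounds for points, rounds in events.values()]
        averages[player] = sum(fractions) / len(fractions)
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical results, for illustration only.
    results = {
        "Player A": {"National Closed": (6, 9), "KL Open": (6, 9),
                     "Penang Open": (6, 9)},
        "Player B": {"National Closed": (9, 9), "KL Open": (9, 9)},
        "Player C": {"National Closed": (4.5, 9), "Selangor Open": (4.5, 9),
                     "Malaysia Open": (4.5, 9)},
    }
    for player, average in rank_for_selection(results):
        print(player, round(average, 3))
```

In this made-up example Player B is left out despite two perfect scores, because two tournaments fall short of the minimum of three. That is exactly the "reward commitment" trade-off described above.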
