Friday, September 25, 2009

GOTW - Whackometer Week 4

Here are the results for the Week 4 Game-of-the-Week (GOTW) Whackometer, which attempts to see whether one (or more) of the GOTW judges is a "whacko".

This week, Michael said, "I confess, I am a whacko judge." Do the statistics bear him out?

Name   Week 4  ||  Average
Jeff   0.589   ||  0.388
Greg   0.017   ||  0.329
Arun   0.749   ||  0.492
JimD   0.897   ||  0.556
Mike   0.182   ||  0.199

Remember that a number closer to 1.000 means that the judge agreed with the collective wisdom of the other judges, a number closer to -1.000 means that the judge disagreed with the collective wisdom of the other judges, and a number close to 0.000 means that the judge neither agreed nor disagreed with the collective wisdom of the other judges.
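
For readers who want to try this at home, here is a minimal sketch of how a score like this could be computed. It assumes the score is a Pearson correlation between one judge's points and the summed points of all the other judges, and that a game a judge did not pick counts as 0 points; the exact calculation behind the table may differ. The judge names and vote numbers are made up for illustration, and `pearson` and `whackometer` are hypothetical helper names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant list as "no relationship"
    return cov / sqrt(var_x * var_y)

def whackometer(votes):
    """Correlate each judge's points with the summed points of the others.

    votes maps a judge name to a list of points, one entry per game.
    """
    scores = {}
    for judge, own in votes.items():
        other_lists = [pts for name, pts in votes.items() if name != judge]
        others_total = [sum(game_pts) for game_pts in zip(*other_lists)]
        scores[judge] = pearson(own, others_total)
    return scores

# Made-up votes for five games (3 = first place, 2 = second, 1 = third).
votes = {
    "JudgeA": [3, 2, 1, 0, 0],
    "JudgeB": [3, 1, 2, 0, 0],
    "JudgeC": [2, 3, 1, 0, 0],
    "JudgeD": [0, 0, 1, 2, 3],  # picks the games nobody else liked
}

for judge, score in whackometer(votes).items():
    print(f"{judge}: {score:+.3f}")
```

In this toy data, JudgeD comes out negative and the other three come out positive, which matches the reading of the scale described above.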

This week, three of the judges (Jeff, Arun, Jim) were all in fairly high agreement with the wisdom of the group. Michael was less so, and Greg's choices had almost no relationship to everyone else's.

Overall, Michael still is somewhat less correlated with the wisdom of the group than the other judges. So, he's edging closer to whackoville, but he's not there yet. There's still a ways to go.

On another note, I was interested in seeing whether any of the judges exhibited a bias toward one team or another in picking game of the week. For example, did "Storm McRainy" award most GOTW points to Seattle, or "Jack Daniels" give lots of GOTW points to Tennessee? This can also be done with correlations.

Fortunately, the results were rather uninteresting. All judges agreed with the other judges, with correlations ranging from 0.743 to 0.849. That's pretty high, and suggests there's no obvious team bias from any judge. I'll keep track of this for a few more weeks, but I doubt anything will come of it.


Von_Igelfeld said...

Just curious ... does the correlation factor include all games including the ones that don't make the cut for the first three places of GOTW?

In other words, does the correlation coefficient that you're computing represent the tendency of Mike to vote for games that ultimately become one of the three places for GOTW, or does it represent the correlation of his votes for all games considered (which would presumably include other judges' selections that also didn't make the cut)?

I think to make this more interesting, the judges would need to rate the games at something other than a summary kind of number. For example, if the judges rated each of the games for something like a) Artistic beauty b) Technical proficiency c) Opening novelty d) Middlegame creativity ... you could then actually perform a much more revealing analysis on the contributing factors for "whackiness". Anyway, I'm sure it's all too much to actually implement but the thought experiment is intriguing.

Bionic Lime said...

You've spotted an issue with my correlations, which I addressed in my Week 2 Whackometer, namely that I did not include the other non-point-getting games.