Showing posts with label game of the week. Show all posts

Friday, September 18, 2009

GOTW Whackometer - Week 3

Here are the results for the Week 3 Game-of-the-Week Whackometer, which attempts to see whether one (or more) of the GOTW judges is "whacko".
Name    Week-1   Week-2   Week-3   Average
Jeff     0.792    0.319   -0.149     0.321
Greg     0.471    0.307    0.522     0.433
Arun     0.493    0.563    0.164     0.407
JimD     0.493    0.505    0.329     0.442
Mike     0.661   -0.058    0.009     0.204
Remember that a number closer to 1.000 means that the judge agreed with the collective wisdom of the other judges, a number closer to -1.000 means that the judge disagreed with the collective wisdom of the other judges, and a number close to 0.000 means that the judge neither agreed nor disagreed with the collective wisdom of the other judges.

This past week, the results were pretty varied. Greg agreed most with the other judges, while Jeff somewhat disagreed with them. Michael, once again, was independent of them.

Averaged over all three weeks, however, each judge still has a positive number, indicating that (overall) each judge agrees with the other judges to some degree.

Thus, so far, no whacko.

Sunday, September 13, 2009

GOTW Whackometer, Part 2

Well, I woke up yesterday morning with a sick feeling that the co-linearity problem was actually a real problem, given the population size. So I redid the correlations, but this time correlated a judge's score with the sum of the points of the other judges. In this way, a judge's own points are not considered in the correlation.
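To make the change concrete, here is a minimal sketch of the "other judges only" correlation. It assumes the Week 2 totals and Greg's points listed in the September 11 post; the `pearson` helper and the variable names are only illustrative, not the code actually used for the tables.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Week 2 data from the September 11 post: total points per game, and Greg's points
totals = [18, 14, 9, 9, 6, 6, 4, 3, 3, 1, 1, 1]
greg = [2, 5, 0, 3, 0, 4, 0, 0, 0, 0, 1, 0]

# Remove Greg's own points so he is compared only with the other four judges
others = [t - g for t, g in zip(totals, greg)]

r_with_self = pearson(greg, totals)      # old method: Greg's points are inside the total
r_without_self = pearson(greg, others)   # new method: Greg vs. everyone else

print(round(r_with_self, 3))     # 0.594
print(round(r_without_self, 3))  # 0.307
```

The second number matches Greg's Week-2 entry in the table below, and the drop from the first number shows how much of the original correlation came from a judge being compared partly with himself.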

The results are rather interesting.

Name    Week-1   Week-2   Average
Jeff     0.792    0.319     0.556
Greg     0.471    0.307     0.389
Arun     0.493    0.563     0.528
JimD     0.493    0.505     0.499
Mike     0.661   -0.058     0.302

Mike's correlation in week 2 is pretty close to 0, which suggests that his scores that week were essentially unrelated to those of the other judges. However, averaged over both weeks, all the judges are reasonably similar in their correlations. Two weeks is not a very good sample.

HOWEVER, note that this is only for the games that received points. Remember that there are about 20 other games that ALL judges marked as 0. Including these games would lead to a much higher correlation for everyone.
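A rough sketch of that effect, again using Greg's Week 2 numbers from the September 11 post. The count of exactly 20 unscored games is an assumption (the real figure is only "about 20"), and the `pearson` helper is illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Greg's Week 2 points and the summed points of the other four judges
greg = [2, 5, 0, 3, 0, 4, 0, 0, 0, 0, 1, 0]
others = [16, 9, 9, 6, 6, 2, 4, 3, 3, 1, 0, 1]

# ASSUMPTION: exactly 20 extra games that every judge scored as 0
unscored_games = 20
greg_padded = greg + [0] * unscored_games
others_padded = others + [0] * unscored_games

r_selected = pearson(greg, others)
r_all_games = pearson(greg_padded, others_padded)

print(round(r_selected, 3))   # 0.307
print(round(r_all_games, 3))  # 0.529
```

Every added game is a point where Greg and the other judges agree perfectly (nobody gave it anything), so the correlation rises even though no one's opinion about the contested games has changed.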

Friday, September 11, 2009

Game of the Week Controversy

Only week 2 in the United States Chess League, and there's already controversy about the "Game-of-the-Week" judging. It appears that several members of the Boston Blitz are upset that their team member, Marc Esserman, only came in second place in the voting.

There have been claims of a "whacko judge". But how can we define "whacko"? Quantitatively, I mean...

One method is to perform standard Pearson correlations between an individual judge's scores and the total scores of all the judges. Yes, there are some co-linearity issues, since each judge's own scores are part of the total, but I'm not really going to worry about that.

Correlations produce a score called "r", which ranges from -1 to 1. When r=1, it means that one set of data lines up exactly with another set of data. For example, the ordered data sets {1, 2, 3} and {2, 4, 6} have an r=1. On the flip side, an r=-1 means that one data set is exactly opposite (in terms of lining up) to another. Take the data set {1, 2, 3} again; the data set {6, 4, 2} has an r=-1 with it, because 1 and 6, 2 and 4, 3 and 2 line up in the opposite order.

So what does an r=0 mean? It means that the two data sets have no linear correlation at all, either negative or positive; it is as if the two data sets were random with respect to each other.
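For anyone who wants to check these cases, here is a small sketch of the Pearson calculation. The `pearson` function is my own illustrative helper, and the third data set {2, 4, 2} is an extra example chosen to give r=0; it is not taken from the judging data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each data set
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_same = pearson([1, 2, 3], [2, 4, 6])      # lines up exactly
r_opposite = pearson([1, 2, 3], [6, 4, 2])  # lines up in the opposite order
r_none = pearson([1, 2, 3], [2, 4, 2])      # goes up, then back down

print(round(r_same, 3))      # 1.0
print(round(r_opposite, 3))  # -1.0
print(round(r_none, 3))      # 0.0
```

Note that the function fails with a division by zero if either data set is constant (for example, a judge who gave every game the same score), since r is undefined in that case.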

In terms of USCL GOTW judging, what we would like to see is some sort of positive correlation between each judge's scores and the scores given to the games as a whole. If there is some sort of "whacko" judge, then that judge should have a near-zero or a negative correlation. Indeed, even if one judge has a noticeably lower positive correlation than the other judges, it is something to keep track of.

For Week 2, GOTW judging, I calculated the correlations between a judge's score and the tally of all the judges. Before I go into the numbers, let me first give an example.

Winner / Total Points / Score of "Greg"
Friedel / 18 / 2
Esserman / 14 / 5
Ehlvest / 9 / 0
Perelshteyn / 9 / 3
Charbonneau / 6 / 0
Krasik / 6 / 4
Lopez / 4 / 0
Zaremba / 3 / 0
Becerra / 3 / 0
Matlin / 1 / 0
Recio / 1 / 1
Altounian-Burnett / 1 / 0

The Pearson correlation ("r") between Greg's scores and the total scores of all the judges was 0.594, a fairly high positive number, which indicates that his choices matched reasonably well with the choices of the group as a whole.
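The 0.594 figure can be reproduced from the table above with a few lines of code. This is just a sketch of the standard Pearson formula applied to that table; the `pearson` helper and variable names are illustrative.

```python
from math import sqrt

# (winner, total points from all judges, Greg's points), copied from the table above
rows = [
    ("Friedel", 18, 2), ("Esserman", 14, 5), ("Ehlvest", 9, 0),
    ("Perelshteyn", 9, 3), ("Charbonneau", 6, 0), ("Krasik", 6, 4),
    ("Lopez", 4, 0), ("Zaremba", 3, 0), ("Becerra", 3, 0),
    ("Matlin", 1, 0), ("Recio", 1, 1), ("Altounian-Burnett", 1, 0),
]
totals = [total for _, total, _ in rows]
greg = [score for _, _, score in rows]

def pearson(xs, ys):
    """Pearson correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_greg = pearson(greg, totals)
print(round(r_greg, 3))  # 0.594
```

As a sanity check on the data, the totals add up to 75 points and Greg's scores add up to 15, which is consistent with five judges handing out 15 points each.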

Let's look at the correlations of all the judges of week 2, from highest (i.e., agreed closest with the group) to lowest.

Week 2 Correlations
Arun: 0.770
Jim: 0.733
Jeff: 0.604
Greg: 0.594
Michael: 0.280

Michael's score is well below those of the other judges, but still positive, which means that he did agree somewhat with the group as a whole.

So, let's go back and see what happened for the Week 1 voting, where the results were apparently less controversial.

Week 1 Correlations
Jeff: 0.882
Michael: 0.799
Arun: 0.683
Jim: 0.683
Greg: 0.667

These are all reasonably close, with Jeff and Michael having somewhat higher correlations than the rest.

So to summarize, we can determine the average correlation for both weeks.

Average Correlation for both weeks
Jeff: 0.743
Arun: 0.727
Jim: 0.708
Greg: 0.631
Michael: 0.540

All are reasonably high, so I think it may be too early to refer to a "whacko" judge.