Monday, January 24, 2011

USCL Game of the Year Judging Analysis

I ran some statistics on the judging in the United States Chess League's 2010 Game of the Year contest.

There were five judges: Hess, Gustafsson, Johannesson, Melekhina, and Young. I will refer to each of them by the first letter of their last name.

Several analyses were completed.

What are the games for which the judges agreed most and disagreed most?

One way to measure this is the standard deviation of the five judges' scores for each game: the smaller the standard deviation, the more the judges agreed.
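As a minimal sketch of that calculation in Python: the function name `game_sd` is just for illustration, and the sketch assumes the sample standard deviation (n - 1 denominator). The five scores used are the ones quoted for Galofre vs. Milat in the No Hi-Lo example further down, and they reproduce the 7.80 listed for that game.

```python
from statistics import stdev

def game_sd(scores):
    """Sample standard deviation of the judges' scores for one game."""
    return stdev(scores)

# The five scores given to Galofre vs. Milat (see the No Hi-Lo example).
galofre_milat = [1, 1, 1, 5, 19]
print(round(game_sd(galofre_milat), 2))  # 7.8
```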

The most agreed upon games were:
  1. #20, Sammour-Hasbun vs. Kaplan (sd = 2.51)
  2. #2, Sammour-Hasbun vs. Kacheishvili (sd = 2.61)
  3. #4, Rosen vs. Guo (sd = 2.97)

The most disagreed upon games were:
  1. #13, Schroer vs. Kacheishvili (sd = 7.99)
  2. #19, Galofre vs. Milat (sd = 7.80)
  3. #12, Friedel vs. Akobian (sd = 7.36)

Which judges were most different?

I calculated which of the judges were "most different" from the combined wisdom of all the judges together. The judges who were the most different could be considered outliers.

There are several ways to do this. I will demonstrate two approaches.

FINDING THE OUTLIER JUDGES

First, I compared the score each judge gave a game to the average of all the judges' scores for that game, scaled by how much the judges disagreed on it. For instance, Judge Y gave 2 points (19th place) to Schroer-Kacheishvili, while the average was 9.2 points and the standard deviation (the amount of disagreement) was 7.99. For that game, Judge Y therefore receives the absolute value of (2 - 9.2)/7.99, or about 0.90, "difference points". Adding up a judge's difference points over all twenty games gives a total; the higher the total, the more that judge differed from the other judges.
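The difference-point calculation can be sketched in Python as follows. This is an illustration rather than the exact script behind the totals: the function names and the list-of-lists layout are hypothetical, the sample standard deviation is assumed, and since the post does not say which judge gave which score to Galofre vs. Milat, the ordering of the example scores is arbitrary.

```python
from statistics import mean, stdev

def difference_points(judge_score, all_scores):
    """|judge's score - mean of all scores| / sample sd, for one game."""
    return abs(judge_score - mean(all_scores)) / stdev(all_scores)

def total_difference(scores_by_game, judge_index):
    """Sum one judge's difference points over every game.

    scores_by_game is a list of per-game score lists, with the judges
    always in the same order (a hypothetical layout for this sketch).
    """
    return sum(
        difference_points(game[judge_index], game)
        for game in scores_by_game
    )

# Scores for one game (Galofre vs. Milat); the judge order is illustrative.
game = [1, 1, 1, 5, 19]
print(round(difference_points(19, game), 2))  # 1.74
print(round(difference_points(5, game), 2))   # 0.05
```

A game on which every judge gave the same score has a standard deviation of zero, so a full implementation would need to skip or special-case such games to avoid dividing by zero.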

The difference-point totals were:
Judge Y: 17.49
Judge J: 11.09
Judge M: 19.40
Judge G: 12.80
Judge H: 16.96

Therefore, Judge Y and Judge M were the most different from the other judges.

We could then discard the scores of these two judges and rescore the contest using only the remaining three.

See below for how the results would have changed.


COMPUTE THE MIDDLE SCORES FOR EACH GAME

Another way of rescoring the contest is to work on a "per game" basis, rather than throwing out judges as a whole: discard the highest and lowest scores given to each game, and total the remaining three.

For example, Galofre-Milat received scores of 1, 1, 1, 5, and 19. Using this method, we would throw out one of the 1s and the 19, and the game would receive a revised score of 7 (1 + 1 + 5).
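A minimal Python sketch of this "No Hi-Lo" rescoring, with a hypothetical function name, using the scores from the example above:

```python
def no_hi_lo_total(scores):
    """Drop one highest and one lowest score, then sum the rest."""
    if len(scores) < 3:
        raise ValueError("need at least three scores")
    # Sorting puts the lowest score first and the highest last;
    # slicing [1:-1] removes exactly one of each, even when tied.
    return sum(sorted(scores)[1:-1])

print(no_hi_lo_total([1, 1, 1, 5, 19]))  # 7
```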

. . .

The table below shows the original place for each game, as well as the place it would have come in under the "Three Judges Only" method or the "No Hi-Lo" method. Ties were not broken for these alternate methods.

GAME                            Original   Three Judges   No Hi-Lo
Sammour-Hasbun vs Kaplan        20         19             19
Galofre vs Milat                19         20             20
Gurevich vs Barcenilla          18         18             18
Akobian vs Friedel              17         T13-14         17
Rosenthal vs Thompson           16         T15-17         15
Krasik vs Balasubramanian       15         T13-14         16
Hungaski vs Schroer             14         T15-17         13
Schroer vs Kacheishvili         13         T15-17         14
Friedel vs Akobian              12         T11-12         12
Shulman vs Felecan              11         T11-12         11
Rensch vs Abrahamyan            10         T4-5           T7-10
Shankland vs Becerra            9          8              T7-10
Stripunsky vs Erenburg          8          10             T7-10
Christiansen vs Kraai           7          T6-7           T7-10
Schroer vs Christiansen         6          T4-5           4
Kacheishvili vs Shankland       5          9              T5-6
Rosen vs Guo                    4          T6-7           T5-6
Shulman vs Khachiyan            3          2              2
Sammour-Hasbun vs Kacheishvili  2          3              3
Akobian vs Shulman              1          1              1


Readers are invited to draw their own conclusions.