Final ranking in test set

| Team            | median 90th | 90th 90th | mean 90th | median | mean   |
|-----------------|-------------|-----------|-----------|--------|--------|
| CM (Gestalt)    | 60.1        | 449.6     | 160.0     | 22.3   | 63.3   |
| annotators sel. | 67.0        | 97.7      | 63.5      | 21.3   | 31.1   |
| annotators all  | 96.7        | 255.7     | 223.5     | 25.6   | 101.5  |
| CG              | 123.3       | 694.4     | 578.9     | 38.0   | 313.4  |
| AGHSSO          | 137.6       | 713.4     | 303.2     | 34.6   | 122.2  |
| MEVIS           | 155.3       | 1019.5    | 604.3     | 40.9   | 294.1  |
| NEMESIS         | 200.5       | 1308.6    | 733.0     | 62.7   | 349.9  |
| MEDAL           | 262.5       | 1607.8    | 1221.6    | 81.9   | 721.2  |
| SKJP            | 1230.0      | 3292.6    | 2438.5    | 628.4  | 1524.3 |
| MFRGNK          | 15938.0     | 22946.9   | 15342.3   | 9224.7 | 9988.1 |

All values are in µm. For a more detailed explanation of the metrics and ranking methods, please also refer to the Evaluation section of this website.
For the prizes, this gives the following overall ranking:

  1. Christian Marzahl (Gestalt)
  2. Chandler Gatenbee
  3. Team AGHSSO

For the prizes restricted to teams with publicly available code, the ranking is:

  1. Chandler Gatenbee
  2. AGHSSO
  3. NEMESIS

The row "annotators sel." indicates metrics for the distances between annotator 1 and annotator 2 in the test set, excluding landmarks with a distance larger than 115µm. The row "annotators all" provides the corresponding values without such an exclusion. 
The column "median90th" is the median of the 90th percentiles, the metric that provides the final ranking in this challenge. The column "90th90th" indicates the 90th percentile of 90th percentiles in the test set, the column "mean 90th" the mean of the 90th percentiles. "median" and "mean" are the respective median and mean across all included landmarks in the test set (with annotator distance <115µm), without aggregating within image pairs first. 
The figure below shows the distributions of the 90th percentiles in the validation and test set. Test set performance is likely higher for most teams because landmark pairs of poor quality (annotator disagreement > 115 µm) could be excluded there, which was not possible for the validation data.