Final ranking in test set

| team | median90th | 90th90th | mean90th | median | mean |
|---|---|---|---|---|---|
| CM (Gestalt) | 60.1 | 449.6 | 160.0 | 22.3 | 63.3 |
| annotators sel. | 67.0 | 97.7 | 63.5 | 21.3 | 31.1 |
| annotators all | 96.7 | 255.7 | 223.5 | 25.6 | 101.5 |
| CG | 123.3 | 694.4 | 578.9 | 38.0 | 313.4 |
| AGHSSO | 137.6 | 713.4 | 303.2 | 34.6 | 122.2 |
| MEVIS | 155.3 | 1019.5 | 604.3 | 40.9 | 294.1 |
| NEMESIS | 200.5 | 1308.6 | 733.0 | 62.7 | 349.9 |
| MEDAL | 262.5 | 1607.8 | 1221.6 | 81.9 | 721.2 |
| SKJP | 1230.0 | 3292.6 | 2438.5 | 628.4 | 1524.3 |
| MFRGNK | 15938.0 | 22946.9 | 15342.3 | 9224.7 | 9988.1 |

All values are in µm. For a more detailed explanation of the metrics and ranking methods, please refer to the Evaluation section of this website.

For the prizes, this gives the following overall ranking:

1. Christian Marzahl (Gestalt)
2. Chandler Gatenbee
3. Team AGHSSO

For the prizes restricted to submissions with publicly available code, this gives the following ranking:

1. Chandler Gatenbee
2. AGHSSO
3. NEMESIS

The row "annotators sel." indicates metrics for the distances between annotator 1 and annotator 2 in the test set, excluding landmarks with a distance larger than 115µm. The row "annotators all" provides the corresponding values without such an exclusion. 

The column "median90th" is the median of the 90th percentiles, the metric that provides the final ranking in this challenge. The column "90th90th" indicates the 90th percentile of 90th percentiles in the test set, the column "mean 90th" the mean of the 90th percentiles. "median" and "mean" are the respective median and mean across all included landmarks in the test set (with annotator distance <115µm), without aggregating within image pairs first. 

The figure below shows the distributions of the 90th percentiles in the validation and test sets. Test set performance is likely higher for most teams because we could exclude landmark pairs of poor quality (annotator disagreement > 115µm), which was not possible in the validation data.