NEW Unsupervised evaluation using Adjusted Mutual Information (AMI)

Task 14 Word Sense Induction & Disambiguation
AMI
System AMI #Clusters
1ClusterPerInstance 0.186 89.15
Hermit 0.053 10.78
Duluth-WSI-Co 0.042 2.49
UoY 0.042 11.54
Duluth-WSI-SVD 0.041 4.15
Duluth-WSI 0.041 4.15
KCDC-PCGD 0.037 2.9
Duluth-Mix-Narrow-Gap 0.036 2.42
KCDC-PC 0.033 2.92
Duluth-Mix-Narrow-PK2 0.032 2.68
KCDC-PC-2 0.032 2.93
Duluth-WSI-Co-Gap 0.027 1.6
KCDC-GD 0.027 2.78
KCDC-GD-2 0.027 2.82
KCDC-GDC 0.025 2.83
Duluth-MIX-PK2 0.024 2.66
Duluth-WSI-Gap 0.018 1.4
KSU KDD 0.015 17.5
Duluth-Mix-Gap 0.013 1.61
Duluth-Mix-Uni-PK2 0.009 2.04
KCDC-PT 0.006 1.5
Duluth-Mix-Uni-Gap 0.005 1.39
Duluth-R-13 0.001 3
Duluth-R-12 0.001 2
Duluth-WSI-SVD-Gap 0.000 1.02
MFS 0.000 1
Duluth-R-110 -0.001 9.71
Duluth-R-15 -0.001 4.97



NEW Unsupervised evaluation using BCubed

Task 14 Word Sense Induction & Disambiguation
BCubed on nouns
System BCubed (Nouns) #Clusters
MFS 64.1 1
Duluth-WSI-SVD-Gap 64 1.02
KCDC-PT 63.1 1.5
KCDC-GD 61.2 2.78
KCDC-GD-2 60.5 2.82
Duluth-Mix-Gap 60.5 1.61
Duluth-Mix-Uni-Gap 59.7 1.39
KCDC-GDC 59.4 2.83
Duluth-Mix-Uni-PK2 57.9 2.04
KCDC-PC 57.6 2.92
KCDC-PC-2 57 2.93
UoY 55.7 11.54
KCDC-PCGD 55.5 2.9
Duluth-WSI-Gap 55 1.4
Duluth-WSI-Co-Gap 54.2 1.6
Duluth-MIX-PK2 52.5 2.66
Duluth-Mix-Narrow-Gap 51.9 2.42
Duluth-WSI-Co 51.8 2.49
Duluth-Mix-Narrow-PK2 50.3 2.68
Duluth-R-12 49.4 2
Duluth-WSI 44.4 4.15
Duluth-WSI-SVD 44.4 4.15
KSU KDD 43.3 17.5
Duluth-R-13 40.9 3
Random 35.18 4
Hermit 32.1 10.78
Duluth-R-15 31.2 4.97
Duluth-R-110 21.4 9.71
1ClusterPerInstance 8 89.15



Task 14 Word Sense Induction & Disambiguation
BCubed on verbs
System BCubed (Verbs) #Clusters
MFS 63.4 1
Duluth-WSI-SVD-Gap 63.3 1.02
KCDC-PT 61.8 1.5
KCDC-GD 59.2 2.78
Duluth-Mix-Gap 59.1 1.61
Duluth-Mix-Uni-Gap 58.7 1.39
KCDC-GD-2 58.2 2.82
KCDC-GDC 57.3 2.83
Duluth-Mix-Uni-PK2 56.6 2.04
KCDC-PC 55.5 2.92
KCDC-PC-2 54.7 2.93
Duluth-WSI-Gap 53.7 1.4
KCDC-PCGD 53.3 2.9
Duluth-WSI-Co-Gap 52.6 1.6
Duluth-MIX-PK2 50.4 2.66
UoY 49.8 11.54
Duluth-Mix-Narrow-Gap 49.7 2.42
Duluth-WSI-Co 49.5 2.49
Duluth-Mix-Narrow-PK2 47.8 2.68
Duluth-R-12 47.8 2
Duluth-WSI-SVD 41.1 4.15
Duluth-WSI 41.1 4.15
Duluth-R-13 38.4 3
KSU KDD 36.9 17.5
Random 31.9 4
Duluth-R-15 27.6 4.97
Hermit 26.7 10.78
Duluth-R-110 16.1 9.71
1ClusterPerInstance 0.09 89.15



1. Unsupervised evaluation

1.1 V-Measure

The first measure used in the unsupervised evaluation is V-Measure:

Andrew Rosenberg and Julia Hirschberg. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, June 2007. ACL.

Suresh Manandhar and Ioannis P. Klapaftis. SemEval-2010 Task 14: Evaluation Setting for Word Sense Induction & Disambiguation Systems. In NAACL-HLT 2009 Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Boulder, Colorado, USA, 2009.
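
For reference, V-Measure combines two entropy-based criteria, homogeneity (h) and completeness (c), computed from the gold standard classes C and the induced clusters K; the reported score is their harmonic mean. The formulas below summarise the definition of Rosenberg and Hirschberg (2007), with h (respectively c) defined as 1 when H(C) (respectively H(K)) is zero:

    h = 1 - \frac{H(C \mid K)}{H(C)}, \qquad
    c = 1 - \frac{H(K \mid C)}{H(K)}, \qquad
    \mathrm{VM} = \frac{2 h c}{h + c}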

The following table shows the results of the evaluation. The average number of gold standard (GS) senses is 4.46 for nouns and 3.12 for verbs. There are three baselines: (1) Most Frequent Sense (MFS), which groups all instances of a target word into one cluster, (2) 1ClusterPerInstance, which produces one cluster for each instance of a target word, and (3) Random, which randomly assigns an instance to one out of four clusters. The number of clusters of the Random baseline was chosen to be roughly equal to the average number of GS senses over all words. The Random baseline is executed five times and the results are averaged. Note that the 1ClusterPerInstance baseline is not included in the supervised evaluation, since the mapping process is unable to associate induced clusters with GS senses (clusters appearing in the mapping corpus do not appear in the evaluation corpus).
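
For illustration, the Random baseline can be generated along the following lines. This is a minimal sketch, not the official baseline generator; the key-file format "lemma instance-id label" and the function name are assumptions:

    # Sketch of the Random baseline: each instance of a target word is assigned
    # to one of four clusters uniformly at random. The baseline is generated
    # five times (different seeds) and the resulting scores are averaged.
    import random

    def random_baseline_key(instances, n_clusters=4, seed=0):
        # instances: iterable of (lemma, instance_id); returns key-file lines.
        rng = random.Random(seed)
        return [f"{lemma} {iid} {lemma}.cluster.{rng.randint(1, n_clusters)}"
                for lemma, iid in instances]

    # Five independent runs; each key is scored and the five scores are averaged.
    keys = [random_baseline_key([("class.n", "class.n.1")], seed=s) for s in range(5)]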

Task 14 Word Sense Induction & Disambiguation
V-Measure (VM)
System VM (All) VM (Verbs) VM (Nouns) #Clusters
1ClusterPerInstance 31.7 25.6 35.8 89.15
Hermit 16.2 15.6 16.7 10.78
UoY 15.7 8.5 20.6 11.54
KSU KDD 15.7 12.4 18.0 17.5
Duluth-WSI 9.0 5.7 11.4 4.15
Duluth-WSI-SVD 9.0 5.7 11.4 4.15
Duluth-R-110 8.6 8.5 8.6 9.71
Duluth-WSI-Co 7.9 6.0 9.2 2.49
KCDC-PCGD 7.8 8.4 7.3 2.9
KCDC-PC 7.5 7.3 7.7 2.92
KCDC-PC-2 7.1 6.1 7.7 2.93
KCDC-GDC 7.0 7.8 6.2 2.83
Duluth-Mix-Narrow-Gap 6.9 5.1 8.0 2.42
KCDC-GD-2 6.9 8.0 6.1 2.82
KCDC-GD 6.9 8.5 5.9 2.78
Duluth-Mix-Narrow-PK2 6.8 5.5 7.8 2.68
Duluth-MIX-PK2 5.6 5.2 5.8 2.66
Duluth-R-15 5.3 5.1 5.4 4.97
Duluth-WSI-Co-Gap 4.8 3.6 5.6 1.6
Random 4.4 4.6 4.2 4
Duluth-R-13 3.6 3.7 3.5 3
Duluth-WSI-Gap 3.1 1.5 4.2 1.4
Duluth-Mix-Gap 3.0 3.0 2.9 1.61
Duluth-Mix-Uni-PK2 2.4 4.7 0.8 2.04
Duluth-R-12 2.3 2.5 2.2 2
KCDC-PT 1.9 3.1 1.0 1.5
Duluth-Mix-Uni-Gap 1.4 3.0 0.2 1.39
MFS 0.0 0.0 0.0 1.0
Duluth-WSI-SVD-Gap 0.0 0.1 0.0 1.02



As can be observed, V-Measure appears to be biased towards systems that generate more clusters than the number of gold standard senses. For that reason, we provide an additional evaluation, described below.

1.2 Paired F-Score

In this evaluation, the clustering problem is transformed into a classification problem, in which the goal is to decide whether two instances of a target word that belong to the same cluster also belong to the same gold standard (GS) class. This evaluation scheme is described in:

Javier Artiles, Enrique Amigo and Julio Gonzalo. The Role of Named Entities in Web People Search. In EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 534-542, Singapore, 2009. ACL.

Consider the following example, where we have 3 gold standard senses for 7 target word instances, while a clustering solution has generated 2 clusters (see the figure below).

[Figure: Cluster 1 = {A, B, D, E, G} and Cluster 2 = {C, F}; GS class 1 = {A, B, D}, GS class 2 = {C, F}, GS class 3 = {E, G}]

For Cluster 1, we can generate 10 instance pairs, i.e. AB, AD, AE, AG, BD, BE, BG, DE, DG, EG. For Cluster 2, we can generate 1 instance pair CF. In total, the clustering solution has generated 11 instance pairs.

We can measure precision, recall and their harmonic mean (F-Score) as follows. Precision is the number of generated instance pairs that also appear in some GS class, divided by the total number of generated instance pairs. Recall is the number of generated instance pairs that also appear in some GS class, divided by the total number of instance pairs contained in the GS classes.
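
In symbols, writing P(K) for the set of instance pairs generated by the clustering and P(C) for the set of instance pairs found in the GS classes (notation introduced here for convenience):

    \mathrm{Precision} = \frac{|P(K) \cap P(C)|}{|P(K)|}, \qquad
    \mathrm{Recall} = \frac{|P(K) \cap P(C)|}{|P(C)|}, \qquad
    \mathrm{FS} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}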

In the example, 5 of the generated instance pairs (AB, AD, BD, EG and CF) also exist in the GS classes. The GS classes contain a total of 5 instance pairs, i.e. AB, AD and BD from GS class 1, CF from GS class 2 and EG from GS class 3. Thus precision is 5/11 and recall is 5/5, giving a paired F-Score of 2 · (5/11) · 1 / (5/11 + 1) = 10/16 = 62.5%.
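
The same computation can be reproduced with a few lines of Python (illustrative sketch only; the sets below encode the example above, not task data):

    # Paired F-Score on the worked example:
    # Cluster 1 = {A, B, D, E, G}, Cluster 2 = {C, F};
    # GS class 1 = {A, B, D}, GS class 2 = {C, F}, GS class 3 = {E, G}.
    from itertools import combinations

    def pairs(groups):
        # All unordered instance pairs that share a group.
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

    clusters = [{"A", "B", "D", "E", "G"}, {"C", "F"}]
    gs_classes = [{"A", "B", "D"}, {"C", "F"}, {"E", "G"}]

    cluster_pairs, gs_pairs = pairs(clusters), pairs(gs_classes)
    common = cluster_pairs & gs_pairs             # AB, AD, BD, EG, CF

    precision = len(common) / len(cluster_pairs)  # 5 / 11
    recall = len(common) / len(gs_pairs)          # 5 / 5
    f_score = 2 * precision * recall / (precision + recall)
    print(precision, recall, f_score)             # 0.4545..., 1.0, 0.625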



Task 14 Word Sense Induction & Disambiguation
Paired F-Score (FS)
System FS (All) FS (Verbs) FS (Nouns) #Clusters
MFS 63.4 72.7 57.0 1
Duluth-WSI-SVD-Gap 63.3 72.4 57.0 1.02
KCDC-PT 61.8 69.7 56.4 1.5
KCDC-GD 59.2 70.0 51.6 2.78
Duluth-Mix-Gap 59.1 65.8 54.5 1.61
Duluth-Mix-Uni-Gap 58.7 61.2 57.0 1.39
KCDC-GD-2 58.2 69.3 50.4 2.82
KCDC-GDC 57.3 70.0 48.5 2.83
Duluth-Mix-Uni-PK2 56.6 55.9 57.1 2.04
KCDC-PC 55.5 62.9 50.4 2.92
KCDC-PC-2 54.7 61.7 49.7 2.93
Duluth-WSI-Gap 53.7 53.9 53.4 1.4
KCDC-PCGD 53.3 65.6 44.8 2.9
Duluth-WSI-Co-Gap 52.6 51.5 53.3 1.6
Duluth-MIX-PK2 50.4 48.3 51.7 2.66
UoY 49.8 66.6 38.2 11.54
Duluth-Mix-Narrow-Gap 49.7 51.3 47.4 2.42
Duluth-WSI-Co 49.5 48.2 50.2 2.49
Duluth-Mix-Narrow-PK2 47.8 48.2 37.1 2.68
Duluth-R-12 47.8 52.6 44.3 2
Duluth-WSI-SVD 41.1 46.7 37.1 4.15
Duluth-WSI 41.1 46.7 37.1 4.15
Duluth-R-13 38.4 41.5 36.2 3
KSU KDD 36.9 54.7 24.6 17.5
Random 31.9 34.1 30.4 4
Duluth-R-15 27.6 28.9 26.7 4.97
Hermit 26.7 30.1 24.4 10.78
Duluth-R-110 16.1 16.4 15.8 9.71
1ClusterPerInstance 0.09 0.08 0.11 89.15

2. Supervised Evaluation

The supervised evaluation follows that of the previous SemEval-2007 WSI task:

Eneko Agirre and Aitor Soroa. SemEval-2007 Task 02: Evaluating Word Sense Induction and Discrimination Systems. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pp. 7-12, Prague, Czech Republic, June 2007. ACL.

In this evaluation, the test set is split into a mapping corpus and an evaluation corpus. The mapping corpus is used to map the induced senses to gold standard senses, and the evaluation corpus is used to assess performance in a WSD setting. To avoid the problem observed in the previous evaluation, where different splits gave different rankings, we applied 5 random splits and averaged the results of the systems. In these splits, 80% of the test corpus was used for mapping and 20% for evaluation. As in the unsupervised evaluation, there are two baselines, i.e. MFS and Random.
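
A minimal sketch of one such split is given below. The official scorer follows the cluster-to-sense mapping of Agirre and Soroa (2007); here, purely for illustration, each induced cluster is mapped to the GS sense it most frequently co-occurs with in the mapping part, and recall is computed on the held-out part. Function names and the data layout are assumptions, not the task's actual implementation.

    # Simplified supervised evaluation for a single 80/20 split (illustration only).
    # instances: list of (induced_cluster, gold_sense) pairs, one per test instance.
    from collections import Counter, defaultdict
    import random

    def evaluate_split(instances, mapping_fraction=0.8, seed=0):
        shuffled = list(instances)
        random.Random(seed).shuffle(shuffled)
        cut = int(mapping_fraction * len(shuffled))
        mapping_part, eval_part = shuffled[:cut], shuffled[cut:]

        # Map each induced cluster to the GS sense it co-occurs with most often.
        counts = defaultdict(Counter)
        for cluster, sense in mapping_part:
            counts[cluster][sense] += 1
        mapping = {c: senses.most_common(1)[0][0] for c, senses in counts.items()}

        # WSD-style recall on the evaluation part; unmapped clusters score zero.
        correct = sum(1 for cluster, sense in eval_part if mapping.get(cluster) == sense)
        return correct / len(eval_part)

    # As in the task setting, average the recall over five random splits:
    # recall = sum(evaluate_split(data, seed=s) for s in range(5)) / 5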

Task 14 Word Sense Induction & Disambiguation
Supervised Recall (SR), test set split: 80% mapping, 20% evaluation
System SR (All) SR (Nouns) SR (Verbs) Variance (All)
UoY 62.44 59.43 66.82 1.23
Duluth-WSI 60.46 54.66 68.92 0.77
Duluth-WSI-SVD 60.46 54.66 68.92 0.77
Duluth-WSI-Co-Gap 60.34 54.09 68.65 1.12
Duluth-WSI-Co 60.27 54.68 67.6 1.07
Duluth-WSI-Gap 59.81 54.36 67.76 0.85
KCDC-PC-2 59.76 54.09 68.04 0.89
KCDC-PC 59.73 54.55 67.29 0.56
KCDC-PCGD 59.53 53.33 68.56 1.62
KCDC-GDC 59.08 53.39 67.38 0.33
KCDC-GD 59.03 52.97 67.87 0.62
KCDC-PT 58.88 53.07 67.35 1.1
KCDC-GD-2 58.72 52.78 67.38 0.61
Duluth-WSI-SVD-Gap 58.69 53.22 66.66 0.76
MFS 58.67 53.22 66.629 0.75
Duluth-R-12 58.46 53.05 66.44 0.96
Hermit 58.34 53.56 65.30 2.33
Duluth-R-13 58.01 52.27 66.38 1.53
Random 57.25 51.45 65.69 0.07
Duluth-R-15 56.76 50.91 65.3 2.61
Duluth-Mix-Narrow-Gap 56.63 48.11 69.06 0.61
Duluth-Mix-Narrow-PK2 56.15 47.54 68.7 0.5
Duluth-R-110 54.75 48.28 64.2 0.7
KSU KDD 52.18 46.63 60.28 0.93
Duluth-MIX-PK2 51.62 41.1 66.96 0.57
Duluth-Mix-Gap 50.61 40.04 66.02 0.27
Duluth-Mix-Uni-PK2 19.29 1.82 44.78 0.18
Duluth-Mix-Uni-Gap 18.72 1.55 43.76 0.17


We also include another supervised evaluation, in which the split of the testing corpus is 60-40, i.e. 60% for the mapping corpus and 40% for the evaluation corpus.

Task 14 Word Sense Induction & Disambiguation
Supervised Recall (SR), test set split: 60% mapping, 40% evaluation
System SR (All) SR (Nouns) SR (Verbs) Variance (All)
UoY 61.96 58.62 66.82 0.47
Duluth-WSI-Co 60.07 54.59 68.05 0.14
Duluth-WSI-Co-Gap 59.51 53.45 68.33 0.38
Duluth-WSI-SVD 59.48 53.45 68.27 0.38
Duluth-WSI 59.48 53.45 68.27 0.38
Duluth-WSI-Gap 59.32 53.19 68.23 0.52
KCDC-PCGD 59.10 52.60 68.56 0.24
KCDC-PC-2 58.90 53.35 66.99 0.67
KCDC-PC 58.89 53.58 66.64 0.52
KCDC-GDC 58.29 52.14 67.26 1.09
KCDC-GD 58.27 51.88 67.59 0.71
MFS 58.25 52.45 66.70 0.3
KCDC-PT 58.25 52.18 67.11 0.31
Duluth-WSI-SVD-Gap 58.24 52.45 66.66 0.3
KCDC-GD-2 57.90 51.67 66.99 1.04
Duluth-R-12 57.72 51.74 66.42 0.43
Duluth-R-13 57.59 51.13 67.00 0.14
Hermit 57.27 52.53 64.16 0.29
Duluth-R-15 56.53 49.95 66.11 0.42
Random 56.52 50.21 65.73 0.17
Duluth-Mix-Narrow-Gap 56.19 47.67 68.59 0.08
Duluth-Mix-Narrow-PK2 55.65 46.86 68.45 0.04
Duluth-R-110 53.60 46.70 63.63 0.44
Duluth-MIX-PK2 50.46 39.70 66.13 0.07
KSU KDD 50.42 44.25 59.41 0.34
Duluth-Mix-Gap 49.77 38.86 65.64 0.11
Duluth-Mix-Uni-PK2 19.12 1.77 44.40 0.14
Duluth-Mix-Uni-Gap 18.91 1.52 44.23 0.09


Task 14 Word Sense Induction & Disambiguation
Number of clusters (averaged over target words) in each test set split (80-20). Systems sorted in alphabetical order.
System Mapping 1 Evaluation 1 Mapping 2 Evaluation 2 Mapping 3 Evaluation 3 Mapping 4 Evaluation 4 Mapping 5 Evaluation 5 Average in Mapping Average in Evaluation
Duluth-Mix-Gap 1.61 1.43 1.6 1.48 1.59 1.47 1.58 1.48 1.58 1.46 1.59 1.46
Duluth-Mix-Narrow-Gap 2.39 2.12 2.41 2.11 2.42 2.15 2.42 2.16 2.41 2.11 2.41 2.13
Duluth-Mix-Narrow-PK2 2.65 2.25 2.64 2.26 2.63 2.29 2.67 2.23 2.66 2.21 2.65 2.25
Duluth-MIX-PK2 2.63 2.19 2.62 2.23 2.61 2.28 2.62 2.18 2.61 2.16 2.62 2.21
Duluth-Mix-Uni-Gap 1.39 1.31 1.39 1.28 1.39 1.32 1.39 1.32 1.39 1.28 1.39 1.3
Duluth-Mix-Uni-PK2 2.02 1.61 2.02 1.6 2 1.71 2 1.68 1.99 1.63 2.01 1.65
Duluth-R-110 9.5 6.89 9.48 6.99 9.47 6.78 9.59 6.75 9.6 6.85 9.53 6.85
Duluth-R-12 2 1.98 2 1.98 2 1.95 2 1.96 2 1.97 2 1.97
Duluth-R-13 3 2.89 3 2.86 3 2.8 3 2.84 3 2.8 3 2.84
Duluth-R-15 4.93 4.3 4.95 4.26 4.92 4.33 4.95 4.17 4.97 4.2 4.94 4.25
Duluth-WSI 4.07 3.35 4.07 3.32 4.07 3.38 4.07 3.38 4.09 3.34 4.07 3.35
Duluth-WSI-Co 2.48 2.3 2.48 2.29 2.49 2.27 2.48 2.29 2.49 2.27 2.48 2.28
Duluth-WSI-Co-Gap 1.59 1.59 1.59 1.59 1.6 1.59 1.6 1.58 1.6 1.56 1.6 1.58
Duluth-WSI-Gap 1.39 1.39 1.39 1.39 1.4 1.38 1.4 1.38 1.4 1.38 1.4 1.38
Duluth-WSI-SVD 4.07 3.35 4.07 3.32 4.07 3.38 4.07 3.38 4.09 3.34 4.07 3.35
Duluth-WSI-SVD-Gap 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02
Hermit 10.37 6.29 10.32 6.48 10.23 6.56 10.37 6.45 10.35 6.41 10.33 6.44
KCDC-GD 2.69 1.99 2.72 2.04 2.7 2.01 2.66 2 2.65 2 2.68 2.01
KCDC-GD-2 2.69 2.05 2.77 1.98 2.74 1.99 2.71 2.04 2.68 1.98 2.72 2.01
KCDC-GDC 2.71 2.01 2.77 1.97 2.78 1.96 2.71 2.02 2.7 1.93 2.73 1.98
KCDC-PC 2.8 2.21 2.79 2.15 2.8 2.19 2.79 2.21 2.81 2.11 2.8 2.17
KCDC-PC-2 2.81 2.2 2.84 2.21 2.81 2.33 2.84 2.18 2.78 2.22 2.82 2.23
KCDC-PCGD 2.87 2.29 2.83 2.35 2.88 2.28 2.8 2.32 2.83 2.21 2.84 2.29
KCDC-PT 1.47 1.22 1.45 1.24 1.45 1.26 1.44 1.27 1.45 1.21 1.45 1.24
KSU KDD 15.44 6.51 15.34 6.58 15.4 6.51 15.28 6.85 15.38 6.68 15.37 6.63
UoY 10.28 4.34 10.13 4.63 10.15 4.45 10.12 4.6 10.28 4.66 10.19 4.54


Task 14 Word Sense Induction & Disambiguation
Number of clusters (averaged over target words) in each test set split (60-40). Systems sorted in alphabetical order.
System Mapping 1 Evaluation 1 Mapping 2 Evaluation 2 Mapping 3 Evaluation 3 Mapping 4 Evaluation 4 Mapping 5 Evaluation 5 Average in Mapping Average in Evaluation
Duluth-Mix-Gap 1.58 1.55 1.6 1.53 1.58 1.53 1.55 1.58 1.59 1.52 1.58 1.54
Duluth-Mix-Narrow-Gap 2.38 2.3 2.39 2.34 2.37 2.29 2.36 2.33 2.39 2.28 2.38 2.31
Duluth-Mix-Narrow-PK2 2.62 2.45 2.59 2.52 2.6 2.4 2.58 2.48 2.59 2.46 2.6 2.46
Duluth-MIX-PK2 2.52 2.46 2.6 2.41 2.55 2.39 2.53 2.45 2.55 2.43 2.55 2.43
Duluth-Mix-Uni-Gap 1.38 1.34 1.37 1.34 1.36 1.34 1.35 1.36 1.38 1.34 1.37 1.34
Duluth-Mix-Uni-PK2 1.93 1.87 1.94 1.85 1.93 1.84 1.91 1.88 1.96 1.82 1.93 1.85
Duluth-R-110 9.21 8.58 9.18 8.53 9.15 8.43 9.18 8.57 9.2 8.53 9.18 8.53
Duluth-R-12 2 1.99 2 2 2 2 2 2 2 2 2 2
Duluth-R-13 3 2.98 3 2.97 2.99 2.99 2.98 2.97 3 3 2.99 2.98
Duluth-R-15 4.86 4.75 4.86 4.73 4.9 4.8 4.91 4.78 4.89 4.75 4.88 4.76
Duluth-WSI 3.94 3.71 3.94 3.78 3.95 3.82 4 3.74 3.94 3.71 3.95 3.75
Duluth-WSI-Co 2.46 2.46 2.47 2.46 2.48 2.41 2.47 2.47 2.47 2.43 2.47 2.45
Duluth-WSI-Co-Gap 1.6 1.59 1.59 1.6 1.6 1.59 1.59 1.6 1.59 1.6 1.59 1.6
Duluth-WSI-Gap 1.4 1.38 1.39 1.39 1.4 1.38 1.39 1.39 1.39 1.39 1.39 1.39
Duluth-WSI-SVD 3.94 3.71 3.94 3.78 3.95 3.82 4 3.74 3.94 3.71 3.95 3.75
Duluth-WSI-SVD-Gap 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02
Hermit 9.63 8.5 9.66 8.53 9.66 8.68 9.59 8.62 9.5 8.62 9.61 8.59
KCDC-GD 2.53 2.37 2.49 2.44 2.53 2.29 2.57 2.3 2.55 2.33 2.53 2.35
KCDC-GD-2 2.54 2.39 2.53 2.46 2.56 2.39 2.6 2.33 2.58 2.37 2.56 2.39
KCDC-GDC 2.52 2.36 2.5 2.48 2.58 2.41 2.61 2.35 2.55 2.37 2.55 2.39
KCDC-PC 2.62 2.57 2.66 2.44 2.66 2.47 2.69 2.46 2.69 2.48 2.66 2.48
KCDC-PC-2 2.72 2.58 2.76 2.55 2.72 2.53 2.64 2.59 2.78 2.51 2.72 2.55
KCDC-PCGD 2.73 2.6 2.77 2.67 2.75 2.63 2.7 2.59 2.77 2.57 2.74 2.61
KCDC-PT 1.38 1.37 1.42 1.32 1.39 1.34 1.39 1.33 1.45 1.27 1.41 1.33
KSU KDD 13.07 10.12 13.08 10.21 13.17 10.1 12.93 10.21 12.75 10.49 13 10.23
UoY 8.7 6.75 8.86 6.65 8.5 6.8 8.79 6.67 8.51 6.93 8.67 6.76


Task 14 Word Sense Induction & Disambiguation
Number of identified GS senses (averaged over target words) in the supervised key file of each test set split (80-20). Systems sorted in alphabetical order.
System Split 1 Split 2 Split 3 Split 4 Split 5 Average
Duluth-Mix-Gap 1 1.01 1.01 1.03 1 1.01
Duluth-Mix-Narrow-Gap 1.4 1.45 1.39 1.44 1.46 1.43
Duluth-Mix-Narrow-PK2 1.42 1.42 1.38 1.44 1.41 1.41
Duluth-MIX-PK2 1.22 1.26 1.25 1.24 1.19 1.23
Duluth-Mix-Uni-Gap 0.57 0.56 0.54 0.59 0.56 0.56
Duluth-Mix-Uni-PK2 0.61 0.6 0.62 0.66 0.61 0.62
Duluth-R-110 2.04 1.91 1.92 1.89 1.95 1.94
Duluth-R-12 1.25 1.28 1.22 1.25 1.26 1.25
Duluth-R-13 1.41 1.49 1.48 1.46 1.44 1.46
Duluth-R-15 1.6 1.59 1.65 1.59 1.63 1.61
Duluth-WSI 1.68 1.66 1.65 1.69 1.62 1.66
Duluth-WSI-Co 1.51 1.56 1.47 1.52 1.47 1.51
Duluth-WSI-Co-Gap 1.18 1.2 1.2 1.19 1.17 1.19
Duluth-WSI-Gap 1.11 1.12 1.11 1.11 1.1 1.11
Duluth-WSI-SVD 1.68 1.66 1.65 1.69 1.62 1.66
Duluth-WSI-SVD-Gap 1.02 1.01 1.01 1.01 1.01 1.01
Hermit 2.05 2.11 1.98 2.11 2.06 2.06
KCDC-GD 1.32 1.34 1.32 1.32 1.34 1.33
KCDC-GD-2 1.33 1.33 1.29 1.38 1.34 1.33
KCDC-GDC 1.36 1.35 1.31 1.36 1.31 1.34
KCDC-PC 1.37 1.4 1.39 1.41 1.37 1.39
KCDC-PC-2 1.22 1.2 1.2 1.21 1.21 1.21
KCDC-PCGD 1.49 1.54 1.42 1.43 1.47 1.47
KCDC-PT 1.08 1.09 1.11 1.07 1.07 1.08
KSU KDD 1.67 1.73 1.72 1.68 1.65 1.69
MFS 1 1 1 1 1 1
Random 1.53 1.53 1.51 1.53 1.55 1.53
UoY 1.38 1.5 1.61 1.56 1.49 1.51


Task 14 Word Sense Induction & Disambiguation
Number of identified GS senses (averaged over target words) in the supervised key file of each test set split (60-40). Systems sorted in alphabetical order.
System Split 1 Split 2 Split 3 Split 4 Split 5 Average
Duluth-Mix-Gap 1.04 1.02 0.99 1.11 1.03 1.04
Duluth-Mix-Narrow-Gap 1.46 1.51 1.46 1.59 1.51 1.51
Duluth-Mix-Narrow-PK2 1.48 1.53 1.45 1.61 1.46 1.51
Duluth-MIX-PK2 1.27 1.31 1.28 1.41 1.3 1.31
Duluth-Mix-Uni-Gap 0.58 0.54 0.53 0.58 0.59 0.56
Duluth-Mix-Uni-PK2 0.64 0.61 0.63 0.63 0.66 0.63
Duluth-R-110 2.18 2.18 2.17 2.19 2.2 2.18
Duluth-R-12 1.27 1.28 1.22 1.27 1.3 1.27
Duluth-R-13 1.52 1.47 1.42 1.52 1.47 1.48
Duluth-R-15 1.79 1.73 1.81 1.74 1.71 1.76
Duluth-WSI 1.72 1.73 1.7 1.76 1.74 1.73
Duluth-WSI-Co 1.58 1.52 1.53 1.65 1.54 1.56
Duluth-WSI-Co-Gap 1.19 1.22 1.19 1.21 1.21 1.2
Duluth-WSI-Gap 1.12 1.11 1.11 1.12 1.11 1.11
Duluth-WSI-SVD 1.72 1.73 1.7 1.76 1.74 1.73
Duluth-WSI-SVD-Gap 1 1.02 1.01 1.01 1.01 1.01
Hermit 2.22 2.33 2.27 2.25 2.3 2.27
KCDC-GD 1.46 1.43 1.34 1.44 1.44 1.42
KCDC-GD-2 1.46 1.41 1.42 1.45 1.47 1.44
KCDC-GDC 1.44 1.37 1.4 1.44 1.4 1.41
KCDC-PC 1.5 1.29 1.45 1.51 1.46 1.44
KCDC-PC-2 1.25 1.25 1.28 1.28 1.2 1.25
KCDC-PCGD 1.55 1.57 1.5 1.56 1.53 1.54
KCDC-PT 1.1 1.14 1.12 1.1 1.11 1.11
KSU KDD 1.89 1.88 2.01 1.88 1.95 1.92
MFS 1 1 1 1 1 1
Random 1.67 1.64 1.65 1.66 1.6 1.65
UoY 1.77 1.6 1.63 1.69 1.63 1.66