NEW Unsupervised evaluation using Adjusted Mutual Information (AMI)
Task 14 Word Sense Induction & Disambiguation
|
AMI
|
System | AMI | Clusters Number |
1ClusterPerInstance | 0.186 | 89.15 |
Hermit | 0.053 | 10.78 |
Duluth-WSI-Co | 0.042 | 2.49 |
UoY | 0.042 | 11.54 |
Duluth-WSI-SVD | 0.041 | 4.15 |
Duluth-WSI | 0.041 | 4.15 |
KCDC-PCGD | 0.037 | 2.9 |
Duluth-Mix-Narrow-Gap | 0.036 | 2.42 |
KCDC-PC | 0.033 | 2.92 |
Duluth-Mix-Narrow-PK2 | 0.032 | 2.68 |
KCDC-PC-2 | 0.032 | 2.93 |
Duluth-WSI-Co-Gap | 0.027 | 1.6 |
KCDC-GD | 0.027 | 2.78 |
KCDC-GD-2 | 0.027 | 2.82 |
KCDC-GDC | 0.025 | 2.83 |
Duluth-MIX-PK2 | 0.024 | 2.66 |
Duluth-WSI-Gap | 0.018 | 1.4 |
KSU KDD | 0.015 | 17.5 |
Duluth-Mix-Gap | 0.013 | 1.61 |
Duluth-Mix-Uni-PK2 | 0.009 | 2.04 |
KCDC-PT | 0.006 | 1.5 |
Duluth-Mix-Uni-Gap | 0.005 | 1.39 |
Duluth-R-13 | 0.001 | 3 |
Duluth-R-12 | 0.001 | 2 |
Duluth-WSI-SVD-Gap | 0.000 | 1.02 |
MFS | 0.000 | 1 |
Duluth-R-110 | -0.001 | 9.71 |
Duluth-R-15 | -0.001 | 4.97 |
|
NEW Unsupervised evaluation using BCubed
BCubed on nouns
|
System | BCubed (Nouns) | Clusters Number |
MFS | 64.1 | 1 |
Duluth-WSI-SVD-Gap | 64 | 1.02 |
KCDC-PT | 63.1 | 1.5 |
KCDC-GD | 61.2 | 2.78 |
KCDC-GD-2 | 60.5 | 2.82 |
Duluth-Mix-Gap | 60.5 | 1.61 |
Duluth-Mix-Uni-Gap | 59.7 | 1.39 |
KCDC-GDC | 59.4 | 2.83 |
Duluth-Mix-Uni-PK2 | 57.9 | 2.04 |
KCDC-PC | 57.6 | 2.92 |
KCDC-PC-2 | 57 | 2.93 |
UoY | 55.7 | 11.54 |
KCDC-PCGD | 55.5 | 2.9 |
Duluth-WSI-Gap | 55 | 1.4 |
Duluth-WSI-Co-Gap | 54.2 | 1.6 |
Duluth-MIX-PK2 | 52.5 | 2.66 |
Duluth-Mix-Narrow-Gap | 51.9 | 2.42 |
Duluth-WSI-Co | 51.8 | 2.49 |
Duluth-Mix-Narrow-PK2 | 50.3 | 2.68 |
Duluth-R-12 | 49.4 | 2 |
Duluth-WSI | 44.4 | 4.15 |
Duluth-WSI-SVD | 44.4 | 4.15 |
KSU KDD | 43.3 | 17.5 |
Duluth-R-13 | 40.9 | 3 |
Random | 35.18 | 4 |
Hermit | 32.1 | 10.78 |
Duluth-R-15 | 31.2 | 4.97 |
Duluth-R-110 | 21.4 | 9.71 |
1ClusterPerInstance | 8 | 89.15 |
|
BCubed on verbs
|
System | BCubed (Verbs) | Clusters Number |
MFS | 63.4 | 1 |
Duluth-WSI-SVD-Gap | 63.3 | 1.02 |
KCDC-PT | 61.8 | 1.5 |
KCDC-GD | 59.2 | 2.78 |
Duluth-Mix-Gap | 59.1 | 1.61 |
Duluth-Mix-Uni-Gap | 58.7 | 1.39 |
KCDC-GD-2 | 58.2 | 2.82 |
KCDC-GDC | 57.3 | 2.83 |
Duluth-Mix-Uni-PK2 | 56.6 | 2.04 |
KCDC-PC | 55.5 | 2.92 |
KCDC-PC-2 | 54.7 | 2.93 |
Duluth-WSI-Gap | 53.7 | 1.4 |
KCDC-PCGD | 53.3 | 2.9 |
Duluth-WSI-Co-Gap | 52.6 | 1.6 |
Duluth-MIX-PK2 | 50.4 | 2.66 |
UoY | 49.8 | 11.54 |
Duluth-Mix-Narrow-Gap | 49.7 | 2.42 |
Duluth-WSI-Co | 49.5 | 2.49 |
Duluth-Mix-Narrow-PK2 | 47.8 | 2.68 |
Duluth-R-12 | 47.8 | 2 |
Duluth-WSI-SVD | 41.1 | 4.15 |
Duluth-WSI | 41.1 | 4.15 |
Duluth-R-13 | 38.4 | 3 |
KSU KDD | 36.9 | 17.5 |
Random | 31.9 | 4 |
Duluth-R-15 | 27.6 | 4.97 |
Hermit | 26.7 | 10.78 |
Duluth-R-110 | 16.1 | 9.71 |
1ClusterPerInstance | 0.09 | 89.15 |
|
1. Unsupervised evaluation
1.1 V-Measure
The first measure used in the unsupervised evaluation is V-Measure:
Andrew Rosenberg and Julia Hirschberg. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) Prague, Czech Republic, (June 2007). ACL.
Suresh Manandhar and Ioannis P. Klapaftis. SemEval-2010 Task 14: Evaluation Setting for Word Sense Induction & Disambiguation Systems. In NAACL-HLT 2009 Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Boulder, Colorado, USA (2009).
The following table shows the results of the evaluation. The average number of senses in the gold standard (GS) is 4.46 for nouns and 3.12 for verbs. There are three baselines: (1) Most Frequent Sense (MFS), which groups all instances of a target word into one cluster, (2) 1ClusterPerInstance, which produces one cluster for each instance of a target word, and (3) Random, which randomly assigns each instance to one of four clusters. The number of clusters of the Random baseline was chosen to be roughly equal to the average number of senses in the GS over all words. The Random baseline is executed five times and the results are averaged. Note that the 1ClusterPerInstance baseline is not included in the supervised evaluation, since the mapping process is unable to associate its induced clusters with GS senses (clusters appearing in the mapping corpus do not appear in the evaluation corpus).
V-Measure (VM)
|
System | VM (All) | VM (Verbs) | VM (Nouns) | Clusters Number |
1ClusterPerInstance | 31.7 | 25.6 | 35.8 | 89.15 |
Hermit | 16.2 | 15.6 | 16.7 | 10.78 |
UoY | 15.7 | 8.5 | 20.6 | 11.54 |
KSU KDD | 15.7 | 12.4 | 18.0 | 17.5 |
Duluth-WSI | 9.0 | 5.7 | 11.4 | 4.15 |
Duluth-WSI-SVD | 9.0 | 5.7 | 11.4 | 4.15 |
Duluth-R-110 | 8.6 | 8.5 | 8.6 | 9.71 |
Duluth-WSI-Co | 7.9 | 6.0 | 9.2 | 2.49 |
KCDC-PCGD | 7.8 | 8.4 | 7.3 | 2.9 |
KCDC-PC | 7.5 | 7.3 | 7.7 | 2.92 |
KCDC-PC-2 | 7.1 | 6.1 | 7.7 | 2.93 |
Duluth-Mix-Narrow-Gap | 6.9 | 5.1 | 8.0 | 2.42 |
KCDC-GD-2 | 6.9 | 8.0 | 6.1 | 2.82 |
KCDC-GD | 6.9 | 8.5 | 5.9 | 2.78 |
Duluth-Mix-Narrow-PK2 | 6.8 | 5.5 | 7.8 | 2.68 |
Duluth-MIX-PK2 | 5.6 | 5.2 | 5.8 | 2.66 |
Duluth-R-15 | 5.3 | 5.1 | 5.4 | 4.97 |
Duluth-WSI-Co-Gap | 4.8 | 3.6 | 5.6 | 1.6 |
Random | 4.4 | 4.6 | 4.2 | 4 |
Duluth-R-13 | 3.6 | 3.7 | 3.5 | 3 |
Duluth-WSI-Gap | 3.1 | 1.5 | 4.2 | 1.4 |
Duluth-Mix-Gap | 3.0 | 3.0 | 2.9 | 1.61 |
Duluth-Mix-Uni-PK2 | 2.4 | 4.7 | 0.8 | 2.04 |
Duluth-R-12 | 2.3 | 2.5 | 2.2 | 2 |
KCDC-PT | 1.9 | 3.1 | 1.0 | 1.5 |
Duluth-Mix-Uni-Gap | 1.4 | 3.0 | 0.2 | 1.39 |
KCDC-GDC | 7.0 | 7.8 | 6.2 | 2.83 |
MFS | 0.0 | 0.0 | 0.0 | 1.0 |
Duluth-WSI-SVD-Gap | 0.0 | 0.1 | 0.0 | 1.02 |
|
As can be observed, V-Measure seems to be biased towards systems generating a higher number of clusters than the number of gold standard senses. For that reason, we provide an additional evaluation as described below.
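The bias can be illustrated with a minimal Python sketch of V-Measure, computed as the harmonic mean of homogeneity and completeness following Rosenberg and Hirschberg. This is not the task's official scorer: the function names and the toy gold standard below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def v_measure(gold, pred):
    """Harmonic mean of homogeneity and completeness for two labelings."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_gold = entropy(gold_counts)
    h_pred = entropy(pred_counts)
    # Conditional entropies H(gold | pred) and H(pred | gold).
    h_gold_given_pred = -sum(
        c / n * math.log(c / pred_counts[k]) for (g, k), c in joint.items()
    )
    h_pred_given_gold = -sum(
        c / n * math.log(c / gold_counts[g]) for (g, k), c in joint.items()
    )

    homogeneity = 1.0 if h_gold == 0 else 1.0 - h_gold_given_pred / h_gold
    completeness = 1.0 if h_pred == 0 else 1.0 - h_pred_given_gold / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def mfs_baseline(n):
    """All instances in a single cluster."""
    return [0] * n


def one_cluster_per_instance(n):
    """Every instance in its own cluster."""
    return list(range(n))


def random_baseline(n, k=4, seed=0):
    """Each instance assigned at random to one of k clusters."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]


# Hypothetical gold standard: 10 instances of one target word, 3 senses.
gold = ["s1"] * 5 + ["s2"] * 3 + ["s3"] * 2

print("MFS:", v_measure(gold, mfs_baseline(len(gold))))
print("1ClusterPerInstance:", v_measure(gold, one_cluster_per_instance(len(gold))))
# The Random baseline is averaged over five runs, as in the task description.
runs = [v_measure(gold, random_baseline(len(gold), seed=s)) for s in range(5)]
print("Random (mean of 5 runs):", sum(runs) / len(runs))
```

On this toy data MFS scores 0 because a single cluster has zero homogeneity, while 1ClusterPerInstance is perfectly homogeneous and therefore scores well above it, which mirrors the ordering of these two baselines in the V-Measure table.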
1.2 Paired F-Score
In this evaluation, the clustering problem is transformed into a classification problem: for every two instances of a target word that belong to the same cluster, the task is to decide whether they also belong to the same Gold Standard (GS) class. This evaluation scheme is described in:
Artiles, Javier and Amigo, Enrique and Gonzalo, Julio. The role of named entities in web people search. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. 534--542, Singapore, ACL.
Consider the following example, where we have 3 gold standard senses for 7 target word instances, while a clustering solution has generated 2 clusters (Figure).
For Cluster 1, we can generate 10 instance pairs, i.e. AB, AD, AE, AG, BD, BE, BG, DE, DG, EG. For Cluster 2, we can generate 1 instance pair CF. In total, the clustering solution has generated 11 instance pairs.
We can measure precision and recall, as well as their harmonic mean (F-Score), as follows. Precision is the number of generated instance pairs that also exist in a GS class divided by the total number of generated instance pairs. Recall is the number of generated instance pairs that also exist in a GS class divided by the total number of instance pairs that exist in all GS classes.
In the example, 5 instance pairs (AB, AD, BD, EG and CF) also exist in the GS classes. GS classes also contain a total of 5 instance pairs, i.e. AB, AD, BD from GS class 1, CF from GS class 2 and EG from GS class 3. Thus precision is equal to 5/11 and recall is equal to 5/5.
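The worked example can be reproduced with a short Python sketch. The cluster and class memberships are read off the instance pairs listed in the example; the function names, and the convention of returning 0 when a side has no pairs, are assumptions of this sketch and not part of the official scorer.

```python
from itertools import combinations


def instance_pairs(groups):
    """Return the set of unordered instance pairs that share a group."""
    pairs = set()
    for members in groups.values():
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def paired_f_score(clusters, gold_classes):
    """Precision, recall and F-Score over same-group instance pairs."""
    cluster_pairs = instance_pairs(clusters)
    gold_pairs = instance_pairs(gold_classes)
    common = cluster_pairs & gold_pairs
    # Assumed convention: a side with no pairs yields a score of 0.
    precision = len(common) / len(cluster_pairs) if cluster_pairs else 0.0
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Memberships taken from the example in the text.
clusters = {"cluster1": {"A", "B", "D", "E", "G"}, "cluster2": {"C", "F"}}
gold_classes = {"gs1": {"A", "B", "D"}, "gs2": {"C", "F"}, "gs3": {"E", "G"}}

precision, recall, f_score = paired_f_score(clusters, gold_classes)
print(precision, recall, f_score)
```

For the example, the harmonic mean of precision 5/11 and recall 5/5 is 10/16 = 0.625.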
Paired F-Score (FS)
|
System | FS (All) | FS (Verbs) | FS (Nouns) | Clusters Number |
MFS | 63.4 | 72.7 | 57.0 | 1 |
Duluth-WSI-SVD-Gap | 63.3 | 72.4 | 57.0 | 1.02 |
KCDC-PT | 61.8 | 69.7 | 56.4 | 1.5 |
KCDC-GD | 59.2 | 70.0 | 51.6 | 2.78 |
Duluth-Mix-Gap | 59.1 | 65.8 | 54.5 | 1.61 |
Duluth-Mix-Uni-Gap | 58.7 | 61.2 | 57.0 | 1.39 |
KCDC-GD-2 | 58.2 | 69.3 | 50.4 | 2.82 |
KCDC-GDC | 57.3 | 70.0 | 48.5 | 2.83 |
Duluth-Mix-Uni-PK2 | 56.6 | 55.9 | 57.1 | 2.04 |
KCDC-PC | 55.5 | 62.9 | 50.4 | 2.92 |
KCDC-PC-2 | 54.7 | 61.7 | 49.7 | 2.93 |
Duluth-WSI-Gap | 53.7 | 53.9 | 53.4 | 1.4 |
KCDC-PCGD | 53.3 | 65.6 | 44.8 | 2.9 |
Duluth-WSI-Co-Gap | 52.6 | 51.5 | 53.3 | 1.6 |
Duluth-MIX-PK2 | 50.4 | 48.3 | 51.7 | 2.66 |
UoY | 49.8 | 66.6 | 38.2 | 11.54 |
Duluth-Mix-Narrow-Gap | 49.7 | 51.3 | 47.4 | 2.42 |
Duluth-WSI-Co | 49.5 | 48.2 | 50.2 | 2.49 |
Duluth-Mix-Narrow-PK2 | 47.8 | 48.2 | 37.1 | 2.68 |
Duluth-R-12 | 47.8 | 52.6 | 44.3 | 2 |
Duluth-WSI-SVD | 41.1 | 46.7 | 37.1 | 4.15 |
Duluth-WSI | 41.1 | 46.7 | 37.1 | 4.15 |
Duluth-R-13 | 38.4 | 41.5 | 36.2 | 3 |
KSU KDD | 36.9 | 54.7 | 24.6 | 17.5 |
Random | 31.9 | 34.1 | 30.4 | 4 |
Duluth-R-15 | 27.6 | 28.9 | 26.7 | 4.97 |
Hermit | 26.7 | 30.1 | 24.4 | 10.78 |
Duluth-R-110 | 16.1 | 16.4 | 15.8 | 9.71 |
1ClusterPerInstance | 0.09 | 0.08 | 0.11 | 89.15 |
|
2. Supervised Evaluation
The supervised evaluation follows the supervised evaluation of the previous SemEval-2007 WSI task:
Eneko Agirre and Aitor Soroa. Semeval-2007 task 02: Evaluating word sense induction and discrimination systems. In Proceedings of the Fourth International Workshop on Semantic Evaluations, pp. 7-12, Prague, Czech Republic, (June 2007). ACL.
In this evaluation, the test set is split into a mapping corpus and an evaluation corpus. The mapping corpus is used to map the induced senses to gold standard senses, and the evaluation corpus is used to assess performance in a WSD setting. To avoid the problem of the previous evaluation, where different splits gave different rankings, we applied 5 random splits and then averaged each system's results. In these splits, 80% of the test corpus was used for mapping and 20% for evaluation. Two of the baselines from the unsupervised evaluation are used here, i.e. MFS and Random.
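The split, map and evaluate procedure can be sketched in Python as follows. This is a simplified illustration, not the official evaluation software: it assumes each instance carries a single induced cluster, maps every cluster to the GS sense it co-occurs with most often in the mapping corpus (a majority-vote simplification of the mapping step), and counts evaluation instances whose cluster never appeared in the mapping corpus as errors. The function name, the toy data and the seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def supervised_recall(instances, mapping_fraction=0.8, n_splits=5, seed=0):
    """Average recall over random mapping/evaluation splits.

    instances: list of (induced_cluster, gold_sense) pairs for one target word.
    Returns (mean recall, list of per-split recalls).
    """
    rng = random.Random(seed)
    recalls = []
    for _ in range(n_splits):
        shuffled = instances[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * mapping_fraction)
        mapping, evaluation = shuffled[:cut], shuffled[cut:]

        # Count how often each cluster co-occurs with each GS sense.
        votes = defaultdict(Counter)
        for cluster, sense in mapping:
            votes[cluster][sense] += 1
        cluster_to_sense = {
            cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()
        }

        # Clusters unseen in the mapping corpus receive no sense (counted wrong).
        correct = sum(
            1 for cluster, sense in evaluation if cluster_to_sense.get(cluster) == sense
        )
        recalls.append(correct / len(evaluation))
    return sum(recalls) / len(recalls), recalls


# Hypothetical data: a clustering that matches the two GS senses exactly.
perfect = [("c1", "s1")] * 12 + [("c2", "s2")] * 8
# 1ClusterPerInstance: every instance has its own cluster.
one_per_instance = [(i, "s1") for i in range(20)]

print(supervised_recall(perfect)[0])
print(supervised_recall(one_per_instance)[0])
```

Under these assumptions the 1ClusterPerInstance clustering obtains a recall of 0, because none of its evaluation clusters are ever seen in the mapping corpus. Passing mapping_fraction=0.6 gives a 60-40 split instead.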
Supervised Recall (SR), test set split: 80% mapping, 20% evaluation
|
System | SR (All) | SR (Nouns) | SR (Verbs) | Variance (All) |
UoY | 62.44 | 59.43 | 66.82 | 1.23 |
Duluth-WSI | 60.46 | 54.66 | 68.92 | 0.77 |
Duluth-WSI-SVD | 60.46 | 54.66 | 68.92 | 0.77 |
Duluth-WSI-Co-Gap | 60.34 | 54.09 | 68.65 | 1.12 |
Duluth-WSI-Co | 60.27 | 54.68 | 67.6 | 1.07 |
Duluth-WSI-Gap | 59.81 | 54.36 | 67.76 | 0.85 |
KCDC-PC-2 | 59.76 | 54.09 | 68.04 | 0.89 |
KCDC-PC | 59.73 | 54.55 | 67.29 | 0.56 |
KCDC-PCGD | 59.53 | 53.33 | 68.56 | 1.62 |
KCDC-GDC | 59.08 | 53.39 | 67.38 | 0.33 |
KCDC-GD | 59.03 | 52.97 | 67.87 | 0.62 |
KCDC-PT | 58.88 | 53.07 | 67.35 | 1.1 |
KCDC-GD-2 | 58.72 | 52.78 | 67.38 | 0.61 |
Duluth-WSI-SVD-Gap | 58.69 | 53.22 | 66.66 | 0.76 |
MFS | 58.67 | 53.22 | 66.629 | 0.75 |
Duluth-R-12 | 58.46 | 53.05 | 66.44 | 0.96 |
Hermit | 58.34 | 53.56 | 65.30 | 2.33 |
Duluth-R-13 | 58.01 | 52.27 | 66.38 | 1.53 |
Random | 57.25 | 51.45 | 65.69 | 0.07 |
Duluth-R-15 | 56.76 | 50.91 | 65.3 | 2.61 |
Duluth-Mix-Narrow-Gap | 56.63 | 48.11 | 69.06 | 0.61 |
Duluth-Mix-Narrow-PK2 | 56.15 | 47.54 | 68.7 | 0.5 |
Duluth-R-110 | 54.75 | 48.28 | 64.2 | 0.7 |
KSU KDD | 52.18 | 46.63 | 60.28 | 0.93 |
Duluth-MIX-PK2 | 51.62 | 41.1 | 66.96 | 0.57 |
Duluth-Mix-Gap | 50.61 | 40.04 | 66.02 | 0.27 |
Duluth-Mix-Uni-PK2 | 19.29 | 1.82 | 44.78 | 0.18 |
Duluth-Mix-Uni-Gap | 18.72 | 1.55 | 43.76 | 0.17 |
|
We also include a second supervised evaluation, in which the testing corpus is split 60-40, i.e. 60% for the mapping corpus and 40% for the evaluation corpus.
Supervised Recall (SR), test set split: 60% mapping, 40% evaluation
|
System | SR (All) | SR (Nouns) | SR (Verbs) | Variance (All) |
UoY | 61.96 | 58.62 | 66.82 | 0.47 |
Duluth-WSI-Co | 60.07 | 54.59 | 68.05 | 0.14 |
Duluth-WSI-Co-Gap | 59.51 | 53.45 | 68.33 | 0.38 |
Duluth-WSI-SVD | 59.48 | 53.45 | 68.27 | 0.38 |
Duluth-WSI | 59.48 | 53.45 | 68.27 | 0.38 |
Duluth-WSI-Gap | 59.32 | 53.19 | 68.23 | 0.52 |
KCDC-PCGD | 59.10 | 52.60 | 68.56 | 0.24 |
KCDC-PC-2 | 58.90 | 53.35 | 66.99 | 0.67 |
KCDC-PC | 58.89 | 53.58 | 66.64 | 0.52 |
KCDC-GDC | 58.29 | 52.14 | 67.26 | 1.09 |
KCDC-GD | 58.27 | 51.88 | 67.59 | 0.71 |
MFS | 58.25 | 52.45 | 66.70 | 0.3 |
KCDC-PT | 58.25 | 52.18 | 67.11 | 0.31 |
Duluth-WSI-SVD-Gap | 58.24 | 52.45 | 66.66 | 0.3 |
KCDC-GD-2 | 57.90 | 51.67 | 66.99 | 1.04 |
Duluth-R-12 | 57.72 | 51.74 | 66.42 | 0.43 |
Duluth-R-13 | 57.59 | 51.13 | 67.00 | 0.14 |
Hermit | 57.27 | 52.53 | 64.16 | 0.29 |
Duluth-R-15 | 56.53 | 49.95 | 66.11 | 0.42 |
Random | 56.52 | 50.21 | 65.73 | 0.17 |
Duluth-Mix-Narrow-Gap | 56.19 | 47.67 | 68.59 | 0.08 |
Duluth-Mix-Narrow-PK2 | 55.65 | 46.86 | 68.45 | 0.04 |
Duluth-R-110 | 53.60 | 46.70 | 63.63 | 0.44 |
Duluth-MIX-PK2 | 50.46 | 39.70 | 66.13 | 0.07 |
KSU KDD | 50.42 | 44.25 | 59.41 | 0.34 |
Duluth-Mix-Gap | 49.77 | 38.86 | 65.64 | 0.11 |
Duluth-Mix-Uni-PK2 | 19.12 | 1.77 | 44.40 | 0.14 |
Duluth-Mix-Uni-Gap | 18.91 | 1.52 | 44.23 | 0.09 |
|
Number of clusters in each test set split (80-20). Systems sorted in alphabetic order.
|
System | Mapping 1 | Evaluation 1 | Mapping 2 | Evaluation 2 | Mapping 3 | Evaluation 3 | Mapping 4 | Evaluation 4 | Mapping 5 | Evaluation 5 | Average in Mapping | Average in Evaluation |
Duluth-Mix-Gap | 1.61 | 1.43 | 1.6 | 1.48 | 1.59 | 1.47 | 1.58 | 1.48 | 1.58 | 1.46 | 1.59 | 1.46 |
Duluth-Mix-Narrow-Gap | 2.39 | 2.12 | 2.41 | 2.11 | 2.42 | 2.15 | 2.42 | 2.16 | 2.41 | 2.11 | 2.41 | 2.13 |
Duluth-Mix-Narrow-PK2 | 2.65 | 2.25 | 2.64 | 2.26 | 2.63 | 2.29 | 2.67 | 2.23 | 2.66 | 2.21 | 2.65 | 2.25 |
Duluth-MIX-PK2 | 2.63 | 2.19 | 2.62 | 2.23 | 2.61 | 2.28 | 2.62 | 2.18 | 2.61 | 2.16 | 2.62 | 2.21 |
Duluth-Mix-Uni-Gap | 1.39 | 1.31 | 1.39 | 1.28 | 1.39 | 1.32 | 1.39 | 1.32 | 1.39 | 1.28 | 1.39 | 1.3 |
Duluth-Mix-Uni-PK2 | 2.02 | 1.61 | 2.02 | 1.6 | 2 | 1.71 | 2 | 1.68 | 1.99 | 1.63 | 2.01 | 1.65 |
Duluth-R-110 | 9.5 | 6.89 | 9.48 | 6.99 | 9.47 | 6.78 | 9.59 | 6.75 | 9.6 | 6.85 | 9.53 | 6.85 |
Duluth-R-12 | 2 | 1.98 | 2 | 1.98 | 2 | 1.95 | 2 | 1.96 | 2 | 1.97 | 2 | 1.97 |
Duluth-R-13 | 3 | 2.89 | 3 | 2.86 | 3 | 2.8 | 3 | 2.84 | 3 | 2.8 | 3 | 2.84 |
Duluth-R-15 | 4.93 | 4.3 | 4.95 | 4.26 | 4.92 | 4.33 | 4.95 | 4.17 | 4.97 | 4.2 | 4.94 | 4.25 |
Duluth-WSI | 4.07 | 3.35 | 4.07 | 3.32 | 4.07 | 3.38 | 4.07 | 3.38 | 4.09 | 3.34 | 4.07 | 3.35 |
Duluth-WSI-Co | 2.48 | 2.3 | 2.48 | 2.29 | 2.49 | 2.27 | 2.48 | 2.29 | 2.49 | 2.27 | 2.48 | 2.28 |
Duluth-WSI-Co-Gap | 1.59 | 1.59 | 1.59 | 1.59 | 1.6 | 1.59 | 1.6 | 1.58 | 1.6 | 1.56 | 1.6 | 1.58 |
Duluth-WSI-Gap | 1.39 | 1.39 | 1.39 | 1.39 | 1.4 | 1.38 | 1.4 | 1.38 | 1.4 | 1.38 | 1.4 | 1.38 |
Duluth-WSI-SVD | 4.07 | 3.35 | 4.07 | 3.32 | 4.07 | 3.38 | 4.07 | 3.38 | 4.09 | 3.34 | 4.07 | 3.35 |
Duluth-WSI-SVD-Gap | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 |
Hermit | 10.37 | 6.29 | 10.32 | 6.48 | 10.23 | 6.56 | 10.37 | 6.45 | 10.35 | 6.41 | 10.33 | 6.44 |
KCDC-GD | 2.69 | 1.99 | 2.72 | 2.04 | 2.7 | 2.01 | 2.66 | 2 | 2.65 | 2 | 2.68 | 2.01 |
KCDC-GD-2 | 2.69 | 2.05 | 2.77 | 1.98 | 2.74 | 1.99 | 2.71 | 2.04 | 2.68 | 1.98 | 2.72 | 2.01 |
KCDC-GDC | 2.71 | 2.01 | 2.77 | 1.97 | 2.78 | 1.96 | 2.71 | 2.02 | 2.7 | 1.93 | 2.73 | 1.98 |
KCDC-PC | 2.8 | 2.21 | 2.79 | 2.15 | 2.8 | 2.19 | 2.79 | 2.21 | 2.81 | 2.11 | 2.8 | 2.17 |
KCDC-PC-2 | 2.81 | 2.2 | 2.84 | 2.21 | 2.81 | 2.33 | 2.84 | 2.18 | 2.78 | 2.22 | 2.82 | 2.23 |
KCDC-PCGD | 2.87 | 2.29 | 2.83 | 2.35 | 2.88 | 2.28 | 2.8 | 2.32 | 2.83 | 2.21 | 2.84 | 2.29 |
KCDC-PT | 1.47 | 1.22 | 1.45 | 1.24 | 1.45 | 1.26 | 1.44 | 1.27 | 1.45 | 1.21 | 1.45 | 1.24 |
KSU KDD | 15.44 | 6.51 | 15.34 | 6.58 | 15.4 | 6.51 | 15.28 | 6.85 | 15.38 | 6.68 | 15.37 | 6.63 |
UoY | 10.28 | 4.34 | 10.13 | 4.63 | 10.15 | 4.45 | 10.12 | 4.6 | 10.28 | 4.66 | 10.19 | 4.54 |
|
Number of clusters in each test set split (60-40). Systems sorted in alphabetic order.
|
System | Mapping 1 | Evaluation 1 | Mapping 2 | Evaluation 2 | Mapping 3 | Evaluation 3 | Mapping 4 | Evaluation 4 | Mapping 5 | Evaluation 5 | Average in Mapping | Average in Evaluation |
Duluth-Mix-Gap | 1.58 | 1.55 | 1.6 | 1.53 | 1.58 | 1.53 | 1.55 | 1.58 | 1.59 | 1.52 | 1.58 | 1.54 |
Duluth-Mix-Narrow-Gap | 2.38 | 2.3 | 2.39 | 2.34 | 2.37 | 2.29 | 2.36 | 2.33 | 2.39 | 2.28 | 2.38 | 2.31 |
Duluth-Mix-Narrow-PK2 | 2.62 | 2.45 | 2.59 | 2.52 | 2.6 | 2.4 | 2.58 | 2.48 | 2.59 | 2.46 | 2.6 | 2.46 |
Duluth-MIX-PK2 | 2.52 | 2.46 | 2.6 | 2.41 | 2.55 | 2.39 | 2.53 | 2.45 | 2.55 | 2.43 | 2.55 | 2.43 |
Duluth-Mix-Uni-Gap | 1.38 | 1.34 | 1.37 | 1.34 | 1.36 | 1.34 | 1.35 | 1.36 | 1.38 | 1.34 | 1.37 | 1.34 |
Duluth-Mix-Uni-PK2 | 1.93 | 1.87 | 1.94 | 1.85 | 1.93 | 1.84 | 1.91 | 1.88 | 1.96 | 1.82 | 1.93 | 1.85 |
Duluth-R-110 | 9.21 | 8.58 | 9.18 | 8.53 | 9.15 | 8.43 | 9.18 | 8.57 | 9.2 | 8.53 | 9.18 | 8.53 |
Duluth-R-12 | 2 | 1.99 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Duluth-R-13 | 3 | 2.98 | 3 | 2.97 | 2.99 | 2.99 | 2.98 | 2.97 | 3 | 3 | 2.99 | 2.98 |
Duluth-R-15 | 4.86 | 4.75 | 4.86 | 4.73 | 4.9 | 4.8 | 4.91 | 4.78 | 4.89 | 4.75 | 4.88 | 4.76 |
Duluth-WSI | 3.94 | 3.71 | 3.94 | 3.78 | 3.95 | 3.82 | 4 | 3.74 | 3.94 | 3.71 | 3.95 | 3.75 |
Duluth-WSI-Co | 2.46 | 2.46 | 2.47 | 2.46 | 2.48 | 2.41 | 2.47 | 2.47 | 2.47 | 2.43 | 2.47 | 2.45 |
Duluth-WSI-Co-Gap | 1.6 | 1.59 | 1.59 | 1.6 | 1.6 | 1.59 | 1.59 | 1.6 | 1.59 | 1.6 | 1.59 | 1.6 |
Duluth-WSI-Gap | 1.4 | 1.38 | 1.39 | 1.39 | 1.4 | 1.38 | 1.39 | 1.39 | 1.39 | 1.39 | 1.39 | 1.39 |
Duluth-WSI-SVD | 3.94 | 3.71 | 3.94 | 3.78 | 3.95 | 3.82 | 4 | 3.74 | 3.94 | 3.71 | 3.95 | 3.75 |
Duluth-WSI-SVD-Gap | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 |
Hermit | 9.63 | 8.5 | 9.66 | 8.53 | 9.66 | 8.68 | 9.59 | 8.62 | 9.5 | 8.62 | 9.61 | 8.59 |
KCDC-GD | 2.53 | 2.37 | 2.49 | 2.44 | 2.53 | 2.29 | 2.57 | 2.3 | 2.55 | 2.33 | 2.53 | 2.35 |
KCDC-GD-2 | 2.54 | 2.39 | 2.53 | 2.46 | 2.56 | 2.39 | 2.6 | 2.33 | 2.58 | 2.37 | 2.56 | 2.39 |
KCDC-GDC | 2.52 | 2.36 | 2.5 | 2.48 | 2.58 | 2.41 | 2.61 | 2.35 | 2.55 | 2.37 | 2.55 | 2.39 |
KCDC-PC | 2.62 | 2.57 | 2.66 | 2.44 | 2.66 | 2.47 | 2.69 | 2.46 | 2.69 | 2.48 | 2.66 | 2.48 |
KCDC-PC-2 | 2.72 | 2.58 | 2.76 | 2.55 | 2.72 | 2.53 | 2.64 | 2.59 | 2.78 | 2.51 | 2.72 | 2.55 |
KCDC-PCGD | 2.73 | 2.6 | 2.77 | 2.67 | 2.75 | 2.63 | 2.7 | 2.59 | 2.77 | 2.57 | 2.74 | 2.61 |
KCDC-PT | 1.38 | 1.37 | 1.42 | 1.32 | 1.39 | 1.34 | 1.39 | 1.33 | 1.45 | 1.27 | 1.41 | 1.33 |
KSU KDD | 13.07 | 10.12 | 13.08 | 10.21 | 13.17 | 10.1 | 12.93 | 10.21 | 12.75 | 10.49 | 13 | 10.23 |
UoY | 8.7 | 6.75 | 8.86 | 6.65 | 8.5 | 6.8 | 8.79 | 6.67 | 8.51 | 6.93 | 8.67 | 6.76 |
|
Number of identified GS senses in the supervised key file of each test set split (80-20). Systems sorted in alphabetic order.
|
System | Split 1 | Split 2 | Split 3 | Split 4 | Split 5 | Average |
Duluth-Mix-Gap | 1 | 1.01 | 1.01 | 1.03 | 1 | 1.01 |
Duluth-Mix-Narrow-Gap | 1.4 | 1.45 | 1.39 | 1.44 | 1.46 | 1.43 |
Duluth-Mix-Narrow-PK2 | 1.42 | 1.42 | 1.38 | 1.44 | 1.41 | 1.41 |
Duluth-MIX-PK2 | 1.22 | 1.26 | 1.25 | 1.24 | 1.19 | 1.23 |
Duluth-Mix-Uni-Gap | 0.57 | 0.56 | 0.54 | 0.59 | 0.56 | 0.56 |
Duluth-Mix-Uni-PK2 | 0.61 | 0.6 | 0.62 | 0.66 | 0.61 | 0.62 |
Duluth-R-110 | 2.04 | 1.91 | 1.92 | 1.89 | 1.95 | 1.94 |
Duluth-R-12 | 1.25 | 1.28 | 1.22 | 1.25 | 1.26 | 1.25 |
Duluth-R-13 | 1.41 | 1.49 | 1.48 | 1.46 | 1.44 | 1.46 |
Duluth-R-15 | 1.6 | 1.59 | 1.65 | 1.59 | 1.63 | 1.61 |
Duluth-WSI | 1.68 | 1.66 | 1.65 | 1.69 | 1.62 | 1.66 |
Duluth-WSI-Co | 1.51 | 1.56 | 1.47 | 1.52 | 1.47 | 1.51 |
Duluth-WSI-Co-Gap | 1.18 | 1.2 | 1.2 | 1.19 | 1.17 | 1.19 |
Duluth-WSI-Gap | 1.11 | 1.12 | 1.11 | 1.11 | 1.1 | 1.11 |
Duluth-WSI-SVD | 1.68 | 1.66 | 1.65 | 1.69 | 1.62 | 1.66 |
Duluth-WSI-SVD-Gap | 1.02 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 |
Hermit | 2.05 | 2.11 | 1.98 | 2.11 | 2.06 | 2.06 |
KCDC-GD | 1.32 | 1.34 | 1.32 | 1.32 | 1.34 | 1.33 |
KCDC-GD-2 | 1.33 | 1.33 | 1.29 | 1.38 | 1.34 | 1.33 |
KCDC-GDC | 1.36 | 1.35 | 1.31 | 1.36 | 1.31 | 1.34 |
KCDC-PC | 1.37 | 1.4 | 1.39 | 1.41 | 1.37 | 1.39 |
KCDC-PC-2 | 1.22 | 1.2 | 1.2 | 1.21 | 1.21 | 1.21 |
KCDC-PCGD | 1.49 | 1.54 | 1.42 | 1.43 | 1.47 | 1.47 |
KCDC-PT | 1.08 | 1.09 | 1.11 | 1.07 | 1.07 | 1.08 |
KSU KDD | 1.67 | 1.73 | 1.72 | 1.68 | 1.65 | 1.69 |
MFS | 1 | 1 | 1 | 1 | 1 | 1 |
Random | 1.53 | 1.53 | 1.51 | 1.53 | 1.55 | 1.53 |
UoY | 1.38 | 1.5 | 1.61 | 1.56 | 1.49 | 1.51 |
|
Number of identified GS senses in the supervised key file of each test set split (60-40). Systems sorted in alphabetic order.
|
System | Split 1 | Split 2 | Split 3 | Split 4 | Split 5 | Average |
Duluth-Mix-Gap | 1.04 | 1.02 | 0.99 | 1.11 | 1.03 | 1.04 |
Duluth-Mix-Narrow-Gap | 1.46 | 1.51 | 1.46 | 1.59 | 1.51 | 1.51 |
Duluth-Mix-Narrow-PK2 | 1.48 | 1.53 | 1.45 | 1.61 | 1.46 | 1.51 |
Duluth-MIX-PK2 | 1.27 | 1.31 | 1.28 | 1.41 | 1.3 | 1.31 |
Duluth-Mix-Uni-Gap | 0.58 | 0.54 | 0.53 | 0.58 | 0.59 | 0.56 |
Duluth-Mix-Uni-PK2 | 0.64 | 0.61 | 0.63 | 0.63 | 0.66 | 0.63 |
Duluth-R-110 | 2.18 | 2.18 | 2.17 | 2.19 | 2.2 | 2.18 |
Duluth-R-12 | 1.27 | 1.28 | 1.22 | 1.27 | 1.3 | 1.27 |
Duluth-R-13 | 1.52 | 1.47 | 1.42 | 1.52 | 1.47 | 1.48 |
Duluth-R-15 | 1.79 | 1.73 | 1.81 | 1.74 | 1.71 | 1.76 |
Duluth-WSI | 1.72 | 1.73 | 1.7 | 1.76 | 1.74 | 1.73 |
Duluth-WSI-Co | 1.58 | 1.52 | 1.53 | 1.65 | 1.54 | 1.56 |
Duluth-WSI-Co-Gap | 1.19 | 1.22 | 1.19 | 1.21 | 1.21 | 1.2 |
Duluth-WSI-Gap | 1.12 | 1.11 | 1.11 | 1.12 | 1.11 | 1.11 |
Duluth-WSI-SVD | 1.72 | 1.73 | 1.7 | 1.76 | 1.74 | 1.73 |
Duluth-WSI-SVD-Gap | 1 | 1.02 | 1.01 | 1.01 | 1.01 | 1.01 |
Hermit | 2.22 | 2.33 | 2.27 | 2.25 | 2.3 | 2.27 |
KCDC-GD | 1.46 | 1.43 | 1.34 | 1.44 | 1.44 | 1.42 |
KCDC-GD-2 | 1.46 | 1.41 | 1.42 | 1.45 | 1.47 | 1.44 |
KCDC-GDC | 1.44 | 1.37 | 1.4 | 1.44 | 1.4 | 1.41 |
KCDC-PC | 1.5 | 1.29 | 1.45 | 1.51 | 1.46 | 1.44 |
KCDC-PC-2 | 1.25 | 1.25 | 1.28 | 1.28 | 1.2 | 1.25 |
KCDC-PCGD | 1.55 | 1.57 | 1.5 | 1.56 | 1.53 | 1.54 |
KCDC-PT | 1.1 | 1.14 | 1.12 | 1.1 | 1.11 | 1.11 |
KSU KDD | 1.89 | 1.88 | 2.01 | 1.88 | 1.95 | 1.92 |
MFS | 1 | 1 | 1 | 1 | 1 | 1 |
Random | 1.67 | 1.64 | 1.65 | 1.66 | 1.6 | 1.65 |
UoY | 1.77 | 1.6 | 1.63 | 1.69 | 1.63 | 1.66 |
|