SemEval 2012: List of Accepted Papers



SemEval-2012 Task 1: English Lexical Simplification

Lucia Specia, Sujay Kumar Jauhar and Rada Mihalcea

(Submission #140)


We describe the English Lexical Simplification task at SemEval-2012. This is the first time such a shared task has been organized and its goal is to provide a framework for the evaluation of systems for lexical simplification and foster research on context-aware lexical simplification approaches. The task requires that annotators and systems rank a number of alternative substitutes – all deemed adequate – for a target word in context, according to how “simple” these substitutes are. The notion of simplicity is biased towards non-native speakers of English. Systems can use any resource. Out of nine participating systems, the best scoring ones are those which combine context-dependent and context-independent information, with the strongest individual contribution given by the frequency of the substitute regardless of its context.

SemEval-2012 Task 2: Measuring Degrees of Relational Similarity

David Jurgens, Saif Mohammad, Peter Turney and Keith Holyoak

(Submission #172)


Up to now, work on semantic relations has focused on relation classification: recognizing whether a given instance (a word pair such as virus:flu) belongs to a specific relation class (such as CAUSE:EFFECT). However, instances of a single relation class may still have significant variability in how characteristic they are of that class. We present a new SemEval task based on identifying the degree of prototypicality for instances within a given class. As a part of the task, we have assembled the first dataset of graded relational similarity ratings across 79 relation categories. Three teams submitted six systems, which were evaluated using two methods.

SemEval-2012 Task 3: Spatial Role Labeling

Parisa Kordjamshidi, Steven Bethard and Marie-Francine Moens

(Submission #166)


This SemEval-2012 shared task is based on a recently introduced spatial annotation scheme called Spatial Role Labeling. The Spatial Role Labeling task concerns the extraction of the main components of spatial semantics from natural language: trajectors, landmarks and spatial indicators. In addition to these major components, the links between them and the general type of spatial relationship, including region, direction and distance, are targeted. The annotated dataset contains 1213 sentences which describe 612 images of the CLEF IAPR TC-12 Image Benchmark. We have one participant group with two submitted systems. The participants' systems are compared to the first system for this task, provided by the task organizers.

SemEval-2012 Task 4: Evaluating Chinese Word Similarity

Peng Jin and Yunfang Wu

(Submission #130)


This task focuses on evaluating word similarity computation for the Chinese language. We follow Finkelstein et al. (2002) in selecting word pairs. We then asked twenty undergraduates majoring in Chinese linguistics to annotate the data. Each pair is assigned a similarity score by each annotator, and we rank the word pairs by the average of the similarity scores from the twenty annotators. This rank is used as the gold standard. The four systems participating in this task returned their ranks, which we evaluated against the gold standard using Kendall's tau. The results show that three of the systems correlate positively with the manually created rank, although the tau values are very small.
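The evaluation described above compares a system's ranking of word pairs with the gold ranking using Kendall's tau. A minimal sketch of that comparison follows; the word pairs and orderings are invented for illustration, and ties are not handled.

```python
# Sketch of Kendall's tau between two rankings of the same word pairs,
# as used to score systems against the gold standard. No tie handling.

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings given as lists of the same items."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    items = list(rank_a)
    concordant = discordant = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            x, y = items[i], items[j]
            # Pair is concordant if both rankings order x and y the same way.
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
                concordant += 1
            else:
                discordant += 1
    n = len(items)
    return (concordant - discordant) / (n * (n - 1) / 2)

gold = ["tiger-cat", "car-bus", "king-queen", "noon-string"]
system = ["tiger-cat", "king-queen", "car-bus", "noon-string"]
print(kendall_tau(gold, system))  # positive correlation (4/6 ≈ 0.667)
```

A tau near 1 means the system's order agrees with the annotators; values just above 0, as the abstract reports, indicate only weak agreement.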

SemEval-2012 Task 5: Chinese Semantic Dependency Parsing

Wanxiang Che, Meishan Zhang, Ting Liu

(Submission #188)


This paper presents SemEval-2012 Shared Task 5: Chinese Semantic Dependency Parsing. The goal of this task is to identify the dependency structure of a Chinese sentence from a semantic perspective. We first introduce the motivation for providing a Chinese semantic dependency parsing task. We then describe the data annotation process; in total, we annotated over ten thousand sentences for the shared task. Finally, we briefly describe the submitted systems and analyze their results.

SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity

Eneko Agirre, Aitor Gonzalez-Agirre, Daniel Cer and Mona Diab

(Submission #179)


Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two texts. This paper presents the results of the STS pilot task at SemEval. The training data contained ca. 2000 sentence pairs from previously existing paraphrase datasets and machine translation evaluation resources. The test data comprised a similar number of sentence pairs from those three datasets, plus two surprise datasets: ca. 400 pairs from a different machine translation evaluation corpus and 750 pairs from a lexical resource mapping exercise. The similarity of pairs of sentences was rated on a 0-5 scale (low to high similarity) by human judges using Amazon Mechanical Turk, with high Pearson correlation scores of around 90%. 35 teams participated, submitting 88 runs. The best results scored a Pearson correlation over 80%, well beyond a simple lexical baseline at 31% correlation. The metric for evaluation was not completely satisfactory, and three evaluation metrics were finally published, all of which have issues that we plan to address in the future.
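Systems in this task are scored by the Pearson correlation between their predicted similarities and the gold 0-5 human judgements. A minimal self-contained sketch of that metric, with invented scores:

```python
# Pearson correlation between gold human judgements and system scores,
# the primary STS evaluation metric. All scores below are invented.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold_scores = [5.0, 3.2, 1.0, 4.4, 0.5]    # human judgements (0-5 scale)
system_scores = [4.8, 2.9, 1.5, 4.0, 0.2]  # a system's predictions
print(round(pearson(gold_scores, system_scores), 3))
```

The three published evaluation metrics mentioned in the abstract are variants of this correlation (overall, normalized, and weighted-mean over datasets).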

SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning

Andrew Gordon, Zornitsa Kozareva and Melissa Roemmele

(Submission #112)


SemEval-2012 Task 7 presented a deceptively simple challenge: given an English sentence as a premise, select the sentence amongst two alternatives that more plausibly has a causal relation to the premise. In this paper, we describe the development of this task and its motivation. We describe the two systems that competed in this task as part of SemEval-2012, and compare their results to those achieved in previously published research. We discuss the characteristics that make this task so difficult, and offer our thoughts on how progress can be made in the future.

Semeval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization

Matteo Negri, Alessandro Marchetti, Yashar Mehdad, Luisa Bentivogli and Danilo Giampiccolo

(Submission #147)


This paper presents the first round of the task on Cross-lingual Textual Entailment for Content Synchronization, organized within SemEval-2012. The task was designed to promote research on semantic inference over texts written in different languages, targeting at the same time a real application scenario. Participants were presented with datasets for different language pairs, where multi-directional entailment relations (“forward”, “backward”, “bidirectional”, “no entailment”) had to be identified. We report on the training and test data used for evaluation, the process of their creation, the participating systems (10 teams, 92 runs), the results achieved and the approaches adopted.


WLV-SHEF: SimpLex – Lexical Simplicity Ranking based on Contextual and Psycholinguistic Features

Sujay Kumar Jauhar and Lucia Specia

(Submission #124)


This paper describes SimpLex, a Lexical Simplification system that participated in the English Lexical Simplification shared task at SemEval-2012. It operates on the basis of a linear weighted ranking function composed of context-sensitive and psycholinguistic features to produce the output ranking. In spite of the limited amount of training data, the system outperforms a very strong baseline and ranked first on the shared task.

EMNLP@CPH: Is frequency all there is to simplicity?

Anders Johannsen, Héctor Martínez, Sigrid Klerke and Anders Søgaard

(Submission #125)


Our system breaks down the problem of ranking a list of lexical substitutions according to how simple they are in a given context into a series of pairwise comparisons between candidates. For this we learn a binary classifier. As only very little training data is provided, we describe a procedure for generating artificial unlabeled data from WordNet and a corpus, and approach the classification task as a semi-supervised machine learning problem. We use a co-training procedure that lets each classifier increase the other classifier's training set with selected instances from an unlabeled data set. Our features include n-gram probabilities of the candidate and context in a web corpus, distributional differences of the candidate between a corpus of "easy" sentences and a corpus of normal sentences, the syntactic complexity of documents that are similar to the given context, candidate length, and the letter-wise recognizability of the candidate as measured by a trigram character language model.
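The reduction described above, from ranking to pairwise comparisons, can be sketched as follows. The learned binary classifier is replaced here by a toy frequency heuristic, and the candidate words and counts are invented; only the aggregation of pairwise decisions into a ranking is illustrated.

```python
# Sketch of reducing ranking to pairwise comparisons: each candidate is
# ranked by how many pairwise "simpler than" decisions it wins.
# TOY_FREQ and simpler() stand in for the trained classifier.

TOY_FREQ = {"clear": 900, "lucid": 40, "intelligible": 12}

def simpler(a, b):
    """Stand-in for the binary classifier: True if a is simpler than b."""
    return TOY_FREQ.get(a, 0) > TOY_FREQ.get(b, 0)

def rank_by_pairwise_wins(candidates):
    wins = {c: 0 for c in candidates}
    for a in candidates:
        for b in candidates:
            if a != b and simpler(a, b):
                wins[a] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

print(rank_by_pairwise_wins(["lucid", "intelligible", "clear"]))
# ['clear', 'lucid', 'intelligible']
```

In the actual system the comparator is a co-trained classifier over the feature set listed in the abstract, not a frequency lookup.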

SB: The mmsystem for lexical simplification

Marilisa Amoia and Massimo Romanelli

(Submission #133)


In this paper, we describe the system we submitted to the SemEval-2012 Lexical Simplification Task. Our system (mmsystem) combines word frequency with decompositional semantics criteria based on syntactic structure in order to rank candidate substitutes of lexical forms of arbitrary syntactic complexity (one-word, multi-word, etc.) in descending order of (cognitive) simplicity. The mmsystem achieved an average performance (66%) compared with the other participating systems and the baselines. The results show that the proposed approach might help shed light on the interplay between linguistic features and lexical complexity in general.

ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking

Anne-Laure Ligozat, Cyril Grouin, Anne Garcia-Fernandez and Delphine Bernhard

(Submission #137)


This paper presents the systems we developed while participating in the first task (English Lexical Simplification) of SemEval-2012. Our first system relies on n-gram frequencies computed from the Simple English Wikipedia, ranking each substitution term by decreasing frequency of use. We experimented with several other systems, based on term frequencies or taking into account the context in which each substitution term occurs. On the evaluation corpus, we achieved a score of 0.465 with the first system.
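The frequency-ranking idea behind the first system above is simple enough to sketch directly. The tiny corpus and candidate words below are invented; a real system would count over Simple English Wikipedia.

```python
# Hedged sketch of frequency-based substitute ranking: count term
# occurrences in a corpus and sort candidates by decreasing frequency.
from collections import Counter

corpus = "the clear sky was clear and the answer was clear and lucid".split()
freq = Counter(corpus)  # in the real system: Simple English Wikipedia counts

def rank_substitutes(candidates):
    # Unseen candidates get frequency 0 and sort last.
    return sorted(candidates, key=lambda w: freq[w], reverse=True)

print(rank_substitutes(["lucid", "clear", "pellucid"]))  # most frequent first
```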

UNT: SIMPRANK, SIMPRANKLIGHT and SALSA Systems for Lexical Simplification Ranking

Ravi Sinha

(Submission #187)


This paper presents three systems that took part in the lexical simplification task at SemEval-2012. Speculating on what the concept of simplicity might mean for a word, the systems apply different approaches to rank the given candidate lists. One of the systems performs second-best (statistically significant) and another performs third-best out of 9 systems and 3 baselines. Notably, the third-best system is very close to the second-best, while being much more resource-light in comparison.


UTDRelSim: Determining Relational Similarity Using Lexical Patterns

Bryan Rink and Sanda Harabagiu

(Submission #162)


In this paper we present our approach for assigning degrees of relational similarity to pairs of words in the SemEval-2012 Task 2. To measure relational similarity we employed lexical patterns that can match against word pairs within a large corpus of 12 million documents. Patterns are weighted by obtaining statistically estimated lower bounds on their precision for extracting word pairs from a given relation. Finally, word pairs are ranked based on a model predicting the probability that they belong to the relation of interest. This approach achieved the best results on the SemEval 2012 Task 2, obtaining a Spearman correlation of 0.229 and an accuracy on reproducing human answers to MaxDiff questions of 39.4%.

Duluth : Measuring Degrees of Relational Similarity with the Gloss Vector Measure of Semantic Relatedness

Ted Pedersen

(Submission #167)


This paper describes the Duluth systems that participated in Task 2 of SemEval-2012. These systems were unsupervised and relied on variations of the Gloss Vector measure of semantic relatedness provided in WordNet::Similarity. This method was moderately successful for the Class-Inclusion (1), Similar (3), Contrast (4), and Non-Attribute (6) categories of semantic relations, but mimicked a random baseline for the other six categories.

BUAP: A First Approximation to Relational Similarity Measuring

Mireya Tovar, J. Alejandro Reyes, Azucena Montes, Darnes Vilariño, David Pinto and Saul León

(Submission #168)


We describe a system proposed for measuring the degree of relational similarity between a pair of words in Task 2 of SemEval-2012. The approach presented is based on a vectorial representation using the following features: i) the context surrounding the words, with a window size of 3; ii) knowledge extracted from WordNet to discover several semantic relationships between pairs of words, such as meronymy, hyponymy, hypernymy, and part-whole; iii) the description of the pairs with their POS tags and morphological information (gender, person); and iv) the distance between each pair of words.


UTD-SpRL: A Joint Approach to Spatial Role Labeling

Kirk Roberts and Sanda Harabagiu

(Submission #160)


We present a joint approach for recognizing spatial roles in SemEval-2012 Task 3. Candidate spatial relations, in the form of triples, are heuristically extracted from sentences with high recall. The joint classification of spatial roles is then cast as a binary classification over these candidates. This joint approach allows for a rich set of features based on the complete relation instead of individual relation arguments. Our best official submission achieves an F1-measure of 0.573 on relation recognition, placing first in the task and outperforming the previous best result on the same data set (0.500).


MIXCD: System Description for Evaluating Chinese Word Similarity at SemEval-2012

Yingjie Zhang, Bin Li, Xinyu Dai and Jiajun Chen

(Submission #127)


This document describes three systems calculating semantic similarity between two Chinese words. One is based on Machine Readable Dictionaries (MRDs) and the others utilize both MRDs and a corpus. These systems were evaluated on SemEval-2012 Task 4: Evaluating Chinese Word Similarity.


Zhijun Wu: Chinese Semantic Dependency Parsing with Third-Order Features

Wu Zhijun, Li Xinxin and Wang Xuan

(Submission #111)


This paper presents our system that participated in the SemEval-2012 task on Chinese Semantic Dependency Parsing. Our system extends the second-order MST model by adding two third-order features: grand-sibling and tri-sibling. In the decoding phase, we keep the k best results for each span. After adding the selected third-order features, our system achieves a labeled attachment score of 62.72% on the answered test set (61.58% on the SemEval-2012 evaluation system), 0.15% higher than the result of the purely second-order model.

Zhou qiaoli: A divide-and-conquer strategy for semantic dependency parsing

Zhou Qiaoli, Zhang Ling, Liu Fei, Cai Dongfeng and Zhang Guiping

(Submission #113)


We describe our SemEval-2012 Shared Task 5 system in this paper. The system comprises three cascaded components: semantic role (SR) phrase tagging, SR phrase identification, and phrase and frame semantic dependency parsing. SR phrases are tagged automatically based on rules, and Conditional Random Fields (CRFs) are used as the statistical identification model for SR phrases. A projective graph-based parser is used as our semantic dependency parser. In the official evaluation, we obtained an overall labeled p = 61.84%, which ranked first. At present, we obtain an overall labeled macro p = 62.08%, 0.24% higher than the result that ranked first in Task 5.

ICT:A System Combination for Chinese Semantic Dependency Parsing

Hao Xiong and Qun Liu

(Submission #131)


The goal of semantic dependency parsing is to build the dependency structure and label the semantic relation between a head and its modifier. To attain this goal, we concentrate on obtaining a better dependency structure in order to predict better semantic relations, and propose a method to combine the results of three state-of-the-art dependency parsers. Unfortunately, we made a mistake when generating the final output, resulting in a lower score of 55.26% in terms of Labeled Attachment Score (LAS), as reported by the organizers. After being given the gold testing set, we fixed the bug and reran the evaluation script, this time obtaining a score of 62.8%, which is consistent with the results on the development set.

NJU-Parser: Achievements on Semantic Dependency Parsing

Guangchao Tang, Bin Li, Shuaishuai Xu, Xinyu Dai and Jiajun Chen

(Submission #135)


In this paper, we introduce our work on SemEval-2012 Task 5: Chinese Semantic Dependency Parsing. Our system is based on MSTParser, and two effective methods are proposed: splitting sentences at punctuation and extracting the last character of a word as its lemma. The experiments show that, with a combination of the two proposed methods, our system can improve LAS by about one percentage point, and it finally won second prize out of nine participating systems. We also tried to handle the multi-level labels, but with no improvement.


UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures

Daniel Bär, Chris Biemann, Iryna Gurevych and Torsten Zesch

(Submission #114)


We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.

PolyUCOMP: Combining Semantic Vectors with Skip-bigrams for Semantic Textual Similarity

Jian Xu, Qin Lu and Zhengzhong Liu

(Submission #119)


This paper presents a description of the Hong Kong Polytechnic University (PolyUCOMP) system that participated in the Semantic Textual Similarity task of SemEval-2012. We adopt an approach of combining semantic vectors with skip-bigrams to determine sentence similarity. The semantic vector is used to compute similarities between sentence pairs using the lexical database WordNet and the Wikipedia corpus. In addition, we consider the impact of word order in measuring sentence similarity by means of skip-bigrams.
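The skip-bigram component mentioned above captures word order by comparing all in-order word pairs (with arbitrary gaps) between the two sentences. A small sketch, with invented sentences; the overlap formula here is a plain set ratio, not necessarily the exact normalization the authors used:

```python
# Illustrative skip-bigram overlap: ordered word pairs with any gap,
# compared across two sentences as a set-overlap ratio.
from itertools import combinations

def skip_bigrams(tokens):
    """All ordered word pairs that preserve sentence order (any gap)."""
    return set(combinations(tokens, 2))

def skip_bigram_overlap(s1, s2):
    a, b = skip_bigrams(s1.split()), skip_bigrams(s2.split())
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))

print(skip_bigram_overlap("the cat sat on the mat", "the cat lay on the mat"))
```

Because skip-bigrams are ordered, two sentences with the same words in different orders score lower than identical sentences, which is exactly the word-order signal the abstract describes.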

ETS: Discriminative Edit Models for Paraphrase Scoring

Michael Heilman and Nitin Madnani

(Submission #121)


Many problems in natural language processing can be viewed as variations of the task of measuring the semantic textual similarity between short texts. However, many systems that address these tasks focus on a single task and may or may not generalize well. In this work, we extend an existing machine translation metric, TERp (Snover et al., 2009), by adding support for more detailed feature types and by implementing a discriminative training algorithm. These additions facilitate the generalization of our system, called PERP, beyond machine translation evaluation to other similarity tasks, such as paraphrase recognition. In the SemEval 2012 Semantic Textual Similarity task, PERP performed competitively, particularly at the two surprise subtasks revealed shortly before the submission deadline.

Sbdlrhmn: A Rule-based Human Interpretation System for Semantic Textual Similarity Task

Samir AbdelRahman and Catherine Blake

(Submission #122)


In this paper, we describe our two runs submitted to the SemEval-2012 Pilot Semantic Textual Similarity task (Task 6). The proposed runs, one rule-based and one cosine-similarity-based, measure the degree of semantic equivalence between a pair of sentences. The rule-based system utilizes sentence structure and semantics to compute the pair's semantic relatedness. Its rules are a set of decisions representing human interpretations of the sentence pair's contents. As one of its main goals, the system suggests a set of domain-free rules to help the human annotator in scoring the semantic equivalence of two sentences. Our cosine-similarity-based run studies the effect of considering only the surface word forms of the sentence pair on their semantic relatedness score.


LIMSI: Learning Semantic Similarity by Selecting Random Word Subsets

Artem Sokolov

(Submission #126)


We propose a semantic similarity learning method based on Random Indexing (RI) and ranking with boosting. Unlike classical RI, the method uses only those context vector features that are informative for the semantics modeled. Despite ignoring text preprocessing and dispensing with semantic resources, the approach was ranked as high as 22nd among 85 participants in SemEval-2012 Task 6: Semantic Textual Similarity.

ATA-Sem: Chunk-based Determination of Semantic Text Similarity

Demetrios Glinos

(Submission #128)


This paper describes investigations into using syntactic chunk information as the basis for determining the similarity of candidate texts at the semantic level. Two approaches were considered. The first was a corpus-based method that extracted lexical and semantic features from pairs of chunks from each sentence that were associated through a chunk alignment algorithm. The features were used as input to a classifier trained on the same features extracted from gold standard data. The second approach involved breadth-first chunk association and the application of a rule-based scoring algorithm. Both approaches involved the use of the NLM Lexical Tools and WordNet for term expansion. Both approaches were evaluated against the test data for the SemEval 2012 Semantic Text Similarity task. The results show that the rule-based chunk approach is superior.

IRIT: Textual Similarity combining Conceptual Similarity with an N-Gram comparison Method

Davide Buscaldi, Ronan Tournier, Nathalie Aussenac-Gilles and Josiane Mothe

(Submission #136)


This paper describes the participation of the IRIT team in SemEval 2012 Task 6 (Semantic Textual Similarity). The method used consists of an n-gram based comparison method combined with a conceptual similarity measure that uses WordNet to calculate the similarity between a pair of concepts.

DSS: Text Similarity Using Lexical Alignments of Form, Distributional Semantics and Grammatical Relations

Diana McCarthy, Spandana Gella and Siva Reddy

(Submission #138)


In this paper we present our systems for the STS task. Our systems are all based on a simple process of identifying the components that correspond between two sentences. Currently we use words (that is, word forms), lemmas, distributionally similar words, and grammatical relations identified with a dependency parser. We submitted three systems, all of which only use open-class words. Our first system (alignheuristic) tries to obtain a mapping between every open-class token using all the above sources of information. Our second system (wordsim) uses a different algorithm and, unlike alignheuristic, does not use the dependency information. The third system (average) simply takes the average of the scores for each item from the other two systems to take advantage of the merits of both, so we only provide a brief description of it. The results are promising, with Pearson's coefficients on each individual dataset ranging from .3765 to .7761 for our relatively simple heuristics-based systems, which do not require training on different datasets. We provide some analysis of the results, and also report results for our data using Spearman's coefficient, which, as a non-parametric measure, we argue is better able to reflect the merits of the different systems (average is ranked between the others).

DeepPurple: Estimating Sentence Semantic Similarity using N-gram Regression Models and Web Snippets

NIkos Malandrakis, Elias Iosif and Alexandros Potamianos

(Submission #139)


We estimate the semantic similarity between two sentences using regression models with three features: 1) n-gram hit rates (lexical matches) between the sentences, 2) lexical semantic similarity between non-matching words, and 3) sentence length. Lexical semantic similarity is computed via co-occurrence counts on a corpus harvested from the web, using a modified mutual information metric. State-of-the-art results are obtained for semantic similarity computation at the word level; however, the fusion of this information at the sentence level provides only moderate improvement on Task 6 of SemEval-2012. Despite the simple features used, regression models provide good performance, especially for shorter sentences, reaching a correlation of 0.62 on the SemEval test set.

JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pair

Snehasis Neogi, Partha Pakray, Sivaji Bandyopadhyay and Alexander Gelbukh

(Submission #141)


This article presents the experiments carried out at Jadavpur University as part of our participation in the Semantic Textual Similarity (STS) task, Task 6 of the Semantic Evaluation Exercises (SemEval-2012). Task 6 focused on the semantic relations of text pairs, providing five different text-pair files in which semantic relations are to be compared and judged through a similarity and confidence score. The similarity score is a multi-way classification in the form of a grade between 0 and 5. We submitted one run for the STS task. Our system has two basic modules: one deals with lexical relations and the other with dependency-based syntactic relations of the text pair. The similarity score given to a pair is the average of the scores of the two modules; module scores are rule-based and thresholded. The total accuracy of our system in the task is 0.3880.

SemEval-2012 Task 6: Semantic Textual Similarity

Zhu Tiantian and Lan Man

(Submission #144)


Tiantian Zhu and Man Lan. 2012. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. In Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval 2012), in conjunction with the First Joint Conference on Lexical and Computational Semantics (*SEM 2012).

sranjans : Semantic Textual Similarity using Maximal Weighted Bipartite Graph Matching

Sumit Bhagwani, Shrutiranjan Satapathy, Harish Karnick

(Submission #146)


The paper aims to come up with a system that examines the degree of semantic equivalence between two sentences. At the core of the paper is the attempt to grade the similarity of two sentences by finding the maximal weighted bipartite match between the tokens of the two sentences. The tokens include single words, or multi-word expressions in the case of Named Entities and adjectivally or numerically modified words. Two token similarity measures are used for the task: a WordNet-based similarity, and a statistical word similarity measure which overcomes the shortcomings of the WordNet-based similarity. As part of the three systems created for the task, we explore a simple bag-of-words tokenization scheme, a more careful tokenization scheme which captures named entities, times, dates, monetary entities, etc., and finally try to capture the context around tokens using grammatical dependencies.
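The core idea above, scoring sentence similarity via a maximal weighted bipartite match between tokens, can be sketched with a brute-force matcher. The token similarities below are an invented toy table standing in for the WordNet-based and statistical measures the abstract names; brute force over permutations is only feasible for very short sentences.

```python
# Toy sketch: sentence similarity as the best one-to-one token matching,
# normalized by sentence length. TOY_SIM stands in for real WordNet or
# statistical similarities; exhaustive search replaces a real matching solver.
from itertools import permutations

def best_matching_score(tokens1, tokens2, sim):
    """Brute-force maximal weighted bipartite matching (short sentences only)."""
    if len(tokens1) > len(tokens2):
        tokens1, tokens2 = tokens2, tokens1
        sim = lambda a, b, _s=sim: _s(b, a)
    best = 0.0
    for perm in permutations(tokens2, len(tokens1)):
        best = max(best, sum(sim(a, b) for a, b in zip(tokens1, perm)))
    return best / max(len(tokens1), 1)

TOY_SIM = {("dog", "puppy"): 0.8, ("runs", "sprints"): 0.7}
sim = lambda a, b: 1.0 if a == b else TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))

print(best_matching_score(["the", "dog", "runs"], ["the", "puppy", "sprints"], sim))
```

A production system would use a polynomial-time assignment algorithm (e.g. the Hungarian method) rather than enumerating permutations.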

COL-WTMF: A Simple Unsupervised Latent Semantics based Approach for Sentence Similarity

Weiwei Guo and Mona Diab

(Submission #149)


The Semantic Textual Similarity (STS) shared task (Agirre et al., 2012) computes the degree of semantic equivalence between two sentences. We show that a simple unsupervised latent-semantics based approach, Weighted Textual Matrix Factorization, that only exploits bag-of-words features can outperform most systems for this task. The key to the approach is to carefully handle missing words that are not in the sentence, rendering it superior to Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Our system ranks 20th out of 89 systems according to Pearson correlation, and ranks 10th and 19th of 89 in the other two evaluation metrics.

UNIBA: Distributional Semantics for Textual Similarity

Annalina Caputo, Pierpaolo Basile and Giovanni Semeraro

(Submission #150)


We report the results of UNIBA participation in the first SemEval-2012 Semantic Textual Similarity task. Our system relies on distributional models of words automatically inferred from a large corpus. We exploit three different semantic word spaces: Random Indexing (RI), Latent Semantic Analysis (LSA) over RI, and vector permutations. Runs based on these spaces consistently outperform the baseline on the proposed datasets.

UNITOR: Combining Semantic Text Similarity functions through SV Regression

Danilo Croce, Paolo Annesi, Valerio Storch and Roberto Basili

(Submission #151)


This paper presents the UNITOR system that participated in SemEval-2012 Task 6: Semantic Textual Similarity (STS). The task is here modeled as a Support Vector (SV) regression problem, where an SV regressor learns the similarity scoring function between text pairs. The semantic relatedness between sentences is estimated in an unsupervised fashion according to different similarity functions, each capturing a specific semantic aspect of STS, e.g. syntactic vs. lexical or topical vs. paradigmatic similarity. The SV regressor effectively combines this information, learning a scoring function that maps the individual functions to the resulting STS. This provides a highly portable method, as it does not depend on any manually built resource (e.g. WordNet) nor on a controlled (e.g. aligned) corpus.

Team Saarland: Vector-based models of semantic textual similarity

Georgiana Dinu and Stefan Thater

(Submission #154)


This paper describes our system for the SemEval 2012 Sentence Textual Similarity task. The system is based on a combination of a few simple vector-space-based methods for word meaning similarity. Evaluation results show that a simple combination of these unsupervised data-driven methods can be quite successful. The simple vector space components achieve high performance on short sentences; on longer, more complex sentences, they are outperformed by a surprisingly competitive word overlap baseline, but they still bring improvements over this baseline when incorporated into a mixture model.

UMCC_DLSI: Multidimensional Lexical-Semantic Textual Similarity

Antonio Fernández, Yoan Gutiérrez, Héctor Dávila, Alexander Chávez, Andy González, Rainel Estrada, Yenier Castañeda, Sonia Vázquez, Andrés Montoyo and Rafael Muñoz

(Submission #155)


This paper describes the specifications and results of the UMCC_DLSI system, which participated in the first Semantic Textual Similarity (STS) task of SemEval-2012. Our supervised system uses different kinds of semantic and lexical features to train classifiers, and it uses a voting process to select the correct option. Among the features, we highlight the resource ISR-WN, used to extract semantic relations among words, and the use of different algorithms to establish semantic and lexical similarities. In order to establish which features are the most appropriate for improving STS results, we participated with three runs using different sets of features. Our main approach reached positions 14, 15 and 18 in the different official rankings, obtaining a general correlation coefficient of up to 0.72 for the best run.

TakeLab: Systems for Measuring Semantic Text Similarity

Frane Šaric, Goran Glavaš, Mladen Karan, Jan Šnajder and Bojana Dalbelo Bašić

(Submission #156)


This paper describes two systems for determining the semantic similarity of short texts, submitted to SemEval 2012 Task 6. Work on determining semantic similarity has predominantly focused on large documents. However, a fair amount of possibly important information is condensed into short text snippets (e.g., social media posts and comments, image captions, scientific abstracts). We combine different measures of word-overlap similarity and syntax similarity features to train a support vector regression model. Our systems placed in the top 5 out of over 80 systems participating in the task, for all three overall evaluation metrics used (overall Pearson: 2nd and 3rd; normalized Pearson: 1st and 3rd; weighted mean: 2nd and 5th).

SRI and UBC: Simple Similarity Features for Semantic Textual Similarity

Eric Yeh and Eneko Agirre

(Submission #157)


We describe the systems submitted by SRI International and the University of the Basque Country for the Semantic Textual Similarity (STS) SemEval-2012 task. Our systems focused on a simple set of features: a mix of semantic similarity resources, lexical match heuristics, and part-of-speech (POS) information. We also incorporate precision-focused scores over lexical and POS information derived from the BLEU measure, and lexical and POS features computed over split-bigrams from the ROUGE-S measure. These were used to train support vector regressors over the pairs in the training data. Of the three systems we submitted, two performed well in the overall ranking, with split-bigrams improving performance over pairs drawn from the MSR Research Video Description Corpus. Our third system maintained three separate regressors, each trained specifically for the STS dataset its pairs were drawn from; a multinomial classifier predicted which dataset regressor would be most appropriate for a given pair and used it to score that pair. This system underperformed, primarily due to errors in the dataset predictor.

FBK: Combining Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity

José Guilherme Camargo de Souza, Matteo Negri and Yashar Mehdad

(Submission #164)


This paper describes the participation of FBK in the Semantic Textual Similarity (STS) task organized within SemEval 2012. Our approach explores lexical, syntactic and semantic machine translation evaluation metrics combined with distributional and knowledge-based word similarity metrics. Our best model achieves 60.77% correlation with human judgements when evaluating the semantic similarity of texts over all datasets (Mean score), ranking 20th out of 88 submitted runs in the Mean ranking, where the average correlation across all sub-portions of the test set is considered.

FCC: Three Approaches for Semantic Textual Similarity

Maya Carrillo, Darnes Vilariño, David Pinto, Mireya Tovar, Saul León and Esteban Castillo

(Submission #165)


In this paper, we describe the three approaches we submitted to the Semantic Textual Similarity task of SemEval 2012. The first approach calculates semantic similarity using the Jaccard coefficient with term expansion based on synonyms. The second approach uses the semantic similarity measure reported by Mihalcea et al. (2006). The third approach employs Random Indexing and a Bag of Concepts based on context vectors. The first and third approaches obtained comparable performance, whereas the second performed very poorly. The best ALL result was obtained with the third approach, with a Pearson correlation of 0.663.
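As an illustration of the first approach, Jaccard similarity with synonym expansion can be sketched as follows. The synonym table is a hypothetical stand-in for a real lexical resource (e.g., a WordNet-derived dictionary), not the authors' data:

```python
def jaccard(a, b):
    """Jaccard coefficient between two token sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical synonym table; in practice this would come from a
# lexical resource such as WordNet.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def expand(tokens):
    """Expand each token with its synonyms before comparison."""
    expanded = set(tokens)
    for t in tokens:
        expanded |= SYNONYMS.get(t, set())
    return expanded

s1 = "the car stopped".split()
s2 = "the automobile stopped".split()
print(jaccard(s1, s2))                  # 0.5: "car"/"automobile" do not match literally
print(jaccard(expand(s1), expand(s2)))  # 1.0: expansion bridges the synonym gap
```

Expansion only helps when the resource actually links the differing terms; otherwise the score reduces to plain word overlap.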

UNT: A Supervised Synergistic Approach to Semantic Text Similarity

Carmen Banea, Samer Hassan, Michael Mohler and Rada Mihalcea

(Submission #170)


This paper presents the systems our team entered in the SemEval 2012 Semantic Text Similarity task. Based on prior research in semantic similarity and relatedness, we combine various methods in a machine learning framework. The three variations submitted during the task evaluation period ranked 5th, 9th and 14th among the 89 participating systems. Our evaluations show that corpus-based methods display more robust behavior on the training data, yet combining a variety of methods allows a learning algorithm to reach better decisions than any of the individual parts.

DERI&OEG: Pushing Corpus Based Relatedness to Similarity: Shared Task System Description

Nitish Aggarwal, Kartik Asooja and Paul Buitelaar

(Submission #171)


In this paper, we describe our system submitted for the semantic textual similarity (STS) task at SemEval 2012. We implemented two approaches. In the first, we combined corpus-based relatedness with linguistic, syntax-based semantic similarity using machine learning approaches. We implemented Explicit Semantic Analysis (ESA) as the corpus-based semantic measure. For the semantic similarity between syntactic roles, we modified WordNet-based Lin similarity in order to better reflect sentence-level similarity. In the second, we used WordNet-based word similarity along with a bipartite method to find the similarity between the full sentences.

Stanford: Probabilistic Edit Distance Metrics for STS

Mengqiu Wang and Daniel Cer

(Submission #173)


This paper describes Stanford University's submission to the SemEval 2012 Semantic Textual Similarity (STS) shared evaluation task. Our proposed metric computes probabilistic edit distance as a prediction of semantic textual similarity. We learn weighted edit distance in a probabilistic finite state machine (pFSM) model, where state transitions correspond to edit operations. While standard edit distance models cannot capture long-distance word swapping or cross alignments, we rectify these shortcomings using a novel pushdown automaton extension of the pFSM model. Our models are trained in a regression framework and can easily incorporate a rich set of linguistic features. The performance of our edit distance based models is contrasted with an adaptation of the Stanford textual entailment system to the STS task. Our results show that the most advanced edit distance model, pPDA, outperforms our entailment system on all but one of the genres included in the STS task.
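For context, the classic dynamic-programming edit distance that such models generalize can be sketched as follows. The fixed costs here stand in for the learned transition weights of the pFSM; this is an illustration of the baseline technique, not the authors' implementation:

```python
def weighted_edit_distance(src, tgt, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Weighted Levenshtein distance over token sequences via dynamic
    programming. A pFSM model generalizes this: the fixed costs become
    learned (probabilistic) transition weights, and the pushdown
    extension additionally handles long-distance swaps."""
    n, m = len(src), len(tgt)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del          # delete all of src[:i]
    for j in range(1, m + 1):
        d[0][j] = j * w_ins          # insert all of tgt[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if src[i - 1] == tgt[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,      # delete src[i-1]
                          d[i][j - 1] + w_ins,      # insert tgt[j-1]
                          d[i - 1][j - 1] + sub)    # match/substitute
    return d[n][m]

print(weighted_edit_distance("a b c".split(), "a x c".split()))  # 1.0 (one substitution)
```

Note the limitation the abstract mentions: swapping "a b" to "b a" costs two operations here, since plain edit distance has no swap-aware state.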

aca08ls: Two Approaches to Semantic Text Similarity

Sam Biggins, Shaabi Mohammed, Sam Oakley, Luke Stringer, Mark Stevenson and Judita Preiss

(Submission #174)


This paper describes the University of Sheffield's submission to SemEval-2012 Task 6: Semantic Text Similarity. Two approaches were developed. The first is an unsupervised technique based on the widely used vector space model and information from WordNet. The second method relies on supervised machine learning and represents each sentence as a set of n-grams. This approach also makes use of information from WordNet. Results from the formal evaluation show that both approaches are useful for determining the similarity in meaning between pairs of sentences with the best performance being obtained by the supervised approach. Incorporating information from WordNet also improves performance for both approaches.

Improving Text Similarity Measures without Human Assessments

Enrique Amigó, Jesus Gimenez, Julio Gonzalo and Felisa Verdejo

(Submission #180)


This paper describes the participation of the UNED NLP group in the Semantic Textual Similarity task at SemEval 2012. Our contribution consists of an unsupervised method, Heterogeneity Based Ranking (HBR), for combining similarity measures. Our runs focus on combining standard similarity measures for Machine Translation evaluation. The Pearson correlation we achieved is clearly surpassed by other proposals in the competition, due to the limitations of this kind of measure in the context of this task. However, combining the system outputs (similarity measures) that participated in the campaign produced three interesting results: (i) combining all measures, without considering any kind of human assessments, achieves performance similar to the best peers; (ii) combining the 40 least reliable measures in the evaluation campaign also achieves similar results, improving on the measures in isolation; and (iii) the correlation between peers and HBR predicts the performance of measures according to human assessments with a 0.94 correlation.

janardhan: Semantic Textual Similarity using Universal Networking Language graph matching

Janardhan Singh, Arindam Bhattacharya and Pushpak Bhattacharyya

(Submission #182)


Sentences that are syntactically quite different can often have similar or identical meanings. The SemEval 2012 Semantic Textual Similarity task aims at finding the semantic similarity between two sentences. The Universal Networking Language (UNL) represents only the inherent meaning of a sentence, without any syntactic details. Thus, comparing the UNL graphs of two sentences can give an insight into how semantically similar they are. This paper presents our UNL graph matching method for the Semantic Textual Similarity (STS) task.

Sagan: An approach to Semantic Text Similarity based on Textual Entailment

Julio Castillo and Paula Estrella

(Submission #183)


In this paper we report the results obtained in the Semantic Text Similarity (STS) task with a system primarily developed for textual entailment. Our results are quite promising: our best run ranked 39th in the official results with overall Pearson, and 29th with the Mean metric.

UOW: Improving of Semantic Similarity Metrics

Miguel Rios

(Submission #184)


We describe our submission to the Semantic Textual Similarity task. Our submission consists of a machine learning algorithm trained to predict the semantic equivalence between sentences. The classes of features used for training are: i) lexical metrics, ii) a syntactic metric and iii) semantic metrics. The lexical features are word-overlap based. The syntactic metric is our modification of the BLEU algorithm to use chunks instead of words. The semantic metrics are: a Named Entities metric, which compares entities of the same type but allows similar entities, and our modification of a Semantic Role Labels metric, based on aligning predicates by using the arguments as context and the similarity score from a thesaurus. Moreover, we use the METEOR metric in order to measure the impact of adding our semantic metrics to a well-known metric. Our submissions outperform the official baseline, but they do not improve on the model with METEOR.

Penn: Using Word Similarities to better Estimate Sentence Similarity

Sneha Jha, Hansen A. Schwartz and Lyle Ungar

(Submission #185)


We present the Penn system for SemEval-2012 Task 6, computing the degree of semantic equivalence between two sentences. We explore the contributions of different vector models for computing sentence and word similarity: Collobert and Weston embeddings, eigenwords and selectors. These embeddings provide different measures of distributional similarity between words and their contexts. Our system combines multiple sets of similarity features at various levels of depth: (a) naive lexical features; (b) similarity between vector representations of sentences; and (c) similarity between constituent words, computed using WordNet, using eigenword vector representations of words (derived with spectral methods from word contexts), and using selectors, which generalize a word to the set of words that appear in the same contexts. Regression is then used to determine optimal weightings of the different similarity measures; we found that each provides partially independent predictive signal above baseline models.

Soft Cardinality: A Parameterized Similarity Function for Text Comparison

Sergio Jimenez, Claudia Becerra and Alexander Gelbukh

(Submission #186)


We present an approach for constructing text similarity functions using a parameterized resemblance coefficient in combination with a softened cardinality function called soft cardinality. Our approach provides a consistent, recursive model across varying levels of granularity, from sentences to characters: sentences are compared as sets of words, and words in turn as sets of character q-grams. Experimentally, we observed that performance (correlation) in the space defined by all parameters varied relatively smoothly and had a single maximum reachable by hill climbing. Our approach used only surface text information, a stop-word remover and a stemmer to tackle the Semantic Textual Similarity task (task 6) at SemEval 2012. The proposed method ranked 3rd (average), 5th (normalized correlation) and 15th (aggregated correlation) among 89 systems submitted by 31 teams.
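A rough sketch of the soft cardinality idea, as we read it (not the authors' code): each element contributes the inverse of its total similarity to the set, so near-duplicate elements count less than one each, and classic cardinality is recovered when all elements are pairwise dissimilar. Word similarity here is approximated by Jaccard over character q-grams; the parameter names and padding-free q-gram scheme are illustrative choices:

```python
def qgrams(word, q=2):
    """Character q-grams of a word (no boundary padding, for simplicity)."""
    return {word[i:i + q] for i in range(len(word) - q + 1)} or {word}

def word_sim(w1, w2):
    """Word similarity as Jaccard over character q-grams."""
    g1, g2 = qgrams(w1), qgrams(w2)
    return len(g1 & g2) / len(g1 | g2)

def soft_cardinality(words, p=1.0):
    """Soft cardinality: each word contributes 1 / (its total similarity
    to the set), so exact and near duplicates are discounted."""
    return sum(1.0 / sum(word_sim(a, b) ** p for b in words) for a in words)

def soft_jaccard(s1, s2, p=1.0):
    """Resemblance coefficient built on soft cardinalities."""
    a, b = s1.split(), s2.split()
    ca, cb = soft_cardinality(a, p), soft_cardinality(b, p)
    cab = soft_cardinality(a + b, p)   # soft |A ∪ B|
    inter = ca + cb - cab              # soft |A ∩ B| by inclusion-exclusion
    return inter / cab if cab else 0.0
```

With pairwise-dissimilar words, `soft_cardinality` equals the ordinary set size; a repeated word is counted only once, which is what makes the comparison "soft".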


UTDHLT: COPACETIC System for Choosing Plausible Alternatives

Travis Goodwin, Bryan Rink, Kirk Roberts and Sanda Harabagiu

(Submission #163)


The Choice of Plausible Alternatives (COPA) task in SemEval-2012 presents a series of forced-choice questions wherein each question provides a premise and two viable cause or effect scenarios. The correct answer is the cause or effect that is the most plausible. This paper describes the COPACETIC system developed by the University of Texas at Dallas (UTD) for this task. We approach this task by casting it as a classification problem and using features derived from bigram co-occurrences, TimeML temporal links between events, single-word polarities from the Harvard General Inquirer, and causal syntactic dependency structures within the Gigaword corpus. Additionally, we show that although each of these components improves our score for this evaluation, the difference in accuracy between using all of these features and using bigram co-occurrence information alone is not statistically significant.


UAlacant: Using Online Machine Translation for Cross-Lingual Textual Entailment

Miquel Esplà-Gomis, Felipe Sánchez-Martínez and Mikel L. Forcada

(Submission #115)


This paper describes a new method for cross-lingual textual entailment (CLTE) detection based on machine translation (MT). We use sub-segment translations from different MT systems available online as a source of cross-lingual knowledge. In this work we describe and evaluate different features derived from these sub-segment translations, which are used by a support vector machine classifier to detect CLTEs. We submitted this system to SemEval 2012 task 8, obtaining an accuracy of up to 59.8% on the test set, the second-best performing approach in the contest.

HDU: Cross-lingual Textual Entailment with SMT Features

Katharina Wäschle and Sascha Fendrich

(Submission #117)


We describe the Heidelberg University system for the Cross-lingual Textual Entailment (CLTE) task at SemEval-2012. The system relies on standard statistical machine translation (SMT) methods and tools, combining different features based on monolingual and cross-lingual word alignments as well as classic TE distance and bag-of-word features in a statistical learning framework. We learn separate binary classifiers for each entailment direction and combine them to obtain four entailment relations. Our system yielded the best overall score for three out of four language pairs.
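The two-classifier combination scheme described in the abstract, where separate binary decisions per direction yield one of four entailment relations, can be sketched as follows (the function and label names are ours; the actual system may differ in details):

```python
def combine(forward: bool, backward: bool) -> str:
    """Combine per-direction binary entailment decisions into one of the
    four CLTE relations: T1 entails T2 (forward), T2 entails T1
    (backward), both (bidirectional), or neither (no_entailment)."""
    if forward and backward:
        return "bidirectional"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "no_entailment"

# Each boolean would come from a direction-specific binary classifier
# trained on monolingual/cross-lingual alignment features.
print(combine(True, False))  # forward
```

The appeal of this decomposition is that each binary classifier sees a simpler problem than a single four-way classifier would.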

Soft Cardinality + ML: Learning Adaptive Similarity Functions for Cross-lingual Textual Entailment

Sergio Jimenez, Claudia Becerra and Alexander Gelbukh

(Submission #120)


This paper presents a novel approach to building adaptive similarity functions based on cardinality using machine learning. Unlike current approaches, which build feature sets from similarity scores, we build feature sets from the cardinalities of the commonalities and differences between the pairs of objects being compared. This allows the machine-learning algorithm to obtain an asymmetric similarity function suitable for directional judgments. Besides the classic set cardinality, we used soft cardinality to allow flexibility in the comparison between words. Our approach used only information from the surface of the text, a stop-word remover and a stemmer to address the cross-lingual textual entailment task (task 8) at SemEval 2012. We obtained the third-best result among the 29 systems submitted by 10 teams. Additionally, this paper presents results that improve on the best official score.
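A minimal sketch of the cardinality-based feature idea (feature names are ours): representing a pair by the cardinalities of its commonalities and differences, rather than by a single symmetric score, lets a downstream learner weight the two directions differently:

```python
def cardinality_features(t1: str, t2: str) -> dict:
    """Feature vector built from set cardinalities. Because |A-B| and
    |B-A| are separate features, a learned model over them can be
    asymmetric, which suits directional entailment judgments; a single
    ratio like Jaccard would discard that direction information."""
    a, b = set(t1.split()), set(t2.split())
    return {
        "A": len(a),        # tokens in text 1
        "B": len(b),        # tokens in text 2
        "A&B": len(a & b),  # commonality
        "A|B": len(a | b),  # union
        "A-B": len(a - b),  # material only in text 1
        "B-A": len(b - a),  # material only in text 2
    }

f = cardinality_features("a dog barks loudly", "a dog barks")
# A-B = 1 but B-A = 0: the longer text plausibly entails the shorter one.
```

In the authors' full approach these classic cardinalities would be replaced or supplemented by soft cardinalities to tolerate near-matching words.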

JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System

Snehasis Neogi, Partha Pakray, Sivaji Bandyopadhyay and Alexander Gelbukh

(Submission #132)


This article presents the experiments carried out at Jadavpur University as part of the participation in the Cross-lingual Textual Entailment for Content Synchronization (CLTE) task, task 8 at the Semantic Evaluation Exercises (SemEval-2012). The paper explores cross-lingual textual entailment as a relation between two texts in different languages and proposes different measures for the entailment decision in a four-way classification task (forward, backward, bidirectional and no-entailment). We set up different heuristics and measures of the syntactic relations between the two texts. Our system mainly considers the syntactic mapping between the texts, measuring the proximity of a text pair through this mapping. It considers Named Entities, noun chunks, parts of speech, n-grams and some text similarity measures over the pair to decide the entailment class. Rules were established to handle the multi-way entailment issue: four different rules are predetermined for the four classes of entailment, and the system decides the entailment after comparing the scores output by the modules.

CELI: An Experiment with Cross Language Textual Entailment

Milen Kouylekov

(Submission #142)


This paper presents CELI's participation in the SemEval 2012 Cross-lingual Textual Entailment for Content Synchronization task, for which we developed an approach based on cross-language text similarity. We modified our cross-language query similarity system to handle longer texts.

FBK: Cross-Lingual Textual Entailment Without Translation

Yashar Mehdad, Matteo Negri and Jose Guilherme C. de Souza

(Submission #148)


This paper overviews FBK's participation in the Cross-Lingual Textual Entailment for Content Synchronization task organized within SemEval 2012. Our participation is characterized by cross-lingual matching features extracted from lexical and semantic phrase tables and from dependency relations. These features are used for multi-class and binary classification with SVMs. Using a combination of lexical, syntactic, and semantic features to create a cross-lingual textual entailment system, we report on experiments over the provided dataset. Besides demonstrating the effectiveness of the approach, our best run achieved an accuracy of 50.4% (with the average score and the median system achieving 44.0% and 40.7%, respectively).

BUAP: Lexical and Semantic Similarity for Cross-lingual Textual Entailment

Darnes Vilariño, David Pinto, Mireya Tovar, Saul León and Esteban Castillo

(Submission #152)


In this paper we report on the two runs submitted to task 8 of SemEval 2012 for the evaluation of Cross-lingual Textual Entailment in the framework of content synchronization. Both approaches are based on textual similarity, and the entailment judgment (bidirectional, forward, backward or no entailment) is given by a set of decision rules. The first approach uses textual similarity on the translated and original versions of the texts, whereas the second approach also expands terms by means of synonyms. The evaluation of both approaches shows similar behavior, with results still close to the average and median.

DirRelCond3: Detecting Textual Entailment Across Languages With Conditions On Directional Text Relatedness Scores

Alpar Perini

(Submission #158)


There are relatively few entailment heuristics that exploit the directional nature of the entailment relation. Cross-Lingual Textual Entailment (CLTE), besides introducing the extra dimension of cross-linguality, also requires determining the exact direction of the entailment relation in order to support content synchronization (Negri et al., 2012). Our system uses simple dictionary lookup combined with heuristic conditions to determine the possible directions of entailment between two texts written in different languages. The key terms of the conditions were derived from the formula of Corley and Mihalcea (2005), originally for text similarity, while the entailment condition used as a starting point was that of Tatar et al. (2009). We show the results obtained by our implementation of this simple and fast approach on the CLTE task of the SemEval-2012 challenge.

ICT: A Translation based Method for Cross-lingual Textual Entailment

Fandong Meng, Hao Xiong and Qun Liu

(Submission #178)


In this paper, we present our system description for the Cross-lingual Textual Entailment task. The goal of this task is to induce entailment relations between two sentences written in different languages. To accomplish this goal, we first translate sentences written in foreign languages into English. Then, we use EDITS, an open source package, to recognize entailment relations. Since EDITS only draws monodirectional relations while the task requires bidirectional prediction, we exchange the text and hypothesis to induce entailment in the other direction. Experimental results show that our method achieves promising, though not perfect, results compared to other participants.

Sagan: An approach to Cross-Lingual Textual Entailment based on Machine Translation

Julio Castillo and Marina Cardenas

(Submission #181)


This paper describes our participation in the task denominated Cross-Lingual Textual Entailment (CLTE) for content synchronization. We present an approach to CLTE that uses machine translation to tackle the problem of multilinguality. Our system relies on machine learning and on the use of WordNet as a source of semantic knowledge. Results are very promising, always above the mean score.