SemEval-2012 Task #8



The Cross-Lingual Textual Entailment task (CLTE) addresses textual entailment (TE) recognition under a new dimension (cross-linguality), and within a challenging application scenario (content synchronization).

Cross-linguality represents a dimension of the TE recognition problem that so far has been only partially investigated. The great potential of integrating monolingual TE recognition components into NLP architectures has been reported in several areas, including question answering, information retrieval, information extraction, and document summarization. However, mainly due to the absence of CLTE recognition components, similar improvements have not been achieved yet in any cross-lingual application. The CLTE task aims at prompting research to fill this gap.

Content synchronization represents an ideal application scenario to test the capabilities of advanced NLP systems. Given two documents about the same topic written in different languages (e.g. Wikipedia articles), the task consists of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions of the two documents. To this end, a crucial requirement is to identify the information in one page that is either equivalent or novel (more informative) with respect to the content of the other. The task can be naturally cast as an entailment-related problem, where bidirectional and unidirectional entailment judgments for two text fragments are mapped into judgments about semantic equivalence and novelty, respectively. Alternatively, the task can be seen as a Machine Translation problem, where judgments about semantic equivalence and novelty depend on whether one text fragment can be fully or only partially translated into the other.


Given a pair of topically related text fragments (T1 and T2) in different languages, the CLTE task consists of automatically annotating it with one of the following entailment judgments:

- Bidirectional (T1 -> T2 & T1 <- T2): the two fragments entail each other (semantic equivalence)
- Forward (T1 -> T2 & T1 !<- T2): unidirectional entailment from T1 to T2
- Backward (T1 !-> T2 & T1 <- T2): unidirectional entailment from T2 to T1
- No Entailment (T1 !-> T2 & T1 !<- T2): there is no entailment between T1 and T2
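
The four judgments above amount to combining two independent directional decisions. A minimal Python sketch of that mapping follows; the function name `clte_label` is hypothetical, and the label strings are taken from the XML example on this page rather than from any official tool.

```python
def clte_label(t1_entails_t2: bool, t2_entails_t1: bool) -> str:
    """Combine two directional entailment decisions into one CLTE judgment.

    Hypothetical helper for illustration only; the label strings follow
    the XML example shown in the task description.
    """
    if t1_entails_t2 and t2_entails_t1:
        return "bidirectional"   # T1 -> T2 and T1 <- T2
    if t1_entails_t2:
        return "forward"         # T1 -> T2 only
    if t2_entails_t1:
        return "backward"        # T1 <- T2 only
    return "no_entailment"       # neither direction holds
```

A system that already has a monolingual or cross-lingual entailment classifier for one direction could, under this view, run it twice with the arguments swapped and combine the two outputs.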

In this task, both T1 and T2 are assumed to be TRUE statements; hence in the dataset there are no contradictory pairs.


<entailment-corpus languages="spa-eng">
          <pair id="1" entailment="bidirectional">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born in Salzburg.</t2>
          </pair>
          <pair id="2" entailment="forward">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
          </pair>
          <pair id="3" entailment="backward">
                    <t1>Mozart nació en la ciudad de Salzburgo</t1>
                    <t2>Mozart was born on 27th January 1756 in Salzburg.</t2>
          </pair>
          <pair id="4" entailment="no_entailment">
                    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
                    <t2>Mozart was born to Leopold and Anna Maria Pertl Mozart.</t2>
          </pair>
</entailment-corpus>


The dataset consists of 1,000 CLTE pairs (500 for training and 500 for test), balanced with respect to the four entailment judgments (bidirectional, forward, backward, and no entailment).

Cross-lingual datasets are available for the following language combinations:

- Spanish/English (spa-eng)
- German/English (deu-eng)
- Italian/English (ita-eng)
- French/English (fra-eng)

The dataset was created following the crowdsourcing-based methodology proposed in (Negri et al., EMNLP 2011), which consisted of the following steps:

  • English sentences were selected from copyright-free sources, i.e. Wikipedia and Wikinews, and represent T1 in the entailment pair;
  • each T1 was modified through crowdsourcing in various ways (e.g. introducing lexical and syntactic changes, adding and removing portions of text, etc.) in order to obtain a corresponding T2;
  • each T1 was paired with the corresponding T2, and the resulting pairs were annotated with the entailment judgment. The final result was a monolingual English dataset;
  • in order to create the cross-lingual datasets, each English T1 was translated into four different languages (i.e. Spanish, German, Italian and French);
  • by pairing the translated T1 with the corresponding T2 in English, four cross-lingual datasets were obtained;
  • the overall result is a multilingual parallel entailment corpus, where the T1s are in five different languages (i.e. English, Spanish, German, Italian, and French) and the T2s are in English;
  • to ensure the quality of the dataset, all the pairs were manually checked by two expert annotators and modified where necessary.

For each language combination, the Training Set has the XML format shown in the example above, where:

- each language combination is specified in the "languages" attribute contained in the root element <entailment-corpus>
- each entailment pair appears within a single <pair> element
- the <pair> element has the following attributes:
           -- "id", a unique numeric identifier of the pair
           -- "entailment", the entailment annotation, which is one of bidirectional, forward, backward, or no_entailment
- the elements <t1> and <t2> contain the text fragments composing the entailment pair

The Test Set will have the same format as the Training Set, except that the "entailment" attribute will be left empty.
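
As an illustration of the format described above, the following Python sketch reads a corpus with the standard-library `xml.etree.ElementTree` module. It assumes the distributed files are well-formed XML in which every <pair> element is closed; the function name `read_pairs` and the embedded `SAMPLE` string are hypothetical and only mirror the example on this page.

```python
import xml.etree.ElementTree as ET

# Two pairs copied from the example on this page (assumed well-formed).
SAMPLE = """<entailment-corpus languages="spa-eng">
  <pair id="1" entailment="bidirectional">
    <t1>Mozart nació en la ciudad de Salzburgo</t1>
    <t2>Mozart was born in Salzburg.</t2>
  </pair>
  <pair id="2" entailment="forward">
    <t1>Mozart nació el 27 de enero de 1756 en Salzburgo</t1>
    <t2>Mozart was born in 1756 in the city of Salzburg.</t2>
  </pair>
</entailment-corpus>"""


def read_pairs(xml_text):
    """Return the language combination and a list of pair dictionaries."""
    root = ET.fromstring(xml_text)
    languages = root.get("languages")
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            # Empty string when the attribute is empty or absent (Test Set).
            "entailment": pair.get("entailment", ""),
            "t1": (pair.findtext("t1") or "").strip(),
            "t2": (pair.findtext("t2") or "").strip(),
        })
    return languages, pairs


languages, pairs = read_pairs(SAMPLE)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the rest of the function would stay the same.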

In order to obtain the CLTE dataset, please express your interest in participating in the task by:
- writing an email to the task organizers, AND
- joining the CLTE discussion group


Participants can submit up to five runs for any of the proposed language combinations.

Results are to be submitted as one file per run. The run filename must contain the team name, the language combination, and the run number (e.g. "TeamName_spa-eng_run1.xml").

No partial submissions are allowed, i.e. the submission must cover the whole dataset.

The submission format is the same as that of the Test Set, except that:
- only the pair IDs are required (the <t1> and <t2> elements can be omitted)
- the "entailment" attribute must be filled with the corresponding entailment judgment.


<entailment-corpus languages="spa-eng">
          <pair id="1" entailment="bidirectional"/>
          <pair id="2" entailment="forward"/>
          <pair id="3" entailment="backward"/>
          <pair id="4" entailment="no_entailment"/>
</entailment-corpus>
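
A run file in this format could be produced along the following lines. This is only a sketch: the helper names `run_filename` and `build_submission` are hypothetical, the filename pattern is inferred from the single example given above, and the serializer may lay out whitespace differently from the example while remaining equivalent XML.

```python
import xml.etree.ElementTree as ET

# Label strings as they appear in the example on this page.
LABELS = {"bidirectional", "forward", "backward", "no_entailment"}


def run_filename(team, languages, run):
    """Build a run filename such as TeamName_spa-eng_run1.xml."""
    if not 1 <= run <= 5:
        raise ValueError("at most five runs per language combination")
    return f"{team}_{languages}_run{run}.xml"


def build_submission(languages, judgments):
    """Serialize (pair id, label) tuples as a submission document."""
    root = ET.Element("entailment-corpus", {"languages": languages})
    for pair_id, label in judgments:
        if label not in LABELS:
            raise ValueError(f"unknown entailment label: {label!r}")
        # Only the id and the judgment are needed; <t1>/<t2> are omitted.
        ET.SubElement(root, "pair", {"id": str(pair_id), "entailment": label})
    return ET.tostring(root, encoding="unicode")


xml_text = build_submission("spa-eng", [(1, "bidirectional"), (2, "forward")])
```

Since partial submissions are not allowed, a caller would want to check that `judgments` covers every pair ID in the Test Set before writing `xml_text` to the file named by `run_filename`.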


The evaluation of all submitted runs will be automatic. Judgments returned by the system will be compared to those manually assigned by human annotators (the Gold Standard).

The official metric used to evaluate system performance will be accuracy over the whole Test Set, i.e. the number of correct judgments out of the total number of judgments in the Test Set.

Additionally, Precision, Recall, and F1 measures will be provided for each of the four entailment judgment categories taken separately.
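
The measures described above can be computed as in the sketch below. This is not the official scorer: the function name `evaluate` is hypothetical, and returning 0.0 when a category has no predicted or no gold instances is an assumption made here to avoid division by zero.

```python
LABELS = ("bidirectional", "forward", "backward", "no_entailment")


def evaluate(gold, predicted):
    """Return overall accuracy and per-label (precision, recall, F1)."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")

    # Accuracy: correct judgments over all judgments in the Test Set.
    correct = sum(g == p for g, p in zip(gold, predicted))
    accuracy = correct / len(gold)

    per_label = {}
    for label in LABELS:
        true_pos = sum(g == label and p == label for g, p in zip(gold, predicted))
        n_predicted = sum(p == label for p in predicted)
        n_gold = sum(g == label for g in gold)
        # Assumption: undefined ratios are reported as 0.0.
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[label] = (precision, recall, f1)
    return accuracy, per_label


gold = ["bidirectional", "forward", "backward", "no_entailment"]
predicted = ["bidirectional", "forward", "forward", "no_entailment"]
accuracy, per_label = evaluate(gold, predicted)
```

In this toy run three of the four judgments are correct, and the single "backward" pair mislabelled as "forward" lowers the precision of "forward" and the recall of "backward".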


* September 1, 2011: Trial Dataset released (40 English/Spanish pairs)
* December 16, 2011: Training data + test scripts release
* March 15, 2012: Test data release
* March 23, 2012: Task submissions deadline
* April 1, 2012: Release of individual results
* April 11, 2012: Systems' reports due to organizers
* April 23, 2012: Papers' review due
* May 4, 2012: Camera Ready deadline
* June 7-8, 2012: Workshop co-located with NAACL-HLT, Montreal, Canada


Matteo Negri (negrI [at], FBK irst, Trento, Italy
Yashar Mehdad (mehdad [at], FBK irst, Trento, Italy
Luisa Bentivogli (bentivo [at], FBK irst, Trento, Italy
Danilo Giampiccolo (giampiccolo [at], CELCT, Italy
Alessandro Marchetti (amarchetti [at], CELCT, Italy


- CLTE task website:
- CLTE discussion group:
- SemEval-2012 website:
- SemEval-2012 discussion group: