Task Description

Organizers: Lucia Specia, Sujay Jauhar and Rada Mihalcea

The goal of this task is to provide a framework for the evaluation of systems for lexical simplification. Given a short input text and a target word in English, and given several English substitutes for the target word that fit the context, the goal is to rank these substitutes according to how "simple" they are. "Simple words" are loosely defined as words that can be understood by a wide variety of people, including for example people with low literacy levels or a cognitive disability, children, and non-native speakers of English. In particular, the trial and gold-standard data provided as part of the task are annotated by fluent but non-native speakers of English.

This task is closely related to the English Lexical Substitution task from SemEval-2007 (McCarthy and Navigli, 2007). However, instead of asking the participating systems to provide the lexical substitutes, we give the substitutes as part of the task and we ask the participants to rank them.

With such a task, we hope to encourage the development of better systems for text simplification. Text simplification systems involve modifications of an original (complex) text at different linguistic levels, particularly at the syntactic and lexical levels. While a significant amount of work has been done on syntactic simplification, lexical simplification has often been addressed by simple lookup in thesauruses or databases containing frequency information, such as the one developed by Devlin and Tait (1998). In such approaches, the context of the complex target word is disregarded. Exceptions are the work of De Belder et al. (2010), in which word sense disambiguation is performed to choose among a set of possible simplifications, and that of Yatskar et al. (2010), which uses an unsupervised learning method and metadata to find simplifications using Simple English Wikipedia.


The trial and test data will follow the format of the English Lexical Substitution task from SemEval-2007. In particular, the trial data will draw upon the trial dataset used for the English Lexical Substitution task. A new test dataset will be constructed and provided. Please refer to the description of that task for more information about the format of the dataset.

Trial data:

The trial dataset contains 300 development examples covering 30 target words in context. Annotations for the trial dataset were obtained from 5 annotators. Given its reduced size, this dataset is not intended for training purposes (although participants can use it for this purpose if they choose to do so), but rather so that the participants can familiarise themselves with the data format and the evaluation metrics.

Test data:

The test dataset contains 1710 test examples covering a number of target words in context (all different from the words in the trial set). The annotation follows a methodology similar to that used for the trial data. The gold-standard annotation for all test cases is also included in this distribution.


The evaluation methodology is based on pairwise rankings. A scoring system is available.
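
To give a feel for what a pairwise comparison of rankings can look like, here is a minimal Python sketch. It is not the scoring system distributed with the task: the function name pairwise_agreement, the rank-dictionary representation, and the example substitutes are all assumptions made for illustration. It simply counts the fraction of substitute pairs that a system ranking and a gold ranking order in the same way (simpler, less simple, or tied).

```python
from itertools import combinations


def pairwise_agreement(gold, system):
    """Fraction of substitute pairs that both rankings order identically.

    Each ranking maps a substitute to its rank (1 = simplest).
    Substitutes that share a rank are treated as tied.
    This is an illustrative measure, not the official task scorer.
    """
    def relation(ranks, a, b):
        # -1 if a is ranked simpler than b, 1 if b is simpler, 0 if tied
        return (ranks[a] > ranks[b]) - (ranks[a] < ranks[b])

    pairs = list(combinations(sorted(gold), 2))
    if not pairs:
        return 1.0  # nothing to disagree about
    agree = sum(relation(gold, a, b) == relation(system, a, b) for a, b in pairs)
    return agree / len(pairs)


# Hypothetical rankings for one target word in context.
gold = {"clear": 1, "bright": 2, "luminous": 3}
system = {"clear": 1, "luminous": 2, "bright": 3}
score = pairwise_agreement(gold, system)  # 2 of the 3 pairs agree
```

Working on pairs rather than on absolute rank positions means a system is only penalised for the specific orderings it gets wrong, and ties in the gold standard can be handled naturally.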


Three simple baselines were computed for the trial dataset, including random ranking and ranking by the frequency of the substitutes in a large corpus.
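
Two of the baselines named above, random ranking and frequency-based ranking, could be sketched as follows. This is only an illustration under stated assumptions: the function names, the substitutes, and the corpus counts are made up, and the sketch assumes that a more frequent word is a simpler one. It does not reproduce the corpus or the exact procedure used to compute the task baselines.

```python
import random


def random_baseline(substitutes, seed=None):
    """Return the substitutes in a random order (simplest first)."""
    rng = random.Random(seed)
    ranked = list(substitutes)
    rng.shuffle(ranked)
    return ranked


def frequency_baseline(substitutes, counts):
    """Rank substitutes by descending corpus frequency.

    Assumes higher frequency means "simpler"; words missing
    from the counts are treated as having frequency 0.
    """
    return sorted(substitutes, key=lambda word: -counts.get(word, 0))


# Hypothetical corpus counts, for illustration only.
counts = {"bright": 5000, "clear": 12000, "luminous": 300}
substitutes = ["luminous", "bright", "clear"]

freq_ranked = frequency_baseline(substitutes, counts)
rand_ranked = random_baseline(substitutes, seed=0)
```

Note that neither baseline looks at the context of the target word, which is the limitation of lookup-based approaches described earlier.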


Here is the Excel sheet with the results of the task.


