Word Sense Induction

Introduction

Word Sense Induction (WSI) is the task of identifying the different senses (uses) of a target word in a given text. Traditional graph-based approaches build a graph and then cluster it. Each vertex corresponds to a word that co-occurs with the target word, and each edge is weighted by the co-occurrence frequency of the two words it connects.
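As a rough illustration of the graph described above, the following Python sketch builds a weighted co-occurrence graph for a target word. The function name `build_cooccurrence_graph`, the toy contexts, and the choice to use the number of shared contexts as the edge weight are assumptions made for this example; they are not taken from the publications listed on this page.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(contexts, target):
    """Return a Counter mapping (word_a, word_b) pairs to edge weights.

    Vertices are the words that appear in a context of `target`.
    The weight of an edge is assumed here to be the number of contexts
    in which both words occur together.
    """
    edges = Counter()
    for tokens in contexts:
        if target not in tokens:
            continue  # only contexts of the target word contribute
        vertices = sorted(set(tokens) - {target})
        for u, v in combinations(vertices, 2):
            edges[(u, v)] += 1  # (u, v) is in alphabetical order
    return edges


# Toy contexts of the target word "bank" (hypothetical data).
contexts = [
    ["bank", "river", "water"],
    ["bank", "money", "loan"],
    ["bank", "river", "water", "fishing"],
]
graph = build_cooccurrence_graph(contexts, "bank")
```

In this toy input, "river" and "water" share two contexts, so their edge carries more weight than the edge between "loan" and "money", and no edge links the two groups of words.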

Description

Using word senses instead of word forms is essential in many applications, such as information retrieval (IR) and machine translation (MT). Word senses are also a prerequisite for word sense disambiguation (WSD) algorithms. However, they are usually represented as a fixed list of definitions taken from a manually constructed lexical database. This fixed-list-of-senses paradigm has several disadvantages. Firstly, lexical databases often contain general definitions and miss many domain-specific senses [1]. Secondly, they lack explicit semantic and topical relations between concepts. Thirdly, they often do not reflect the exact content of the context in which the target word appears. WSI aims to overcome these limitations of hand-constructed lexicons.

Most work in WSI is based on the vector-space model, in which each context of a target word is represented as a vector of features (e.g. the frequencies of co-occurring words). The context vectors are clustered, and the resulting clusters are taken to represent the induced senses.

More recently, graph-based methods have been applied to WSI. Typically, graph-based approaches represent each word w_i that co-occurs with the target word tw, within a pre-specified window, as a vertex. Two vertices are connected by an edge if they co-occur in one or more contexts of tw. Once the co-occurrence graph of tw has been constructed, a graph clustering algorithm is applied to induce the senses. Each cluster (induced sense) consists of a set of words that are semantically related to that particular sense.
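The graph-based pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the publications below: the function name `induce_senses`, the `window` parameter, and the toy contexts are hypothetical, and connected components are used as a deliberately simple stand-in for a real graph clustering algorithm.

```python
from itertools import combinations


def induce_senses(contexts, target, window=2):
    """Cluster the words that co-occur with `target` into induced senses.

    Assumption: two words are linked if both fall inside the same window
    around an occurrence of `target`. Clustering is done with connected
    components, a simple stand-in for a graph clustering algorithm.
    """
    adjacency = {}
    for tokens in contexts:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            nearby = set(tokens[lo:i] + tokens[i + 1:i + 1 + window]) - {target}
            for word in nearby:
                adjacency.setdefault(word, set())  # one vertex per word
            for u, v in combinations(sorted(nearby), 2):
                adjacency[u].add(v)  # undirected edge between u and v
                adjacency[v].add(u)

    senses, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        senses.append(component)
    return senses


# Toy contexts of the target word "bank" (hypothetical data).
contexts = [
    ["river", "bank", "water"],
    ["water", "bank", "fishing"],
    ["money", "bank", "loan"],
    ["loan", "bank", "interest"],
]
senses = induce_senses(contexts, "bank")
```

On this toy input the two components correspond to a "river" sense and a "finance" sense of "bank". Real co-occurrence graphs are far noisier, which is why a proper graph clustering algorithm and edge weighting are normally needed in place of plain connected components.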

Publications

  1. Ioannis P. Klapaftis & Suresh Manandhar, "Word Sense Induction Using Graphs of Collocations." In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI-2008), Patras, Greece, 2008. (Acceptance rate: 21.5%)
  2. Ioannis P. Klapaftis & Suresh Manandhar, "UOY: A Hypergraph Model for Word Sense Induction and Disambiguation." In Proceedings of SemEval-2007, Association for Computational Linguistics, Prague, Czech Republic, 2007.
  3. Ioannis P. Klapaftis & Suresh Manandhar, "Unsupervised Word Sense Disambiguation Using the WWW." In Proceedings of the ECAI Starting AI Researcher Symposium (STAIRS), Riva del Garda, Italy, 2006.
  4. Ioannis P. Klapaftis & Suresh Manandhar, "Term Sense Disambiguation for Ontology Learning." In Proceedings of the 6th International Conference on Intelligent Systems Design and Applications, Volume 02, pages 844-849. IEEE Computer Society, Washington, DC, 2006.
  5. Ioannis P. Klapaftis & Suresh Manandhar, "Google & Wordnet based Word Sense Disambiguation." In Proceedings of the 22nd International Conference on Machine Learning (ICML05) Workshop on Learning and Extending Ontologies by using Machine Learning Methods, Bonn, Germany, 2005.