Description
Using word senses instead of word forms is essential in many applications, such as information retrieval (IR) and machine translation (MT). Word senses are a prerequisite for word sense disambiguation (WSD) algorithms. However, they are usually represented as a fixed list of definitions taken from a manually constructed lexical database. This fixed-list-of-senses paradigm has several disadvantages. Firstly, lexical databases often contain general definitions and miss many domain-specific senses [1]. Secondly, they lack explicit semantic and topical relations between concepts. Thirdly, they often do not reflect the exact content of the context in which the target word appears. Word sense induction (WSI) aims to overcome these limitations of hand-constructed lexicons. Most work in WSI is based on the vector-space model, where each context of a target word is represented as a vector of features (e.g. the frequencies of co-occurring words). Context vectors are clustered, and the resulting clusters are taken to represent the induced senses. Recently, graph-based methods have been applied to WSI. Typically, graph-based approaches represent each word w_i that co-occurs with the target word tw within a pre-specified window as a vertex. Two vertices are connected by an edge if they co-occur in one or more contexts of tw. Once the co-occurrence graph of tw has been constructed, a graph clustering algorithm is applied to induce the senses. Each cluster (induced sense) consists of a set of words that are semantically related to that particular sense.
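The graph-based procedure described above can be illustrated with a minimal Python sketch. It is not the method of any particular system: the function names (`build_cooccurrence_graph`, `connected_components`, `induce_senses`), the `window` and `min_weight` parameters, and the toy contexts are all assumptions made for illustration. Connected components stand in for the graph clustering step only because they are the simplest possible choice; real WSI systems use more refined clustering algorithms.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(contexts, target, window=3, min_weight=1):
    """Build the co-occurrence graph of `target`.

    Vertices are words found within `window` tokens of `target`.
    Two vertices are linked if they co-occur in at least `min_weight`
    contexts (min_weight=1 means "one or more contexts").
    """
    edge_counts = defaultdict(int)
    vertices = set()
    for tokens in contexts:
        neighbours = set()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            neighbours.update(t for t in tokens[lo:hi] if t != target)
        vertices.update(neighbours)
        # Count each unordered vertex pair once per context.
        for u, v in combinations(sorted(neighbours), 2):
            edge_counts[(u, v)] += 1

    graph = {v: set() for v in vertices}
    for (u, v), count in edge_counts.items():
        if count >= min_weight:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Simplest stand-in for graph clustering: one cluster per component."""
    seen = set()
    clusters = []
    for start in sorted(graph):  # sorted for a deterministic cluster order
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


def induce_senses(contexts, target, window=3, min_weight=1):
    """Each returned set of words is one induced sense of `target`."""
    graph = build_cooccurrence_graph(contexts, target, window, min_weight)
    return connected_components(graph)


# Toy, hand-made contexts for the target word "bank" (illustrative only).
contexts = [
    ["river", "bank", "water"],
    ["water", "bank", "shore"],
    ["bank", "money", "loan"],
    ["loan", "bank", "interest"],
]
senses = induce_senses(contexts, "bank")
print([sorted(s) for s in senses])
# → [['interest', 'loan', 'money'], ['river', 'shore', 'water']]
```

On this toy input the two components correspond to a financial sense and a riverside sense of "bank". On real corpora a co-occurrence graph is usually far more densely connected, so raising `min_weight` or replacing `connected_components` with a proper graph clustering algorithm would be needed to obtain useful senses.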
