the Reading Room

 


A Timeline of research in Word Sense Disambiguation
   (extracted from source text: Ide and Véronis (1998), “Word Sense Disambiguation: The State of the Art”)

1. Early Machine Translation (1950s)

2. AI Methods (1960s-70s)

3. Knowledge-Based Methods (1980s)

4. Corpus-Based Methods (1990s-2000s)


1. Early Machine Translation

A) Approaches:

Much of the foundation of WSD was laid in this period, but without large resources, ideas were untested.

  • 1949, Weaver: window size of N words in text before and after usage (of the word being disambiguated)
    • (in Memorandum) realised relationship between domain specificity and reduced word sense ambiguity
      • resulted in work on “micro-glossaries”
  • 1950, Kaplan: experiments on the value of N, comparing N=2 with N=sentence length.  Result: no significant difference in sense resolution
    • Similar experiments (same results):
      • 1956, Koutsoudas and Korfhage, on Russian
      • 1961, Masterman
      • 1961, Gougenheim and Michéa, on French
      • 1985, Choueka and Lusignan, on French
  • 1955, Reifler: “Semantic Coincidences”, relationship between syntax structure and word sense
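
Weaver's context window can be sketched in a few lines. This is only an illustration, assuming whitespace-tokenised text; the function name `context_window` and the example sentence are hypothetical and do not come from the source text.

```python
def context_window(tokens, index, n):
    """Return up to n tokens before and up to n tokens after tokens[index]."""
    left = tokens[max(0, index - n):index]
    right = tokens[index + 1:index + 1 + n]
    return left, right


# Hypothetical example sentence; "bank" is the word being disambiguated.
tokens = "he sat on the bank of the river".split()
target = tokens.index("bank")

# Narrow window (N=2) versus a window wide enough to cover the whole sentence.
narrow = context_window(tokens, target, 2)
wide = context_window(tokens, target, len(tokens))
print(narrow)
print(wide)
```

With N=2 the window misses "river", the most informative cue in this sentence, which is the kind of trade-off the experiments on N were probing.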

B) Resources:

The need for knowledge representation of words was recognised:

  • 1957, Masterman: uses Roget’s Thesaurus to choose Latin-English translations based on the thesaurus categories most frequently referred to in a Latin sentence.
    • This early statistical approach was continued by other researchers.
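
The idea of picking the thesaurus category that the rest of the sentence refers to most often can be sketched as follows. This is a rough illustration, not Masterman's actual procedure: the word-to-category table is invented toy data (in English rather than Latin), and `most_supported_category` is a hypothetical name.

```python
from collections import Counter

# Hypothetical toy data: each word maps to the thesaurus categories it may belong to.
CATEGORIES = {
    "plant": ["VEGETABLE", "WORKSHOP"],
    "grew": ["VEGETABLE", "INCREASE"],
    "garden": ["VEGETABLE", "AGRICULTURE"],
    "soil": ["LAND", "AGRICULTURE"],
}


def most_supported_category(target, sentence):
    """Pick the target's category that the other words in the sentence refer to most."""
    counts = Counter(
        category
        for word in sentence
        if word != target
        for category in CATEGORIES.get(word, [])
    )
    # Ties are broken in favour of the category listed first for the target.
    return max(CATEGORIES[target], key=lambda category: counts[category])


sentence = ["the", "plant", "grew", "in", "garden", "soil"]
print(most_supported_category("plant", sentence))
```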

 

C) Studies of the problem:

Measurements of degree of polysemy

  • 1957, Harper:
    • on Physics texts, 30% polysemous
    • on Scientific texts, 43% polysemous
    • in a Russian-English dictionary, on average 8.6 English equivalents per Russian word
      • 5.6 of these are quasi-synonyms
      • ¼ polysemous in computerised dictionary

2.  AI Methods

Criticisms: Most systems worked only at the level of the sentence.  All were toy systems, in that they often tried to tackle highly ambiguous words with fine sense distinctions, often in sentences that were unlikely to be found in the real world.  They often relied on extensive hand-crafting and suffered from the “knowledge-acquisition bottleneck” (though many AI systems of the time suffered from this).

Symbolic Methods:

 Semantic Networks: (cf. connectionist models: spreading activation models)

  • 1961, Masterman: Defined 100 primitive concepts by which to organise a dictionary. This resulted in a semantic network where nodes = concepts, and arcs = semantic relationships.
  • 1961-1969, Quillian: worked on semantic networks.  The path between two nodes (words) will usually involve only one sense of each intermediary node (word).

 Networks and Frames:

  • 1976, Hayes: Case frames used with semantic networks (nodes=nouns, i.e. case frames; arcs=verbs).  Able to handle homonyms, but not other polysemy.
  • 1987, Hirst: Uses a network of frames with marker passing (i.e. Quillian’s approach).  “Polaroid words”, where inappropriate senses are eliminated by syntactic evidence and semantic relations until one sense remains.  Fails on metaphorical uses of words, which leave no senses remaining.

 Case-based Approaches:

  • 1968-75, Wilks: “Preference semantics” rules for selection restrictions based on semantic features (e.g. animate vs inanimate).
  • 1979, Boguraev: Preference semantics insufficient for polysemous verbs.  Attempts to enhance it with case frames.  Mixes syntactic and semantic evidence for sense assignment.

 Ontological Reasoning:

  • 1988, Dahlgren: Disambiguation handled by one of two approaches (each used 50% of the time): fixed phrases and syntax, or reasoning (including common-sense reasoning).  Reasoning involves finding ontological parents, a precursor to Resnik (1993).

Connectionist Methods:

 Spreading Activation Methods:

  • (1961, Quillian’s approach is precursor to spreading activation models.  However, Quillian’s approach was still symbolic.  Neural networks are numeric.)
  • 1971, Meyer and Schvaneveldt, “Semantic Priming”, we understand subsequent words based on what we’ve already heard.
  • 1975, Collins and Loftus – 1983, Anderson: “Spreading Activation” models.  Activation weakens as it spreads.  Multiple stimulations of a node means activation is reinforced.
  • 1981, McClelland and Rumelhart: add notion of inhibitory activation.  For example, nodes activating one word sense may inhibit competing word senses.
  • 1983, Cottrell and Small use neural networks for work similar to Quillian (a node is a concept).
  • 1985, Waltz and Pollack: semantic “micro-features”, like animate vs inanimate, are hand-coded in networks as context for disambiguation.
  • 1987, Bookman: automatic priming of microfeature context from preceding text.  Analogous to short-term memory.
  • 1988, Kawamoto: Distributed networks (i.e. node ≠ concept).  But these require training (in contrast to “local models”, which are defined a priori).
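
The mechanics described in this list (activation that weakens as it spreads, reinforcement when a node is stimulated more than once, and inhibition between competing senses) can be sketched with a small local network. This is a simplified illustration rather than any of the cited models: the graph, the weights, the `decay` value and the function name `spread` are all assumptions made for the example.

```python
def spread(graph, sources, decay=0.5, steps=2):
    """Spread activation from the source nodes through a weighted graph.

    graph maps each node to a list of (neighbour, weight) pairs; a negative
    weight is an inhibitory link. Activation is multiplied by `decay` at every
    hop, and contributions arriving at the same node are summed.
    """
    activation = {node: 0.0 for node in graph}
    frontier = {source: 1.0 for source in sources}
    for node, value in frontier.items():
        activation[node] += value
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            if value <= 0:
                continue  # only positively activated nodes pass activation on
            for neighbour, weight in graph[node]:
                passed = value * weight * decay
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return activation


# Hypothetical network: context words excite one sense; senses inhibit each other.
graph = {
    "money": [("bank/finance", 1.0)],
    "loan": [("bank/finance", 1.0)],
    "river": [("bank/shore", 1.0)],
    "bank/finance": [("bank/shore", -1.0)],
    "bank/shore": [("bank/finance", -1.0)],
}

result = spread(graph, ["money", "loan"])
print(result["bank/finance"], result["bank/shore"])
```

Here two context words both stimulate the financial sense, so its activation is reinforced, and that sense then pushes the competing sense below zero.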

 

3. Knowledge-Based Methods:

This period of research arose due to the availability of machine-readable resources.  The following divisions of “Dictionaries”, “Thesauri” and “Lexicons” are based on the method of data organisation.  In a dictionary, the main entry is at the word level.  The entry refers to various senses of the word.  In a thesaurus, the main entry is for a cluster of related words.  In a lexicon, the main entry is for the word sense, which corresponds to various words.  This makes the lexicon and the thesaurus quite similar structurally.

Criticisms:

Inferences based on machine-readable resources often suffer from three main problems.  The first is that it is hard to obtain non-contentious definitions for words.  That is, in general, it is difficult for humans to agree on the division of senses of a word.  Secondly, thesauri and lexicons often organise concepts hierarchically.  However, the exact hierarchical organisation is also often debated.  Finally, the path length between nodes of a lexicon or thesaurus has no clear, consistent meaning.

Dictionaries (accuracy approx 70%)

  • 1980, Amsler and 1982, Michiels (theses): used machine-readable dictionaries.
  • 1986, Lesk: tried to build a knowledge base from a dictionary.  Each word sense corresponded to a “signature”.  Signatures consisted of the bag of words used in the definition of the word sense.  This bag of words was compared to the context of the target word.  This approach was the precursor to future statistical work but was too dependent on the particular dictionary’s definitions.
  • 1990, Wilks: built a measure of relatedness between two words by comparing co-occurrences between definitions for those words.  This metric is used to compare the target word with words in its surrounding context window.
  • 1990, Veronis and Ide: Built a symbolic network out of words and word senses.  A node was a word or a word sense.  Words are connected to their word senses which are in turn connected to the words in their signatures.
  • 1989-1993, various authors: tried to use the extra fields of the LDOCE, such as subject codes and box codes.  Box codes represented semantic primitives.
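
The signature comparison in the Lesk entry above can be sketched as a word-overlap count. This is a minimal simplified version, assuming whitespace tokenisation; the two definitions are made-up stand-ins for dictionary entries, and the names `lesk` and `STOPWORDS` are hypothetical.

```python
# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = frozenset({"the", "of", "to", "a", "from", "which"})


def lesk(context_words, sense_definitions, stopwords=STOPWORDS):
    """Return the sense whose definition shares the most words with the context."""
    context = {word.lower() for word in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        # The "signature" of a sense is the bag of words in its definition.
        signature = set(definition.lower().split()) - stopwords
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Invented definitions, not taken from any real dictionary.
senses = {
    "cone/tree": "fruit of certain evergreen trees",
    "cone/shape": "solid body which narrows to a point",
}
context = "the pine cone fell from the evergreen".split()
print(lesk(context, senses))
```

The dependence on dictionary wording is visible even here: had the first definition said "conifers" instead of "evergreen trees", the overlap would drop to zero.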

 Thesauri

  • 1957, Masterman: uses Roget’s Thesaurus for machine translation
  • 1985, Patrick: uses Roget’s for verb sense disambiguation.  He examines the connectivity between the closest synonyms and the target word.  These sense distinctions are narrow, as they are based on the “strongest” synonyms.
  • 1992, Yarowsky: using Roget’s as the basis of word classes (or senses), uses Grolier’s Encyclopedia to find signatures, a bag of words most likely to occur for that word class.  His sense distinctions are quite broad (3 way).

 Lexicons

  • 1990, Miller et al.: WordNet, a hand-crafted lexicon, enumerated
  • 1990, Lenat and Guha: Cyc, a semi-hand-crafted lexicon (in principle), enumerated
  • 1991, Briscoe: ACQUILEX, enumerated
  • 1994, Grishman et al.: COMLEX, enumerated
  • 1995, Buitelaar: CORELEX, generative lexicon

  • 1993, Sussna: uses path lengths to measure relatedness between two words.  An overall relatedness score is computed between the sense and the context words (or their senses).  For competing senses, the one with the highest relatedness score is the disambiguated sense.
  • 1995, Resnik: uses the information content of words (based on corpus frequencies) and the WordNet ontology to measure relatedness between two words
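
The two relatedness measures in the last two entries can be contrasted on a toy hierarchy. This is a sketch under stated assumptions: the taxonomy and the corpus counts are invented, the path measure is plain edge counting (a simplification, not Sussna's own scoring), and the information-content measure follows the usual reading of Resnik's idea, where a concept's probability includes all of its descendants and the similarity of two words is the information content of their most informative shared ancestor.

```python
import math

# Hypothetical taxonomy (child -> parent) and hypothetical corpus counts.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}
COUNTS = {"dog": 6, "cat": 2, "car": 8}


def ancestors(node):
    """Return the chain from the node itself up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("nodes do not share a root")


def concept_count(concept):
    """Corpus count of a concept, including everything beneath it."""
    return sum(COUNTS.get(node, 0) for node in PARENT if concept in ancestors(node))


def information_content(concept):
    """-log2 of the concept's probability; rarer concepts are more informative."""
    return -math.log2(concept_count(concept) / concept_count("entity"))


def ic_similarity(a, b):
    """Information content of the most informative shared ancestor."""
    shared = [node for node in ancestors(a) if node in ancestors(b)]
    return max(information_content(node) for node in shared)


print(path_length("dog", "cat"), path_length("dog", "car"))
print(ic_similarity("dog", "cat"), ic_similarity("dog", "car"))
```

Edge counting treats every link as equally long, which is the weakness noted in the criticisms above; weighting by information content is one way to avoid relying on raw path length.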

4. Corpus-Based Methods:

In recent years, large corpora of text have become available (see Descriptions of Selected Corpora) on which one can apply empirical NLP methods.

Problems:

In general, empirical methods are affected by the data sparseness problem.  In word sense disambiguation research, there is the additional problem that manual tagging of word senses is expensive.

  • 1991, Hearst: the "CatchWord" algorithm uses a training phase which requires a bootstrapping set of manually tagged senses to train from.  After training, disambiguation results over a certain threshold are treated as hand-tagged, and used as further evidence for the disambiguation of the word in question.

  • 1992, 1993, Schütze: Uses letter fourgrams in a 1001 character window size to find clusters of word occurrences.  These clusters correspond to sense differences, but each cluster has to be manually labelled.  The sense distinction made using clusters is often very fine, as a sense may be represented by a collection of clusters.

  • 1991, Brown et al., 1992-93, Gale et al.: use parallel aligned corpora to find manual translations of ambiguous words.  The translations may specify the sense distinction since the translation in the target language may not share the same ambiguity.  From this, tagging can be done automatically on the source language.  This is problematic in that a target language may actually share the ambiguity, and the senses may be skewed according to the domain of the corpus used.
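
The bootstrapping idea in the Hearst entry above (train on a small hand-tagged seed set, then treat sufficiently confident results as if they were hand-tagged) can be sketched as a generic self-training loop. This is not the CatchWord algorithm itself: the word-count classifier, the confidence score, the threshold value and all names below are assumptions chosen to keep the example short.

```python
from collections import Counter, defaultdict


def train(labelled):
    """Count how often each context word was seen with each sense."""
    model = defaultdict(Counter)
    for words, sense in labelled:
        model[sense].update(words)
    return model


def classify(model, words):
    """Return (best sense, confidence), where confidence is the best score's share."""
    scores = {sense: sum(counts[w] for w in words) for sense, counts in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def bootstrap(seed, unlabelled, threshold=0.8, max_rounds=5):
    """Repeatedly add confidently classified contexts to the training set."""
    labelled, pending = list(seed), list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        still_pending = []
        for words in pending:
            sense, confidence = classify(model, words)
            if sense is not None and confidence >= threshold:
                labelled.append((words, sense))  # treated as if hand-tagged
            else:
                still_pending.append(words)
        if len(still_pending) == len(pending):
            break  # nothing new was learned in this round
        pending = still_pending
    return labelled, pending


# Hypothetical seed contexts for the word "bank" and untagged contexts.
seed = [(["river", "water"], "shore"), (["money", "loan"], "finance")]
unlabelled = [["river", "mud"], ["mud", "grass"], ["loan", "interest"], ["stock"]]
labelled, pending = bootstrap(seed, unlabelled)
print(len(labelled), pending)
```

The context `["mud", "grass"]` shares no word with the seed set and can only be tagged in the second round, after `["river", "mud"]` has been absorbed; `["stock"]` never gathers any evidence and stays untagged.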

Manual Sense-tagging Efforts

    (Note that these are typically subsets of corpora. The tagged portions are thus much smaller than what is needed for typical statistical approaches.)

  • 1993, Miller et al.: hand-tagged occurrences of 1000 selected words from a subset of the Brown corpus
  • 1996, Ng and Lee: hand-tagged occurrences of 191 selected words in a subset of the Brown and Wall Street Journal corpora (~200,000 sentences) with WordNet synset senses.
  • 1997, Wiebe et al.: hand-tagged occurrences of 25 selected verbs in a subset of the Wall Street Journal (~12,925 sentences)

 


Copyright: Stephen Wan 2005
Please read the General Information found on the Reading Room home page regarding usage of this resource website.