Lemmatization

Lemmatization (or less commonly lemmatisation) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.[1]

In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, which operates on a single word without regard to context, lemmatization depends on correctly identifying the word's intended part of speech and meaning within its sentence and, where necessary, within the larger context surrounding that sentence, such as neighbouring sentences or even an entire document. As a result, developing efficient lemmatization algorithms remains an open area of research.[2][3][4]
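The contrast with stemming can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny LEXICON, the SUFFIXES list, and the functions naive_stem and lemmatize are hypothetical and do not correspond to any particular library or published algorithm. The part-of-speech tag is supplied by hand here, whereas a real lemmatizer would have to infer it from context.

```python
# Toy illustration only: the lexicon and suffix list are invented for this example.

# Hypothetical lexicon mapping (word form, part of speech) -> lemma.
LEXICON = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
    ("walked", "VERB"): "walk",
    ("mice", "NOUN"): "mouse",
}

# Suffixes the crude stemmer tries to strip, in order.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, ignoring context and part of speech."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma for a word given its part of speech.

    Falls back to the lowercased word when the pair is not in the lexicon.
    """
    word = word.lower()
    return LEXICON.get((word, pos), word)


if __name__ == "__main__":
    for form, pos in [("meeting", "NOUN"), ("meeting", "VERB"), ("better", "ADJ")]:
        print(form, pos, "stem:", naive_stem(form), "lemma:", lemmatize(form, pos))
```

In this sketch the stemmer always reduces "meeting" to "meet", while the lemmatizer returns "meeting" for the noun and "meet" for the verb. The stemmer also leaves "better" unchanged, because relating it to "good" requires a dictionary lookup rather than suffix stripping.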

  1. Collins English Dictionary, entry for "lemmatize".
  2. "WebBANC: Building Semantically-Rich Annotated Corpora from Web User Annotations of Minority Languages".
  3. Müller, Thomas; Cotterell, Ryan; Fraser, Alexander; Schütze, Hinrich (2015). "Joint Lemmatization and Morphological Tagging with LEMMING" (PDF). 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon: Association for Computational Linguistics. pp. 2268–2274. doi:10.18653/v1/D15-1272.
  4. Bergmanis, Toms; Goldwater, Sharon. "Context Sensitive Neural Lemmatization with Lematus" (PDF).