Lemmatization

Lemmatization (or less commonly lemmatisation) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.^[1]
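
As a rough illustration of this grouping, the following minimal sketch collects inflected forms under a shared lemma. The lookup table `LEMMA_OF` and the function `group_by_lemma` are hypothetical names introduced here for illustration only; a real lexicon would be far larger and would not be written by hand.

```python
from collections import defaultdict

# Hypothetical hand-written lookup table (an assumption for this sketch).
LEMMA_OF = {
    "walk": "walk", "walks": "walk", "walked": "walk", "walking": "walk",
    "go": "go", "goes": "go", "went": "go", "gone": "go",
}

def group_by_lemma(tokens):
    """Group tokens under their lemma; unknown tokens map to themselves."""
    groups = defaultdict(list)
    for token in tokens:
        lemma = LEMMA_OF.get(token.lower(), token.lower())
        groups[lemma].append(token)
    return dict(groups)

groups = group_by_lemma(["walked", "went", "walks", "gone", "walking"])
```

Here the irregular forms "went" and "gone" end up in the same group as each other, under the lemma "go", even though they share no common prefix with it.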

In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word, both within its sentence and within the larger context surrounding that sentence, such as neighbouring sentences or even an entire document. As a result, developing efficient lemmatization algorithms remains an open area of research.^[2]^[3]^[4]
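
The dependence on part of speech can be sketched with a toy example. The lexicon `LEXICON`, the tag names, and the functions `lemmatize` and `crude_stem` below are all assumptions made for illustration; the stemmer is a deliberately naive suffix stripper, not a published stemming algorithm, and a practical lemmatizer would have to infer the part of speech from context rather than receive it as an argument.

```python
# Hypothetical toy lexicon keyed by (word form, part of speech).
LEXICON = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
    ("saw", "NOUN"): "saw",
    ("saw", "VERB"): "see",
}

def lemmatize(word, pos):
    """Look up the lemma for a word given its part of speech.

    Falls back to the lowercased word when the pair is not in the lexicon.
    """
    return LEXICON.get((word.lower(), pos), word.lower())

def crude_stem(word):
    """Strip a few common suffixes with no knowledge of context."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

Under these assumptions, `lemmatize("meeting", "NOUN")` and `lemmatize("meeting", "VERB")` return different lemmas for the same surface form, whereas `crude_stem("meeting")` always returns the same string, and `crude_stem("better")` cannot recover "good" at all.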

[1] Collins English Dictionary, entry for "lemmatize".
[2] "WebBANC: Building Semantically-Rich Annotated Corpora from Web User Annotations of Minority Languages".
[3] Müller, Thomas; Cotterell, Ryan; Fraser, Alexander; Schütze, Hinrich (2015). Joint Lemmatization and Morphological Tagging with LEMMING (PDF). 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon: Association for Computational Linguistics. pp. 2268–2274. doi:10.18653/v1/D15-1272.
[4] Bergmanis, Toms; Goldwater, Sharon. "Context Sensitive Neural Lemmatization with Lematus" (PDF).
