Levenshtein distance

Levenshtein distance
Edit distance matrix for two words using cost of substitution as 1 and cost of deletion or insertion as 0.5
Classmeasuring the difference between two sequences

In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after Soviet mathematician Vladimir Levenshtein, who defined the metric in 1965.[1]

Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics known collectively as edit distance.[2]: 32  It is closely related to pairwise string alignments.

  1. ^ В. И. Левенштейн (1965). Двоичные коды с исправлением выпадений, вставок и замещений символов [Binary codes capable of correcting deletions, insertions, and reversals]. Доклады Академии Наук СССР (in Russian). 163 (4): 845–848. Appeared in English as: Levenshtein, Vladimir I. (February 1966). "Binary codes capable of correcting deletions, insertions, and reversals". Soviet Physics Doklady. 10 (8): 707–710. Bibcode:1966SPhD...10..707L.
  2. ^ Navarro, Gonzalo (2001). "A guided tour to approximate string matching" (PDF). ACM Computing Surveys. 33 (1): 31–88. CiteSeerX 10.1.1.452.6317. doi:10.1145/375360.375365. S2CID 207551224.