Created
August 5, 2012 02:09
-
-
Save Badgerati/3261142 to your computer and use it in GitHub Desktop.
An implementation of the Levenshtein distance algorithm in LUA.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Returns the Levenshtein distance between the two given strings | |
function string.levenshtein(str1, str2) | |
local len1 = string.len(str1) | |
local len2 = string.len(str2) | |
local matrix = {} | |
local cost = 0 | |
-- quick cut-offs to save time | |
if (len1 == 0) then | |
return len2 | |
elseif (len2 == 0) then | |
return len1 | |
elseif (str1 == str2) then | |
return 0 | |
end | |
-- initialise the base matrix values | |
for i = 0, len1, 1 do | |
matrix[i] = {} | |
matrix[i][0] = i | |
end | |
for j = 0, len2, 1 do | |
matrix[0][j] = j | |
end | |
-- actual Levenshtein algorithm | |
for i = 1, len1, 1 do | |
for j = 1, len2, 1 do | |
if (str1:byte(i) == str2:byte(j)) then | |
cost = 0 | |
else | |
cost = 1 | |
end | |
matrix[i][j] = math.min(matrix[i-1][j] + 1, matrix[i][j-1] + 1, matrix[i-1][j-1] + cost) | |
end | |
end | |
-- return the last value - this is the Levenshtein distance | |
return matrix[len1][len2] | |
end |
Oustanding!
This is great, thanks. :)
who came here from that one devforum
quick note: because in line 35 you only ever refer to matrix[i] and matrix[i-1], you can get away with just storing the two most recent lines of the matrix, rather than computing the whole thing. This gets you down to O(2n) space (while still taking O(n^2) time)
You can use a modified version of this algorithm to search for substrings that closely match another string.
Thanks!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks for this!