Skip to content

Instantly share code, notes, and snippets.

@sunlightlabs
Created August 19, 2009 18:07
Show Gist options
  • Save sunlightlabs/170531 to your computer and use it in GitHub Desktop.
Save sunlightlabs/170531 to your computer and use it in GitHub Desktop.
Ruby C implementation of Levenshtein
# C implementation of Levenshtein text distance algorithm, uses RubyInline to call from within Ruby
# Wildly faster than the Text gem's Text::Levenshtein
# Example:
# l = Levenshtein.new
# l.distance 'hello', ' hello'
# => 1
# Taken from http://www.benallfree.com/2008/10/05/finding-duplicate-text-with-ruby/
require 'inline'
class Levenshtein
inline do |builder|
builder.c "
#include <stdlib.h>
#include <malloc.h>
#include <string.h>
static int distance(char *s,char*t)
/*Compute levenshtein distance between s and t*/
{
//Step 1
int min, k,i,j,n,m,cost,*d,distance;
n=strlen(s);
m=strlen(t);
if (n==0) return m;
if (m==0) return n;
d=malloc((sizeof(int))*(m+1)*(n+1));
m++;
n++;
//Step 2
for(k=0;k<n;k++)
{
d[k]=k;
}
for(k=0;k<m;k++)
{
d[k*n]=k;
}
//Step 3 and 4
for(i=1;i<n;i++)
{
for(j=1;j<m;j++)
{
//Step 5
if(s[i-1]==t[j-1])
{
cost=0;
} else {
cost=1;
}
//Step 6
min = d[(j-1)*n+i]+1;
if (d[j*n+i-1]+1 < min) min=d[j*n+i-1]+1;
if (d[(j-1)*n+i-1]+cost < min) min=d[(j-1)*n+i-1]+cost;
d[j*n+i]=min;
}
}
distance=d[n*m-1];
free(d);
return distance;
}
"
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment