String Similarity (Levenshtein)
From Erlang Community
Revision as of 16:15, 27 October 2006 by 213.171.204.166 (Talk)
Problem
You need to compare two strings and get an index of how similar they are.
Solution
This module implements the Levenshtein edit distance algorithm (described more here). In short, it calculates the number of edit steps (single-character insertions, deletions, and substitutions) needed to transform the source string into the target string. The lower the number, the more similar the strings.
%%%=============================================================================
%%% @author Adam Lindberg
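For reference, the edit distance can be computed row by row with dynamic programming. The following is a minimal sketch under that assumption, not the string_metrics module itself: the module name levenshtein_sketch and the helpers next_row/3 and fill/5 are hypothetical names chosen for illustration.

```erlang
-module(levenshtein_sketch).
-export([distance/2]).

%% distance(Source, Target) -> non_neg_integer()
%% Number of single-character insertions, deletions and substitutions
%% needed to turn Source into Target. Both arguments are plain lists
%% (Erlang strings). Hypothetical illustration, not the original module.
distance(Source, Target) ->
    %% Row 0: building each prefix of Target from the empty string
    %% costs 0, 1, ..., length(Target) insertions.
    FirstRow = lists:seq(0, length(Target)),
    LastRow = lists:foldl(
                fun(SChar, PrevRow) -> next_row(SChar, Target, PrevRow) end,
                FirstRow, Source),
    %% The last cell of the last row is the distance between the full strings.
    lists:last(LastRow).

%% Compute one row of the table from the previous row.
next_row(SChar, Target, [Diag0 | _] = PrevRow) ->
    %% First cell: one more source character deleted than in the row above.
    First = Diag0 + 1,
    fill(SChar, Target, PrevRow, First, [First]).

%% Walk Target and the previous row in step. The head of the previous
%% row is the diagonal neighbour, its second element is the cell above,
%% and Left is the cell just computed in the current row.
fill(_SChar, [], _PrevRow, _Left, Acc) ->
    lists:reverse(Acc);
fill(SChar, [TChar | TRest], [Diag | [Up | _] = PrevRest], Left, Acc) ->
    Cost = case SChar =:= TChar of
               true -> 0;
               false -> 1
           end,
    Cell = lists:min([Up + 1,        % delete SChar
                      Left + 1,      % insert TChar
                      Diag + Cost]), % keep or substitute
    fill(SChar, TRest, PrevRest, Cell, [Cell | Acc]).
```

Only the previous row is kept between iterations, so memory use grows with the length of the target string rather than with the product of both lengths.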
You can use the levenshtein function to compare two strings.
2> string_metrics:levenshtein("Aloha!", "Alhoa!").
2
3> string_metrics:levenshtein("adam", "Adam").
1
4> string_metrics:levenshtein("adam", "Assam").
3
5> string_metrics:levenshtein("teh", "the").
2
6> string_metrics:levenshtein("the", "the").
0
7> string_metrics:levenshtein("the", "").
3
Note that the function is case sensitive (as is the algorithm itself), though you can always use the httpd_util library's to_lower or to_upper functions to put the two strings on an equal footing. The standard string module offers string:to_lower/1 and string:to_upper/1 for the same purpose.
8> string_metrics:levenshtein(httpd_util:to_lower("Adam"), "adam").
0