org.archiviststoolkit.util
Class FuzzyStringMatcher

java.lang.Object
  extended by org.archiviststoolkit.util.FuzzyStringMatcher

public class FuzzyStringMatcher
extends java.lang.Object

Utility class, providing methods for a fuzzy comparison between strings.

Since:
MMBase-1.5

Method Summary
static float getMatchRate(java.lang.String string1, java.lang.String string2)
          Calculates the match rate, a value between 0 and 1 that indicates how closely the two strings match (1 is an exact match).
static int getMismatch(java.lang.String string1, java.lang.String string2)
          Calculates the mismatch between two strings.
static java.lang.String normalizeString(java.lang.String str)
          Creates a normalized title.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getMismatch

public static int getMismatch(java.lang.String string1,
                              java.lang.String string2)
Calculates the mismatch between two strings.

Parameters:
string1 - first string
string2 - second string
Returns:
The number of mismatches. This is the minimum number of typos needed to account for the differences between the two strings, assuming they were meant to be identical.
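
The description above does not name the algorithm. A "minimum number of typos" reads like an edit distance, so the following is only a sketch under the assumption that a typo is a single-character insertion, deletion, or substitution (Levenshtein distance). The class MismatchSketch and its method mismatch are hypothetical names, not part of org.archiviststoolkit.util, and the real getMismatch may count differences in another way.

```java
// Hypothetical illustration; not the FuzzyStringMatcher source.
final class MismatchSketch {
    private MismatchSketch() {}

    /**
     * Levenshtein distance between a and b: the minimum number of
     * single-character insertions, deletions, or substitutions that
     * turn one string into the other (assumed definition of a "typo").
     */
    static int mismatch(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Under this assumption, MismatchSketch.mismatch("kitten", "sitting") is 3 (two substitutions and one insertion), and identical strings give 0.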

getMatchRate

public static float getMatchRate(java.lang.String string1,
                                 java.lang.String string2)
Calculates the match rate, a value between 0 and 1 that indicates how closely the two strings match (1 is an exact match). It is calculated as 1 - (mismatch / max(string1.length(), string2.length())), where mismatch is the number of mismatches between the two strings.

Parameters:
string1 - first string
string2 - second string
Returns:
The match rate.
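
The formula in the description can be sketched as follows. The class MatchRateSketch and its method matchRate are hypothetical names used only for illustration. Two details are assumptions rather than documented behavior: the division is done in floating point (an integer division would truncate the ratio to 0 or 1), and two empty strings are treated as an exact match to avoid dividing by zero.

```java
// Hypothetical illustration; not the FuzzyStringMatcher source.
final class MatchRateSketch {
    private MatchRateSketch() {}

    /**
     * Computes 1 - (mismatch / max(length1, length2)).
     *
     * @param mismatch number of mismatches between the two strings
     * @param length1  length of the first string
     * @param length2  length of the second string
     */
    static float matchRate(int mismatch, int length1, int length2) {
        int longest = Math.max(length1, length2);
        if (longest == 0) {
            // Assumption: two empty strings count as an exact match.
            return 1.0f;
        }
        // Cast before dividing so the ratio is not truncated.
        return 1.0f - ((float) mismatch / longest);
    }
}
```

For example, a mismatch of 3 between strings of length 6 and 7 gives 1 - 3/7, roughly 0.571, and a mismatch of 0 gives exactly 1.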

normalizeString

public static java.lang.String normalizeString(java.lang.String str)
Creates a normalized title: all non-alphanumeric characters are replaced by white space, all characters are converted to lowercase without diacritics, and each white space sequence is contracted to a single white space character. This is a convenience method that makes string comparison easier by removing (more or less) arbitrary differences.

Parameters:
str - The original title.
Returns:
The normalized title.
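
The three normalization steps described above could be approximated with the standard java.text.Normalizer class, as in the sketch below. NormalizeSketch and normalize are hypothetical names, and the order of the steps, the use of Unicode decomposition to drop diacritics, and the decision not to trim leading or trailing white space are all assumptions; the actual normalizeString may differ on each point.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration; not the FuzzyStringMatcher source.
final class NormalizeSketch {
    private NormalizeSketch() {}

    static String normalize(String str) {
        // Split accented characters into a base letter plus combining marks,
        // then drop the marks. Letters with no decomposition are left as-is.
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");

        // Locale.ROOT keeps lowercasing independent of the default locale.
        String lower = withoutMarks.toLowerCase(Locale.ROOT);

        // Replace every character that is neither a letter nor a digit with a space.
        String spaced = lower.replaceAll("[^\\p{L}\\p{N}]", " ");

        // Contract each run of white space to a single space (no trimming).
        return spaced.replaceAll("\\s+", " ");
    }
}
```

With this sketch, normalize("Caf\u00e9--Noir!") yields "cafe noir " with a trailing space, because the description does not say whether white space at the ends is removed.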