How to compare strings by similarity without ignoring typos?

Thiago Cruz :

I need to compare two strings by proximity: if String.equals on the full string fails, I still need to compare the first name, plus the middle and/or last name.

I have already found some comparison algorithms, but they all tolerate misspellings in the result, and I have to compare the exact input.

Examples:

  1. Maria souza silva = Maria souza silva = ok
  2. Maria souza silva = Maria silva = ok
  3. Maria souza silva = Maria Carvalho = Nok
  4. Maria souza silva = Ana souza silva = Nok
  5. Maria de souza silva = Maria de = Nok
  6. Maria de souza silva = Maria souza = ok

I'm trying something like this:

String name = "Maria da souza Silva";

String nameRequest = "Maria da Silva";

if(name.equalsIgnoreCase(nameRequest)){
    System.out.print("ok 0");
}

String[] names = name.split(" ");

int lastIndex = names.length - 1;

if(nameRequest.startsWith(names[0])){
    System.out.println("ok 1, next");
} else {
    System.out.print("nok, stop");
}

if(nameRequest.endsWith(names[lastIndex])){
    System.out.print("ok 2");
}

The result is "ok 1, next" and "ok 2".

The first and last names match, but I also need to compare the middle name and ignore connectors like "de"/"da".

ZGorlock :

I was going to use pure regex at first, and there is probably a way to do it, but this code produces the results you are looking for: it matches on first and last name, or first and middle name, and ignores "de" and "da".

import java.util.regex.Matcher;
import java.util.regex.Pattern;

private void checkName(String target, String source) {
    // firstName: first word; an optional "de"/"da" right after it is skipped;
    // otherName: everything that remains.
    Pattern pattern = Pattern.compile("^(?<firstName>[^\\s]+)\\s((de|da)(\\s|$))?(?<otherName>.*)$");
    Matcher targetMatcher = pattern.matcher(target.trim().toLowerCase());
    Matcher sourceMatcher = pattern.matcher(source.trim().toLowerCase());
    if (!targetMatcher.matches() || !sourceMatcher.matches()) {
        System.out.println("Nok");
        // Stop here: calling group() after a failed match throws IllegalStateException.
        return;
    }

    boolean ok = true;
    if (!sourceMatcher.group("firstName").equals(targetMatcher.group("firstName"))) {
        ok = false;
    } else {
        String[] otherSourceName = sourceMatcher.group("otherName").split("\\s");
        String[] otherTargetName = targetMatcher.group("otherName").split("\\s");

        // Every remaining source word must appear in the target, in the same order.
        int targetIndex = 0;
        for (String s : otherSourceName) {
            boolean hit = false;
            for (; targetIndex < otherTargetName.length; targetIndex++) {
                if (s.equals(otherTargetName[targetIndex])) {
                    hit = true;
                    targetIndex++; // consume the matched word so it cannot match twice
                    break;
                }
            }
            if (!hit) {
                ok = false;
                break;
            }
        }
    }
    System.out.println(ok ? "ok" : "Nok");
}

For your examples, the output is:

ok
ok
Nok
Nok
Nok
ok
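
If you would rather get a boolean back than printed text, the same rule can probably be sketched without a regex as well. This is only a rough variant under my own assumptions: the class name NameMatcher and the methods tokens and matches are made up for illustration, the connector list contains just "de" and "da" from the question (extend it if you need more), and connectors are dropped wherever they appear rather than only after the first name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class NameMatcher {
    // Connector words to ignore. Only "de" and "da" come from the question.
    private static final Set<String> CONNECTORS = new HashSet<>(Arrays.asList("de", "da"));

    // Lower-cases the name, splits it on whitespace and drops connector words.
    static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !CONNECTORS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // True if the names are equal ignoring case, or if the request has the same
    // first name plus at least one other name, and all of its other names occur
    // in the full name in the same order. No typo tolerance: words must be equal.
    static boolean matches(String fullName, String request) {
        if (fullName.trim().equalsIgnoreCase(request.trim())) {
            return true;
        }
        List<String> full = tokens(fullName);
        List<String> req = tokens(request);
        if (req.size() < 2 || full.isEmpty()) {
            return false; // a first name alone (e.g. "Maria de") is not enough
        }
        if (!full.get(0).equals(req.get(0))) {
            return false; // first names must be identical
        }
        int i = 1;
        for (String r : req.subList(1, req.size())) {
            while (i < full.size() && !full.get(i).equals(r)) {
                i++;
            }
            if (i == full.size()) {
                return false; // this requested name is missing from the full name
            }
            i++; // consume the matched word
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"Maria souza silva", "Maria souza silva"},
            {"Maria souza silva", "Maria silva"},
            {"Maria souza silva", "Maria Carvalho"},
            {"Maria souza silva", "Ana souza silva"},
            {"Maria de souza silva", "Maria de"},
            {"Maria de souza silva", "Maria souza"},
        };
        for (String[] p : pairs) {
            System.out.println(matches(p[0], p[1]) ? "ok" : "Nok");
        }
    }
}
```

Running main on the six examples from the question should print the same ok / ok / Nok / Nok / Nok / ok sequence. Returning a boolean keeps the check reusable, and the caller decides what to print.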
