Matching the Same Hebrew Words Always Returns False

Suraj Bahadur :

I am trying to compare a Hebrew word with an identical-looking one, but the comparison always falls through to the else branch of the program.

Here is what I actually tried in code:

I call a method, passing it a Hebrew word (coming from string.xml):

<string name="shevat" >שְׁבָט‬</string>

getCurrentMonthIndex("שְׁבָט")

The method below always returns false:

private boolean getCurrentMonthIndex(String month) {
    if (month.equals("שְׁבָט")) {
        Log.d("Result:", "equal");
        return true;
    } else {
        Log.d("Result:", "not equal");
        return false;
    }
}

If I hardcode the value, it returns true:

private boolean getCurrentMonthIndex(String month) {
    if ("שְׁבָט".equals("שְׁבָט")) {
        Log.d("Result:", "equal");
        return true;
    } else {
        Log.d("Result:", "not equal");
        return false;
    }
}

fthdgn :

Your string resource contains one extra Unicode character that is not visible.

This is the string in your resource: https://www.fontspace.com/unicode/analyzer/?q=%D7%A9%D6%B0%D7%81%D7%91%D6%B8%D7%98%E2%80%AC

This is the string in your code: https://www.fontspace.com/unicode/analyzer/?q=%D7%A9%D6%B0%D7%81%D7%91%D6%B8%D7%98

The extra character is U+202C POP DIRECTIONAL FORMATTING.

I encountered a similar problem while comparing Arabic strings. In my case, the invisible character was U+200E LEFT-TO-RIGHT MARK.

Before comparing the strings, I stripped this character from them. You can strip POP DIRECTIONAL FORMATTING the same way. Note that String.trim() will not do this, since it only removes characters up to U+0020. You can also try removing the character from the resource file with a hex editor.
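As a rough sketch of the stripping step in Java: the class name BidiControls and the method stripBidiControls are names I made up for illustration, and the set of characters removed (LRM, RLM, the embedding/override controls, and the isolate controls) is my own choice, so adjust it to what your data actually contains.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class BidiControls {
    // Explicit bidirectional controls: U+200E LRM, U+200F RLM,
    // U+202A..U+202E (this range includes U+202C POP DIRECTIONAL FORMATTING),
    // and the isolate controls U+2066..U+2069.
    private static final Pattern BIDI_CONTROLS =
            Pattern.compile("[\\u200E\\u200F\\u202A-\\u202E\\u2066-\\u2069]");

    /** Returns the input with every bidi control character removed, wherever it occurs. */
    static String stripBidiControls(String s) {
        return BIDI_CONTROLS.matcher(s).replaceAll("");
    }
}
```

With this in place, the check in the question could compare BidiControls.stripBidiControls(month) against the literal instead of comparing month directly.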

In case the links stop working, here is the Unicode analysis of your string:

U+05E9  HEBREW LETTER SHIN
U+05B0  HEBREW POINT SHEVA
U+05C1  HEBREW POINT SHIN DOT
U+05D1  HEBREW LETTER BET
U+05B8  HEBREW POINT QAMATS
U+05D8  HEBREW LETTER TET
U+202C  POP DIRECTIONAL FORMATTING // only in the resource file
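If you would rather produce a listing like this yourself instead of relying on an online analyzer, something along these lines should work. The class and method names (CodePointDump, describe) are hypothetical, and it assumes Character.getName is available on your platform (it exists in Java 7 and later).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging helper, not part of any library.
final class CodePointDump {
    /** Lists each code point of the input as "U+XXXX  NAME". */
    static List<String> describe(String s) {
        List<String> lines = new ArrayList<>();
        // Walk by code point rather than by char so supplementary characters stay whole.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // Character.getName returns null for unassigned code points.
            String name = Character.getName(cp);
            lines.add(String.format("U+%04X  %s", cp, name));
            i += Character.charCount(cp);
        }
        return lines;
    }
}
```

Logging the result of CodePointDump.describe(month) next to the same call on the literal makes an extra invisible character show up as an extra entry.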

I don't know much about Hebrew, but I think you may run into another problem in the future. In your word, the first letter has two modifiers: U+05B0 HEBREW POINT SHEVA and U+05C1 HEBREW POINT SHIN DOT. Even though the two letters below look exactly the same, they are not equal, because the modifiers are written in a different order.

שְׁ : U+05E9 + U+05B0 + U+05C1

שְׁ : U+05E9 + U+05C1 + U+05B0

I encountered a similar problem in Arabic. Even though the two below look identical, they are not equal to each other: U+064E ARABIC FATHA and U+0651 ARABIC SHADDA are written in a different order.

رَّ : U+0631 + U+064E + U+0651

رَّ : U+0631 + U+0651 + U+064E

For Arabic, in my TypeScript project, I wrote a utility method to normalize strings before comparing them. The normalization method removes all LEFT-TO-RIGHT MARK characters and reorders the modifier characters in a standard way. I think you may need to do something similar for Hebrew.

@Elias N has pointed out that Java already has a method to normalize strings, java.text.Normalizer.normalize. Note that this method does not remove POP DIRECTIONAL FORMATTING or LEFT-TO-RIGHT MARK.

String a = "שְׁ";  //U+05E9 + U+05B0 + U+05C1
String b = "שְׁ";  //U+05E9 + U+05C1 + U+05B0

String normA = java.text.Normalizer.normalize(a, java.text.Normalizer.Form.NFC);
String normB = java.text.Normalizer.normalize(b, java.text.Normalizer.Form.NFC);

assertFalse("Original strings are not equal.", a.equals(b));
assertTrue("Normalized strings are equal.", normA.equals(normB));
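Since the normalizer leaves the invisible formatting characters in place, both steps can be combined into one comparison helper. This is only a sketch: HebrewCompare, canonical, and sameText are made-up names, and removing the whole Unicode "Format" category (\p{Cf}) is an assumption on my part. That category also covers ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER, which can be meaningful in some scripts, so narrow the pattern if that matters for your text.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of any library.
final class HebrewCompare {
    /** Drops invisible format characters, then normalizes to NFC. */
    static String canonical(String s) {
        // Strip first: a format character sitting between a letter and its
        // marks would otherwise prevent the marks from being reordered.
        String visible = s.replaceAll("\\p{Cf}", "");
        return Normalizer.normalize(visible, Normalizer.Form.NFC);
    }

    /** Compares two strings after bringing both to the same canonical form. */
    static boolean sameText(String a, String b) {
        return canonical(a).equals(canonical(b));
    }
}
```

The format characters are removed before normalizing rather than after, so that the order of the marks is settled on the text that will actually be compared.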
