Unicode principles and converting Unicode escapes to Chinese

Code points
The intention of the Unicode standard is simple: to assign a unique integer to every character of every writing system in the world. That integer is called a code point (Code Point).
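As a small illustration of "one character, one integer", the sketch below reads the code point of the first character in a string and prints it in the usual U+XXXX notation. The class name `CodePointDemo` and its helper methods are made up for this example; only `String.codePointAt` and `String.format` come from the JDK.

```java
class CodePointDemo {
    // Returns the code point of the first character in s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Formats a code point in the conventional U+XXXX notation
    // (at least four hex digits, upper case).
    static String describe(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(describe(firstCodePoint("A")));      // U+0041
        // The Chinese character "zhong" is written as an escape here so the
        // source file stays pure ASCII.
        System.out.println(describe(firstCodePoint("\u4e2d"))); // U+4E2D
    }
}
```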

Code space
All code points together make up the code space (Code Space). As defined by Unicode, there are 1,114,112 code points in total, numbered from 0x0 to 0x10FFFF. In other words, if every code point could represent a valid character, the Unicode standard could encode up to 1,114,112 characters, which is about 1.1 million. Unicode 7.0, the latest version at the time of writing, assigns code points to more than 110,000 characters.
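The size of the code space can be checked with a little arithmetic. The sketch below uses the JDK constants `Character.MIN_CODE_POINT` and `Character.MAX_CODE_POINT` together with `Character.isValidCodePoint`; the class name `CodeSpaceDemo` and its methods are only illustrative.

```java
class CodeSpaceDemo {
    static final int FIRST = Character.MIN_CODE_POINT; // 0x0
    static final int LAST = Character.MAX_CODE_POINT;  // 0x10FFFF

    // Number of code points in the code space, counting both ends.
    static int codeSpaceSize() {
        return LAST - FIRST + 1;
    }

    // True if value lies inside the code space 0x0..0x10FFFF.
    // This says nothing about whether a character is assigned there.
    static boolean inCodeSpace(int value) {
        return Character.isValidCodePoint(value);
    }

    public static void main(String[] args) {
        System.out.println(codeSpaceSize());       // 1114112
        System.out.println(inCodeSpace(0x10FFFF)); // true
        System.out.println(inCodeSpace(0x110000)); // false
    }
}
```

Being inside the code space only means the integer is a legal code point; most code points still have no character assigned to them.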

Code plane
The Unicode standard divides the code points into 17 code planes (Code Plane), numbered #0 to #16.
Each code plane contains 65,536 (2^16) code points (17 * 65,536 = 1,114,112).
Plane #0 is called the Basic Multilingual Plane (BMP); the remaining planes are called supplementary planes (Supplementary Planes).
Unicode 7.0 uses only 6 of the 17 planes, and only those 6 have been given names.
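Because each plane holds exactly 2^16 code points, the plane number of a code point is simply the code point shifted right by 16 bits. The following is a minimal sketch of that calculation; `CodePlaneDemo`, `planeOf` and `isInBmp` are names invented for this example.

```java
class CodePlaneDemo {
    static final int PLANE_SIZE = 1 << 16; // 65,536 code points per plane
    static final int PLANE_COUNT = 17;     // planes #0 to #16

    // Returns the plane number (0..16) that contains the given code point.
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    // True if the code point lies in plane #0, the Basic Multilingual Plane.
    static boolean isInBmp(int codePoint) {
        return planeOf(codePoint) == 0;
    }

    public static void main(String[] args) {
        System.out.println(PLANE_SIZE * PLANE_COUNT); // 1114112
        System.out.println(planeOf(0x4E2D));          // 0 (a common Chinese character)
        System.out.println(planeOf(0x1F600));         // 1
        System.out.println(planeOf(0x10FFFF));        // 16
        System.out.println(isInBmp(0x1F600));         // false
    }
}
```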

 

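// Converts escape sequences made of a backslash, the letter u and four hex
// digits into the characters they represent (for example Chinese text).
// The escapes \t, \r, \n and \f are translated as well; any other escaped
// character is copied through unchanged.
public String getChineseByunicode(String sunicode) {
    int len = sunicode.length();
    StringBuilder out = new StringBuilder(len);
    int pos = 0;
    while (pos < len) {
        char c = sunicode.charAt(pos++);
        if (c != '\\') {
            out.append(c);
            continue;
        }
        // A backslash must be followed by at least one more character.
        if (pos >= len) {
            throw new IllegalArgumentException("Dangling backslash at end of input.");
        }
        c = sunicode.charAt(pos++);
        if (c == 'u') {
            // Exactly four hex digits must follow; check before reading them.
            if (pos + 4 > len) {
                throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
            }
            int value = 0;
            for (int i = 0; i < 4; i++) {
                c = sunicode.charAt(pos++);
                int digit;
                if (c >= '0' && c <= '9') {
                    digit = c - '0';
                } else if (c >= 'a' && c <= 'f') {
                    digit = 10 + c - 'a';
                } else if (c >= 'A' && c <= 'F') {
                    digit = 10 + c - 'A';
                } else {
                    throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
                }
                value = (value << 4) + digit;
            }
            // value is a 16-bit UTF-16 code unit; a character outside the BMP
            // arrives as two consecutive escapes (a surrogate pair).
            out.append((char) value);
        } else {
            if (c == 't') c = '\t';
            else if (c == 'r') c = '\r';
            else if (c == 'n') c = '\n';
            else if (c == 'f') c = '\f';
            out.append(c);
        }
    }
    return out.toString();
}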
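For the opposite direction, a rough sketch of an encoder is shown below. It is not part of the original code: the class `UnicodeEscapeDemo` and the method `toEscapes` are hypothetical names, and the choice to leave ASCII untouched and to double literal backslashes is an assumption made so that the output can be fed back into the decoder above. It also shows how the planes relate to the escapes: a BMP character becomes one `\uXXXX` escape, while a supplementary-plane character becomes two, one per UTF-16 surrogate.

```java
class UnicodeEscapeDemo {
    // Encodes every non-ASCII UTF-16 code unit as a backslash, 'u' and four
    // lower-case hex digits. ASCII is copied as-is, except that a literal
    // backslash is doubled so it is not mistaken for the start of an escape.
    static String toEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two BMP characters (U+4E2D, U+6587) give two escapes.
        System.out.println(toEscapes("\u4e2d\u6587"));
        // One plane #1 character (U+1F600) also gives two escapes,
        // because it is stored as the surrogate pair D83D DE00.
        System.out.println(toEscapes(new String(Character.toChars(0x1F600))));
    }
}
```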

  

 


Origin www.cnblogs.com/qianjinyan/p/11445771.html