Understanding Unicode character encoding - [4] (utf-8)

This post describes utf-8, the encoding in which Unicode is actually stored.

What is utf-8

  • utf-8 is a practical storage encoding.
  • That is, it can store the theoretical encoding ucs-2, and it can also store the theoretical encoding ucs-4.
  • It is variable-length: the encoded length of each character may differ.
  • It exists to solve the wasted space that arises when storing ucs-2 directly.
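The variable-length point can be seen with a small Python sketch. The three sample code points are illustrative picks, not taken from the original post, and the fixed two-byte form is approximated here with Python's "utf-16-be" codec, which matches ucs-2 for these characters.

```python
# Compare the utf-8 length of a character with its fixed two-byte (ucs-2 style) length.
# The sample code points below are assumptions chosen for illustration.
samples = ["\u0041", "\u00e9", "\u7ecf"]


def byte_lengths(ch):
    """Return (utf-8 length, two-byte-form length) in bytes for one BMP character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in samples:
    utf8_len, fixed_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: utf-8 {utf8_len} byte(s), two-byte form {fixed_len} byte(s)")
```

For text that is mostly ASCII, the one-byte case dominates, which is where the saving over a fixed two-byte form comes from.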

How utf-8 stores ucs-2

Schemes

Significant bits (x)    Scheme
0 < x ≤ 7               0xxxxxxx
7 < x ≤ 11              110xxxxx 10xxxxxx
11 < x ≤ 16             1110xxxx 10xxxxxx 10xxxxxx
16 < x ≤ 21             11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
21 < x ≤ 26             111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
26 < x ≤ 31             1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
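The table lookup can be sketched in Python as follows. The function name utf8_byte_count is hypothetical, and x is read as the number of bits left after dropping leading zeros, which is how the encoding steps in this post use it.

```python
def utf8_byte_count(code_point):
    """Return how many bytes the table assigns to a code point.

    The row is chosen by the number of significant bits (leading zeros dropped).
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    bits = max(code_point.bit_length(), 1)  # treat 0 as needing one bit
    # (upper bound on significant bits, number of bytes), one pair per table row
    rows = [(7, 1), (11, 2), (16, 3), (21, 4), (26, 5), (31, 6)]
    for max_bits, n_bytes in rows:
        if bits <= max_bits:
            return n_bytes
    raise ValueError("more than 31 significant bits")


print(utf8_byte_count(0x7ECF))
```

A value with 15 significant bits, such as 0x7ECF, lands in the 11 < x ≤ 16 row and so takes three bytes.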

Encoding steps

  • Step 1: Find the character's ucs-2 encoding.
    • For example, take a character whose ucs-2 encoding is 01111110 11001111.
  • Step 2: Remove the high-order 0 bits.
    • After removing the high-order 0, the encoding becomes 1111110 11001111, leaving 15 bits.
  • Step 3: Treat the remaining bits as x and look up the scheme that matches the number of remaining bits.
    • Here 15 bits falls in 11 < x ≤ 16, so the scheme is 1110xxxx 10xxxxxx 10xxxxxx.
  • Step 4: Fill the remaining bits into the x positions of the scheme, working from right to left.
    • Here, fill 1111110 11001111 into 1110xxxx 10xxxxxx 10xxxxxx.
    • The result after filling is 1110x111 10111011 10001111.
  • Step 5: If any x positions are left over, fill them with 0 to get the utf-8 encoding.
    • Filling the remaining x in 1110x111 10111011 10001111 with 0 gives 11100111 10111011 10001111.
    • So the character's utf-8 encoding is 11100111 10111011 10001111.
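The five steps can be mirrored almost literally in Python by working on bit strings. This is a minimal sketch, not an efficient encoder: the names SCHEMES and encode_by_steps are hypothetical, and only the first three schemes are included since those are the ones that apply to ucs-2.

```python
# The first three schemes, keyed by the maximum number of significant bits.
SCHEMES = [
    (7, "0xxxxxxx"),
    (11, "110xxxxx 10xxxxxx"),
    (16, "1110xxxx 10xxxxxx 10xxxxxx"),
]


def encode_by_steps(code_point):
    """Follow the steps on bit strings and return the utf-8 bytes as a bit string."""
    # Steps 1-2: binary form of the code point with the high-order zeros removed.
    bits = format(code_point, "b")
    # Step 3: pick the scheme whose range covers the number of remaining bits.
    template = next((t for limit, t in SCHEMES if len(bits) <= limit), None)
    if template is None:
        raise ValueError("needs more than 16 bits; not covered by this sketch")
    # Step 4: fill the x positions from right to left.
    # Step 5: once the bits run out, fill the leftover x positions with 0.
    remaining = list(bits)
    filled = []
    for ch in reversed(template):
        if ch == "x":
            filled.append(remaining.pop() if remaining else "0")
        else:
            filled.append(ch)
    return "".join(reversed(filled))


print(encode_by_steps(0b0111111011001111))
```

Converting each group of eight bits back to a byte and comparing against Python's built-in "utf-8" codec is a quick way to check the result.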

How utf-8 stores ucs-4

  • It works the same way as storing ucs-2, but the stored result can be longer.
  • Encoding ucs-2 uses only the first three schemes, while ucs-4 uses all six.
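A sketch of all six schemes using shifts and masks instead of bit strings is shown below. The function name utf8_encode is hypothetical. Note that the current UTF-8 standard (RFC 3629) stops at four bytes, so Python's built-in codec can only be used to cross-check the first four schemes. The five- and six-byte outputs here follow the table in this post.

```python
def utf8_encode(code_point):
    """Encode a value of up to 31 bits using the six schemes; returns bytes."""
    if code_point < 0 or code_point.bit_length() > 31:
        raise ValueError("value must fit in 31 bits")
    if code_point < 0x80:
        return bytes([code_point])  # single-byte scheme: 0xxxxxxx
    # Pick the byte count from the number of significant bits.
    for n_bytes, max_bits in ((2, 11), (3, 16), (4, 21), (5, 26), (6, 31)):
        if code_point.bit_length() <= max_bits:
            break
    # Continuation bytes (10xxxxxx) take six bits each, lowest bits first.
    tail = []
    for _ in range(n_bytes - 1):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Lead byte: n_bytes one-bits, then a zero, then whatever bits are left.
    lead_marker = (0xFF << (8 - n_bytes)) & 0xFF
    return bytes([lead_marker | code_point] + tail[::-1])


print(utf8_encode(0x1F600).hex(" "))
```

For example, 0x1F600 has 17 significant bits, so it falls in the 16 < x ≤ 21 row and takes four bytes.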

Origin blog.csdn.net/Sacredness/article/details/93379989