This blog post describes UTF-8, the encoding in which Unicode text is actually stored.
What is UTF-8
- UTF-8 is a practical storage encoding. That is, it can store both of the theoretical encodings, UCS-2 and UCS-4.
- It is variable-length: the encoded length of each character may not be the same.
- It was designed to solve the wasted space that UCS-2 causes, where every character takes two bytes even when one would be enough.
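As a rough illustration of the variable-length point, the sketch below compares byte counts for a few characters. The helper name `byte_lengths` is made up for this example, and UTF-16 (big-endian, no BOM) is used as a stand-in for UCS-2 on the assumption that the characters are in the Basic Multilingual Plane, where the two agree.

```python
def byte_lengths(ch):
    """Return (UTF-8 byte count, two-byte-form byte count) for one BMP character.

    Hypothetical helper for illustration; "utf-16-be" stands in for UCS-2.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# "A" is ASCII, "\u00e9" is a Latin letter with an accent, "经" is the
# character used in the worked example of this post.
for ch in ["A", "\u00e9", "经"]:
    utf8_len, fixed_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 uses {utf8_len} byte(s), the fixed form uses {fixed_len}")
```

For ASCII text the fixed two-byte form doubles the size, which is the waste that UTF-8 avoids; for a character such as 经 UTF-8 is actually one byte longer.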
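How UTF-8 stores UCS-2
Scheme
Range | Scheme |
---|---|
U+0000 to U+007F | 0xxxxxxx |
U+0080 to U+07FF | 110xxxxx 10xxxxxx |
U+0800 to U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
U+10000 to U+1FFFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
U+200000 to U+3FFFFFF | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
U+4000000 to U+7FFFFFFF | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |

A UCS-2 code is 16 bits, so it only ever needs one of the first three rows. The five- and six-byte rows come from the original UTF-8 design; the current standard stops at four bytes and U+10FFFF.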
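One way to read the table is by counting the x positions in each row: 7, 11, 16, 21, 26 and 31. The sketch below picks a row from the number of significant bits in a code. The names `PAYLOAD_BITS` and `scheme_length` are invented for this example and are not from any library.

```python
# Number of x positions (payload bits) in each row of the table, keyed by
# the number of bytes in that row. Hypothetical name, for illustration only.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}


def scheme_length(code_point):
    """Return how many bytes the shortest fitting row of the table uses."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    needed = code_point.bit_length()  # bits left after dropping leading zeros
    for n_bytes, capacity in PAYLOAD_BITS.items():
        if needed <= capacity:
            return n_bytes
    raise ValueError("code point does not fit in 31 bits")


print(scheme_length(0x41))    # 1
print(scheme_length(0x7ECF))  # 3
```

The value 0x7ECF is the code of 经, which has 15 significant bits and therefore lands in the three-byte row.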
Encoding steps
- Step 1: find the character's original UCS-2 code.
  - For example, the UCS-2 code of the character 经 is 01111110 11001111.
- Step 2: remove the leading 0 bits from the high-order byte.
  - After the leading 0 is removed, the code for 经 becomes 1111110 11001111, leaving 15 bits.
- Step 3: treat each remaining bit as one x, and look up the scheme that matches the number of remaining bits.
  - Here, 15 bits need the scheme 1110xxxx 10xxxxxx 10xxxxxx, the shortest one with at least 15 x positions (it has 16).
- Step 4: fill the remaining bits into the x positions of the scheme, working from right to left.
  - Here, 1111110 11001111 is filled into 1110xxxx 10xxxxxx 10xxxxxx.
  - The result is 1110x111 10111011 10001111.
- Step 5: if any x is still unfilled, replace it with 0 to obtain the UTF-8 encoding.
  - Replacing the remaining x in 1110x111 10111011 10001111 with 0 gives 11100111 10111011 10001111.
- So the UTF-8 encoding of the character 经 is 11100111 10111011 10001111 (hex E7 BB 8F).
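The five steps can be sketched in Python as follows. This is a minimal illustration that works on bit strings so it mirrors the steps literally, not how a real encoder would be written; the function name `utf8_encode_bits` is made up for this post, and the template list simply copies the six schemes from the table.

```python
# The six schemes from the table, shortest first.
TEMPLATES = [
    "0xxxxxxx",
    "110xxxxx 10xxxxxx",
    "1110xxxx 10xxxxxx 10xxxxxx",
    "11110xxx 10xxxxxx 10xxxxxx 10xxxxxx",
    "111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx",
    "1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx",
]


def utf8_encode_bits(code_point):
    """Encode one code by the five steps and return space-separated bytes as bits."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")

    # Steps 1 and 2: binary form of the code with leading zeros removed.
    bits = format(code_point, "b")

    # Step 3: shortest scheme with enough x positions for the remaining bits.
    template = next((t for t in TEMPLATES if t.count("x") >= len(bits)), None)
    if template is None:
        raise ValueError("code point does not fit in 31 bits")

    # Step 4: fill the x positions from right to left.
    # Step 5: any x left over once the bits run out becomes 0.
    remaining = list(bits)
    out = []
    for ch in reversed(template):
        if ch == "x":
            out.append(remaining.pop() if remaining else "0")
        else:
            out.append(ch)  # fixed prefix bits and spaces are copied as-is
    return "".join(reversed(out))


print(utf8_encode_bits(ord("经")))  # 11100111 10111011 10001111
```

To check the sketch against Python's built-in encoder, the bit string can be turned back into bytes with `bytes(int(b, 2) for b in utf8_encode_bits(ord("经")).split())` and compared with `"经".encode("utf-8")`.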