A study of character encoding in computers

When it comes to character encoding, everyone in computer science knows ASCII: a set of encoding rules that uses 7 bits to represent 128 characters (symbols), which you can roughly think of as the symbols on our keyboards. Its official name is the American Standard Code for Information Interchange.

It can also be viewed as a table of code values; for example, 110 0001 (decimal 97) represents the character 'a'.
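The code value of 'a' can be checked with a few lines of Go. This is only a small sketch; the helper name asciiBits is made up for the illustration.

```go
package main

import "fmt"

// asciiBits is a hypothetical helper that returns the 7-bit binary
// form of an ASCII character.
func asciiBits(c byte) string {
	return fmt.Sprintf("%07b", c)
}

func main() {
	fmt.Println(asciiBits('a')) // 1100001
	fmt.Println(int('a'))       // 97
}
```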

English poses no problem, but what about Japanese, Korean, and our own vast and profound Chinese? That is why more encoding rules were created.

A, Unicode

Unicode is a code value table. It collects the symbols of essentially every writing system in the world, including syllabaries and esoteric symbols. Give it any character and it acts as a lookup table that tells you the corresponding code point.

But a lookup table alone is not enough: there must also be rules that specify how each code point is actually represented, and that is where the corresponding encodings come in: UTF-8 / UTF-16 / UTF-32 .....

So how exactly is this done, i.e. what are the encoding rules?

1, Introduction to the concepts

Character / symbol / code point: 'a', '1' ...... are characters (symbols), and anything you draw can be a symbol too. A computer stores a symbol as a number made of 0s and 1s, and that number is the symbol's code point.

Unicode / UTF-32 / UCS-4 ...: Unicode is the code value table; UTF-32 and the like are encoding rules that specify how a code point value is expressed as bytes.
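To make the difference between the table and the encoding rules concrete, the sketch below takes one code point and shows it under three encodings. The helper names utf8Bytes and utf16Units are assumptions made for this example; utf16.Encode comes from the standard unicode/utf16 package.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf8Bytes returns the UTF-8 bytes of a single code point.
func utf8Bytes(r rune) []byte { return []byte(string(r)) }

// utf16Units returns the UTF-16 code units of a single code point.
func utf16Units(r rune) []uint16 { return utf16.Encode([]rune{r}) }

func main() {
	r := '\u4e16' // the character 世
	fmt.Printf("code point: %U\n", r) // code point: U+4E16
	fmt.Printf("UTF-8:  % X\n", utf8Bytes(r))
	fmt.Printf("UTF-16: %04X\n", utf16Units(r))
	fmt.Printf("UTF-32: %08X\n", uint32(r))
}
```

The code point is the same in every line; only the byte layout chosen by each encoding differs.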

 

2, Encoding rules

Each symbol is assigned a code point; in Go, a code point corresponds to the rune type (an alias for int32).
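As a rough illustration of runes in Go, the sketch below converts a string to its code points and ranges over it. The helper codePoints is hypothetical, and the sample string is written with escapes so that its byte count is unambiguous.

```go
package main

import "fmt"

// codePoints decodes a UTF-8 string into its code points (runes).
func codePoints(s string) []rune { return []rune(s) }

func main() {
	s := "a\u00e9\u4e16" // 'a', 'é', '世'

	fmt.Println(len(s))             // 6 (bytes: 1 + 2 + 3)
	fmt.Println(len(codePoints(s))) // 3 (code points)

	// Ranging over a string yields one rune per iteration,
	// together with the byte offset where that rune starts.
	for i, r := range s {
		fmt.Printf("byte offset %d: %U\n", i, r)
	}
}
```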

 

1)UTF-8

UTF-8 represents each code point as a variable-length sequence of bytes: one Unicode code point takes 1 to 4 bytes, specifically as follows:

(1) characters in the original ASCII character set still take one byte;

(2) many commonly used characters take 2 or 3 bytes;

(3) all other characters take 4 bytes.

Q: So how many bytes make up a given character? A: The high bits of the first byte are reserved as a prefix, for example:

0xxxxxxx: 0-127. If the first bit is 0, this single byte represents one code point; this is how ASCII characters are represented.

110xxxxx 10xxxxxx: 128-2047. If the first byte starts with 110, the sequence is 2 bytes long and represents one code point.

1110xxxx 10xxxxxx 10xxxxxx: 2048-65535. If the first byte starts with 1110, the sequence is 3 bytes long and represents one code point.

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 65536-1114111 (0x10FFFF). If the first byte starts with 11110, the sequence is 4 bytes long and represents one code point.
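The four bit patterns can be turned into a small encoder. The function encodeUTF8 below is a hand-written sketch for illustration, not the library's implementation; it assumes a valid, non-negative code point and does no checking of surrogates or out-of-range values. The result is compared against utf8.EncodeRune from the standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// encodeUTF8 is a hypothetical helper that follows the prefix table.
// Assumption: r is a valid code point in the range 0..0x10FFFF.
func encodeUTF8(r rune) []byte {
	switch {
	case r < 0x80: // 0xxxxxxx
		return []byte{byte(r)}
	case r < 0x800: // 110xxxxx 10xxxxxx
		return []byte{0xC0 | byte(r>>6), 0x80 | byte(r)&0x3F}
	case r < 0x10000: // 1110xxxx 10xxxxxx 10xxxxxx
		return []byte{0xE0 | byte(r>>12), 0x80 | byte(r>>6)&0x3F, 0x80 | byte(r)&0x3F}
	default: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		return []byte{0xF0 | byte(r>>18), 0x80 | byte(r>>12)&0x3F,
			0x80 | byte(r>>6)&0x3F, 0x80 | byte(r)&0x3F}
	}
}

func main() {
	for _, r := range []rune{'a', 0xE9, 0x4E16, 0x1F600} {
		buf := make([]byte, utf8.UTFMax)
		n := utf8.EncodeRune(buf, r)
		got := encodeUTF8(r)
		fmt.Printf("%U -> % X (matches library: %v)\n", r, got, bytes.Equal(got, buf[:n]))
	}
}
```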

 

3, Benefits of this encoding

Compact, wastes little storage space, and compatible with the ASCII character set;

It is a prefix code, so decoding from left to right is never ambiguous;

When decoding from right to left, backing up a few bytes is enough to find the start of the current character: a character is at most 4 bytes long, so even starting from its fourth byte you step back at most 3 bytes, skipping bytes of the form 10xxxxxx, until you reach the byte that carries a length prefix, which is the head.
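The backward search can be sketched as follows. The helper runeStartIndex is hypothetical and assumes s is valid UTF-8 and 0 <= i < len(s); utf8.RuneStart and utf8.DecodeLastRuneInString are standard library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStartIndex returns the index of the first byte of the character
// that contains s[i]. It steps back over continuation bytes (10xxxxxx).
// Assumption: s is valid UTF-8 and 0 <= i < len(s).
func runeStartIndex(s string, i int) int {
	for i > 0 && !utf8.RuneStart(s[i]) {
		i--
	}
	return i
}

func main() {
	s := "a\u4e16\U0001F600" // 1 + 3 + 4 bytes

	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d belongs to the character starting at byte %d\n",
			i, runeStartIndex(s, i))
	}

	// The standard library can also decode the last character directly.
	r, size := utf8.DecodeLastRuneInString(s)
	fmt.Printf("last character: %U (%d bytes)\n", r, size)
}
```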

 

4, Application scenarios

Go source files are encoded in UTF-8, and the Go standard library handles UTF-8 encoded text very well.

The unicode package provides many functions for working with individual runes, and the unicode/utf8 package provides functions for encoding and decoding runes as UTF-8 byte sequences.

- from the "Go language Bible"
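A few of the functions from these two packages, as a small sketch. The helper countLetters and the sample string are made up for the example; the library calls used (unicode.IsLetter, unicode.ToUpper, utf8.RuneCountInString, utf8.ValidString) are from the standard library.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// countLetters counts the runes in s that unicode.IsLetter reports as letters.
func countLetters(s string) int {
	n := 0
	for _, r := range s {
		if unicode.IsLetter(r) {
			n++
		}
	}
	return n
}

func main() {
	s := "Go \u8bed\u8a00 1" // "Go 语言 1"

	fmt.Println(utf8.RuneCountInString(s))         // 7 characters
	fmt.Println(countLetters(s))                   // 4 letters
	fmt.Println(string(unicode.ToUpper('\u00e9'))) // É
	fmt.Println(utf8.ValidString("\xff"))          // false
}
```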

 

Water Devils: You may have wondered what "Go source files are encoded in UTF-8" actually means. Here we need to say a little about how compilation works.

                Source code written in Go or any other language is not understood by the machine directly; it must be turned into 0s and 1s before it can run. Hence the familiar stages of preprocessing, compiling, assembling, and linking; see xxxxxx

                Before any of that happens, the source text itself (if, else, and so on) is stored in the file as bytes according to the UTF-8 rules, and the compiler decodes those bytes as UTF-8 when it reads the file.
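One visible consequence, shown as a sketch: a string literal typed with non-ASCII characters in a Go source file holds exactly the UTF-8 bytes of those characters. The constant name greeting is an assumption for this example, and the file is assumed to be saved as UTF-8.

```go
package main

import "fmt"

// greeting is typed with non-ASCII characters directly in the source file.
const greeting = "世界"

func main() {
	fmt.Println(len(greeting))    // 6: each of the two characters takes 3 bytes
	fmt.Printf("% x\n", greeting) // e4 b8 96 e7 95 8c

	// The same string written with code point escapes is identical.
	fmt.Println(greeting == "\u4e16\u754c") // true
}
```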

 

2)UTF-16

.....

3)UTF-32

....

 

------------------ unfinished, continued ---------------------------- ----- 

Origin www.cnblogs.com/shuiguizi/p/11372985.html