Encoding: original code, inverse code, and complement

A. Machine number and true value

Before learning the original code (sign-magnitude), inverse code (ones' complement), and complement (two's complement), you first need to understand the concepts of machine number and true value.

1. Machine number

The binary representation of a number inside a computer is called the machine number of that number. A machine number is signed: the computer uses the most significant bit as the sign bit, storing 0 for a positive number and 1 for a negative number.

For example, with an 8-bit word length, the decimal number +3 converts to binary 00000011, and -3 is 10000011.

Here, 00000011 and 10000011 are machine numbers.

2. True value

Because the first bit is the sign bit, the face value of a machine number is not equal to its true value. Take the signed number 10000011 above: its most significant bit, 1, means negative, so its true value is -3, not the face value 131 (10000011 read as an unsigned binary number is 131 in decimal). For the sake of distinction, the real value that corresponds to a machine number with a sign bit is called the true value of the machine number.

Example: the true value of 0000 0001 = +000 0001 = +1; the true value of 1000 0001 = -000 0001 = -1
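The difference between face value and true value can be sketched in a few lines of Python. This is only an illustration: the function name true_value is hypothetical, and the 8-bit word length is taken from the examples above.

```python
def true_value(machine_number: str) -> int:
    """Decode an 8-bit machine number (sign bit + 7-bit magnitude) to its true value."""
    bits = machine_number.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    magnitude = int(bits[1:], 2)  # the lower 7 bits hold the magnitude
    return -magnitude if bits[0] == "1" else magnitude


print(true_value("0000 0011"))  # 3
print(true_value("1000 0011"))  # -3
print(int("10000011", 2))       # 131, the face value when the sign bit is ignored
```

The last line shows why the face value is misleading: reading the sign bit as an ordinary digit turns -3 into 131.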

The original code, inverse code, and complement

For a positive number, the original code, inverse code, and complement are all the same as the number's own binary form. For a negative number, the inverse code is derived from the original code by keeping the sign bit unchanged and inverting the remaining bits. The complement of a negative number is derived from the original code by keeping the sign bit unchanged, inverting the remaining bits, and finally adding 1 (that is, the inverse code plus 1).
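The three rules above can be sketched as follows. The function names and the BITS constant are assumptions made for this illustration; the sketch covers only values that have an original code, that is -127 to +127 for 8 bits.

```python
BITS = 8  # word length assumed from the examples in this article


def original_code(n: int) -> str:
    """Sign bit followed by the magnitude in the remaining bits."""
    if not -(1 << (BITS - 1)) < n < (1 << (BITS - 1)):
        raise ValueError("value has no original code at this word length")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{BITS - 1}b")


def inverse_code(n: int) -> str:
    """Same as the original code for n >= 0; otherwise keep the sign, flip the rest."""
    code = original_code(n)
    if n >= 0:
        return code
    flipped = "".join("1" if b == "0" else "0" for b in code[1:])
    return code[0] + flipped


def complement_code(n: int) -> str:
    """Same as the original code for n >= 0; otherwise the inverse code plus 1."""
    if n >= 0:
        return original_code(n)
    value = (int(inverse_code(n), 2) + 1) % (1 << BITS)
    return format(value, f"0{BITS}b")


for n in (1, -1):
    print(n, original_code(n), inverse_code(n), complement_code(n))
```

For +1 all three functions return 00000001, while for -1 they return 10000001, 11111110, and 11111111 respectively.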

Why use the original code, inverse code, and complement

Before going deeper, my suggestion is to learn the representations and calculation methods of the original code, inverse code, and complement above by rote.

We now know that a computer can represent a number with three encodings. For a positive number, the three encodings give the same result:

[+1] = [00000001] original = [00000001] inverse = [00000001] complement

This needs little explanation. But for a negative number:

[-1] = [10000001] original = [11111110] inverse = [11111111] complement

Here the original code, inverse code, and complement are completely different. Since the original code is the representation that the human brain directly recognizes and uses for calculation, why do the inverse code and complement exist?

First, the human brain knows that the first bit is the sign bit, so when calculating we choose whether to add or subtract the true-value fields according to the sign bit. (The concept of true value is explained at the beginning of this article.) For a computer, however, addition, subtraction, multiplication, and division are already the most basic operations and should be designed as simply as possible. Making the computer distinguish the "sign bit" would clearly make the basic circuit design very complicated. So people came up with the idea of letting the sign bit take part in the calculation as well. We know from the rules of arithmetic that subtracting a positive number equals adding a negative number, namely 1 - 1 = 1 + (-1) = 0, so the machine needs only addition and no subtraction, which makes the design of computer operations even simpler.

So people began to explore ways of letting the sign bit take part in the calculation while keeping only addition. First, look at the original code:

Evaluate the decimal expression: 1 - 1 = 0

1 - 1 = 1 + (-1) = [00000001] original + [10000001] original = [10000010] original = -2

With the original code, letting the sign bit take part in the calculation clearly gives an incorrect result for subtraction. This is why computers do not use the original code internally to represent a number.
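The failed calculation can be reproduced with a short sketch. The helper name decode_original is hypothetical; the code simply adds the two 8-bit patterns as plain binary numbers, sign bit included, which is what a bare adder would do.

```python
def decode_original(bits: str) -> int:
    """Read an 8-bit original-code pattern: sign bit, then 7-bit magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


a = "00000001"  # original code of +1
b = "10000001"  # original code of -1

# Add the raw patterns, sign bits included, and keep the low 8 bits.
total = (int(a, 2) + int(b, 2)) & 0xFF
result = format(total, "08b")

print(result, decode_original(result))  # 10000010 -2
```

The adder has no idea that the leading bit means "negative", so the two magnitudes are summed and the sign bit is carried along unchanged.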

To solve the problem of subtraction with the original code, the inverse code appeared:

Evaluate the decimal expression: 1 - 1 = 0

1 - 1 = 1 + (-1) = [0000 0001] original + [1000 0001] original = [0000 0001] inverse + [1111 1110] inverse = [1111 1111] inverse = [1000 0000] original = -0

When subtraction is done with the inverse code, the true-value part of the result is correct. The only problem appears at the special value "0". Although people understand that +0 and -0 are the same, a signed 0 has no meaning, and 0 ends up with two encodings: [0000 0000] original and [1000 0000] original.
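The "negative zero" result can be checked with a small sketch. The helper decode_inverse is a hypothetical name, and the code covers only this particular sum, which produces no carry out of the eighth bit; it is not a general inverse-code adder.

```python
def decode_inverse(bits: str) -> int:
    """Read an 8-bit inverse-code pattern as a Python integer."""
    if bits[0] == "0":
        return int(bits, 2)
    # Negative: flip the 7 value bits back to recover the magnitude.
    magnitude = int("".join("1" if b == "0" else "0" for b in bits[1:]), 2)
    return -magnitude


total = int("00000001", 2) + int("11111110", 2)  # inverse codes of +1 and -1
result = format(total, "08b")

print(result)                      # 11111111, the pattern for -0
print(decode_inverse(result))      # 0
print(decode_inverse("00000000"))  # 0
```

Python integers have no negative zero, so both patterns decode to 0, which is exactly the redundancy described above: two different bit patterns for one value.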

So the complement appeared, solving the problem of the sign of 0 and its two encodings (the carry out of the most significant bit is discarded):

1 - 1 = 1 + (-1) = [0000 0001] original + [1000 0001] original = [0000 0001] complement + [1111 1111] complement = [0000 0000] complement = [0000 0000] original

In this way 0 is represented by [0000 0000], the earlier -0 problem no longer exists, and [1000 0000] can be used to represent -128:

(-1) + (-127) = [1000 0001] original + [1111 1111] original = [1111 1111] complement + [1000 0001] complement = [1000 0000] complement

The result of -1 - 127 should be -128, and in complement arithmetic [1000 0000] complement is indeed -128. Note, however, that because the pattern that previously stood for -0 is now used to represent -128, -128 has no original-code or inverse-code representation. (Working back from the complement of -128, [1000 0000] complement, gives [0000 0000] original, which is incorrect.)
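Both complement calculations above can be sketched with one 8-bit adder that drops the carry. The function names are assumptions for illustration only.

```python
def decode_complement(bits: str) -> int:
    """Read an 8-bit complement pattern as a Python integer."""
    value = int(bits, 2)
    return value - 256 if bits[0] == "1" else value


def add_complement(a: str, b: str) -> str:
    """Add two 8-bit patterns; the mask discards any carry out of the top bit."""
    return format((int(a, 2) + int(b, 2)) & 0xFF, "08b")


zero = add_complement("00000001", "11111111")    # 1 + (-1)
lowest = add_complement("11111111", "10000001")  # (-1) + (-127)

print(zero, decode_complement(zero))      # 00000000 0
print(lowest, decode_complement(lowest))  # 10000000 -128
```

The same adder handles positive and negative operands, and the pattern 10000000 that used to mean -0 now carries the extra value -128.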

Using the complement not only fixes the problem of the sign of 0 and its two encodings, it also makes it possible to represent one more lowest number. This is why, for 8-bit binary, the range of the original code or inverse code is [-127, +127], while the range of the complement is [-128, 127].

Because machines use the complement, the 32-bit int type commonly used in programming can represent the range [-2^31, 2^31 - 1]: the first bit is the sign bit, and the complement representation can store one more minimum value.
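The ranges quoted above follow directly from the word length. The sketch below computes them for any number of bits; the function names are hypothetical, and the last lines use Python's built-in int.to_bytes with signed=True, which encodes integers in two's complement.

```python
def original_or_inverse_range(bits: int) -> tuple:
    """Range when one pattern is spent on a second zero (-0)."""
    limit = (1 << (bits - 1)) - 1
    return -limit, limit


def complement_range(bits: int) -> tuple:
    """Range when the former -0 pattern stores one extra negative value."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(original_or_inverse_range(8))  # (-127, 127)
print(complement_range(8))           # (-128, 127)
print(complement_range(32))          # (-2147483648, 2147483647)

# The lowest 32-bit value is the sign bit followed by all zeros.
lowest = (-2**31).to_bytes(4, "big", signed=True)
print(lowest.hex())                  # 80000000
```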

ASCII, GB2312, GBK, and GB18030 are introduced together because they are closely related. In terms of compatibility, GB18030 is compatible with GBK, GBK is compatible with GB2312, and GB2312 is compatible with ASCII. "Compatible" can be understood simply as a subset relationship with no conflicts. For example, ASCII characters can appear in a GB2312-encoded file; GB2312 and ASCII characters can appear in a GBK-encoded file; and GBK, GB2312, and ASCII characters can appear in a GB18030-encoded file.

The characteristics of each encoding:

[1] ASCII: each character occupies 1 byte, and the most significant bit of its binary representation must be 0 (extended ASCII is not considered here), so ASCII can represent only 128 characters.

[2] GB2312: the earliest Chinese encoding, in which each Chinese character occupies 2 bytes. To stay compatible with ASCII, the most significant bit of each of these 2 bytes cannot be 0 (otherwise it would conflict with ASCII). GB2312 includes 6,763 Chinese characters and 682 special symbols, which covers all the most commonly used Chinese characters in daily life.

[3] GBK: GB2312 has only 6,763 Chinese characters, and the Chinese language is vast and profound, so how could 6,763 characters be enough? GBK therefore encodes many more Chinese characters, still using 2 bytes per character, on the premise that it does not conflict with GB2312 and ASCII (that is, it stays compatible with both). GBK can represent 20,902 Chinese characters, plus 984 Chinese punctuation marks, radicals, and other symbols. Notably, these 20,902 characters include traditional Chinese characters.

[4] GB18030: even the more than 20,000 characters of GBK could not meet every need, because there are many more Chinese characters, including ones you may never have seen, that need an encoding. At this point 2 bytes per character is clearly not enough: 2 bytes give at most 65,536 combinations, and to stay compatible with ASCII the highest bit of the first byte cannot be 0, which directly eliminates half of them. The remaining 30,000-odd combinations cannot cover all Chinese characters. So GB18030 uses 4 bytes to encode the additional characters. Of course, to stay compatible with GBK, the first two of the four bytes must not conflict with GBK (in practice, the last two bytes do not conflict with GBK either). China issued the GB18030 encoding twice, in 2000 and in 2005, and the 2005 edition extends the 2000 edition. So far, GB18030 covers more than 70,000 Chinese characters and even includes the scripts of minority languages.
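The byte counts and the subset relationship in items [1] to [4] can be observed with Python's built-in codecs (ascii, gb2312, gbk, gb18030). The helper byte_lengths and the sample characters are choices made for this illustration, not something from the encodings' specifications.

```python
def byte_lengths(ch: str) -> dict:
    """Encoded length of ch under each codec, or None if the codec lacks it."""
    lengths = {}
    for codec in ("ascii", "gb2312", "gbk", "gb18030"):
        try:
            lengths[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            lengths[codec] = None
    return lengths


print(byte_lengths("A"))           # 1 byte everywhere
print(byte_lengths("中"))          # common character: 2 bytes from GB2312 upward
print(byte_lengths("龍"))          # traditional character: absent from GB2312
print(byte_lengths("\U00020000"))  # rare character: only GB18030, 4 bytes

# Compatibility: the same character keeps the same bytes in each superset.
same_bytes = "中".encode("gb2312") == "中".encode("gbk") == "中".encode("gb18030")
print(same_bytes)
```

Each step up the chain adds characters without changing the bytes of the ones already covered, which is what "compatible" means here.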

You may be curious how these Chinese encodings achieve "compatibility". Let's look at the figure:

Figure: value ranges of the first two bytes in the various Chinese encodings

The figure shows the value ranges of the first 2 bytes (in hexadecimal) of the encodings described above. Each byte can take a value from 00 to FF (that is, 0 to 255). From the figure it is easy to see why GB18030 is compatible with GBK, GB2312, and ASCII: the first-two-byte ranges of these encodings do not overlap. Note that ASCII has only 1 byte, so it has no second byte. Also, the area occupied by GB18030 in the figure looks small, but GB18030 is a 4-byte encoding and the figure shows only its first two bytes. If the last two bytes are counted as well, GB18030 has far more characters than GBK. Note too that because GBK is compatible with GB2312, the blue area belonging to GB2312 in fact also counts as part of the GBK area. Similarly, the GBK area in theory also belongs to the GB18030 area. The figure shows only the extra part that each encoding adds.

In real life, more than 99% of the Chinese characters we use fall within the GB2312 area. This article does not go into which character each GB2312 code corresponds to; you can look them up through the link (link address). For the characters corresponding to GBK codes, see the link (link address). As for GB18030, it has so many characters that a complete code table is hard to find online. After some searching, however, the relevant documents can still be found (the documents released when China issued the encoding: the GB18030-2005 document and the GB18030-2000 document).

In actual use, GBK already covers most scenarios. The characters that only GB18030 encodes are ones we may never see in our lifetime, which is why GBK is the encoding most often used.

To convert ANSI (GBK) to UTF-8, first convert it to Unicode and then convert the Unicode to UTF-8; the reverse conversion follows the same path backwards.
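In Python this two-step conversion maps onto decode and encode, since a str object holds Unicode text. The following is a minimal sketch; the function names gbk_to_utf8 and utf8_to_gbk are hypothetical.

```python
def gbk_to_utf8(data: bytes) -> bytes:
    """GBK bytes -> Unicode text -> UTF-8 bytes."""
    text = data.decode("gbk")    # step 1: GBK to Unicode
    return text.encode("utf-8")  # step 2: Unicode to UTF-8


def utf8_to_gbk(data: bytes) -> bytes:
    """UTF-8 bytes -> Unicode text -> GBK bytes."""
    return data.decode("utf-8").encode("gbk")


gbk_bytes = b"\xd6\xd0\xce\xc4"  # the two characters of "中文" in GBK
utf8_bytes = gbk_to_utf8(gbk_bytes)

print(utf8_bytes.hex())                       # e4b8ade69687
print(utf8_to_gbk(utf8_bytes) == gbk_bytes)   # True
```

The same text takes 4 bytes in GBK and 6 bytes in UTF-8, and the round trip through Unicode restores the original bytes.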

 


Origin www.cnblogs.com/leifonlyone/p/12405684.html