[Computer Basics] Back and forth in the world of 0 and 1

Philosophers have debated the two sides of everything for thousands of years. The 0s and 1s inside a computer play out plenty of tricks of their own.

Binary number VS decimal number

This section covers binary notation and how to convert it to decimal. If you are already familiar with this material, you can skip to the next section.

We live in a decimal world. Ten dimes make one yuan, and ten liang make one catty. In arithmetic, we carry one when a position is full at ten, and a borrowed one counts as ten.

The decimal digits run from 0 to 9, so every decimal number is composed of these ten digits.

The bottom layer of the computer deals with binary numbers. You can compare decimal numbers to see the characteristics of binary numbers:

Carry one when a position is full at two, and a borrowed one counts as two. The digits are 0 and 1, so every binary number is a combination of 0s and 1s.

In decimal, ten 1s meet the "full ten" condition, so we "carry one" and write 10. This is a decimal number whose value is ten, because ten 1s have been combined.

In binary, two 1s meet the "full two" condition, so we "carry one" and write 10. This is a binary number whose value is two, because two 1s have been combined.

If you just understand this, combined with the characteristics of decimal and binary, then it will be very easy to understand:

1 + 1 = 2 -> 10

1 + 1 + 1 = 3 = 2 + 1 -> 10 + 1 -> 11

1 + 1 + 1 + 1 = 4 = 3 + 1 -> 11 + 1 -> 100

By analogy, here are a few decimal numbers and their binary equivalents:

0 -> 000

1 -> 001

2 -> 010

3 -> 011

4 -> 100

5 -> 101

Next, try to find the conversion relationship between binary and decimal.

First of all, how is the decimal number represented by the number in each position? I believe everyone is familiar. Such as the following example:

123 -> 100 + 20 + 3

123 -> 1 * 100 + 2 * 10 + 3 * 1

Because decimal carries at ten, we look for a way to relate each term to ten: 100 is the 2nd power of 10, 10 is the 1st power of 10, and 1 is the 0th power of 10, so:

123 -> 1 * 10 ^ 2 + 2 * 10 ^ 1 + 3 * 10 ^ 0

Furthermore, the hundreds place is position 3 but its power is 2, which is exactly 3 minus 1. The tens place is position 2 but its power is 1, which is exactly 2 minus 1. The ones place is position 1, and 1 minus 1 gives the 0th power.

So the formula is out. It's too simple, everyone knows it, so I won't write it.

Next, let's apply the same "routine" to binary numbers. Binary carries at two, so we use powers of 2.

000 -> 0 * 2 ^ 2 + 0 * 2 ^ 1 + 0 * 2 ^ 0
000 -> 0 * 4 + 0 * 2 + 0 * 1
000 -> 0

001 -> 0 * 2 ^ 2 + 0 * 2 ^ 1 + 1 * 2 ^ 0
001 -> 0 * 4 + 0 * 2 + 1 * 1
001 -> 1

010 -> 0 * 2 ^ 2 + 1 * 2 ^ 1 + 0 * 2 ^ 0
010 -> 0 * 4 + 1 * 2 + 0 * 1
010 -> 2

011 -> 0 * 2 ^ 2 + 1 * 2 ^ 1 + 1 * 2 ^ 0
011 -> 0 * 4 + 1 * 2 + 1 * 1
011 -> 3

100 -> 1 * 2 ^ 2 + 0 * 2 ^ 1 + 0 * 2 ^ 0
100 -> 1 * 4 + 0 * 2 + 0 * 1
100 -> 4

101 -> 1 * 2 ^ 2 + 0 * 2 ^ 1 + 1 * 2 ^ 0
101 -> 1 * 4 + 0 * 2 + 1 * 1
101 -> 5

We find that the calculated results are exactly the corresponding decimal numbers. Is this a coincidence? Of course not. In fact:

This is how to convert binary numbers to decimal numbers.

We can also imitate mathematics and derive a formula:

d = b(n) + b(n - 1) + ... + b(1) + b(0)

b(n) = a * 2 ^ n, (a = 0 or 1, n >= 0)

Here a is the bit at position n, counting from 0 at the rightmost bit. In other words, convert each bit of the binary number to its decimal value and add them up.
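
The formula can be sketched in Java as follows. The class and method names are made up for illustration, and the helper assumes a short string of 0s and 1s with no sign bit.

```java
class BinaryToDecimal {
    // Hypothetical helper: sums a * 2^n over every bit of an unsigned binary string.
    static int toDecimal(String bits) {
        int d = 0;
        int n = bits.length() - 1;        // power of the leftmost bit
        for (char c : bits.toCharArray()) {
            int a = c - '0';              // a is 0 or 1
            d += a * (1 << n);            // b(n) = a * 2 ^ n
            n--;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("101"));            // 5
        System.out.println(Integer.parseInt("101", 2));  // the library equivalent, also 5
    }
}
```

In real code, Integer.parseInt with radix 2 does the same job.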

Negative binary VS positive binary

The previous section used positive numbers as examples. In addition to positive numbers, there are negative numbers and zeros.

Therefore, the computer industry stipulates that when the sign needs to be considered, the highest bit of the binary number is the sign bit.

That is, the 0 or 1 in this position represents the sign of the number rather than part of its magnitude, with the rule:

0 is a positive number, and 1 is a negative number.

Zero is neither positive nor negative, so how should it be represented? Print the binary of 0:

0 -> 00000000

It is found that all are 0, and the highest bit is also 0, so 0 is a special case.

Next, I will explain the binary representation of negative numbers to ensure that after reading it, there is a feeling of "sudden realization" (if not, then I can't help it), haha.

Long influenced by mathematics, we expect that turning a positive number into the corresponding negative number only takes a negative sign "-" in front.

Based on this, combined with the above regulations in the computer world, we can easily assume that a positive number will become a corresponding negative number as long as its highest bit is set from 0 to 1, like this:

Because the binary of 1 is 00000001

So the binary of -1 is 10000001

Let me solemnly declare that this is wrong. Read on for the reason.

First, the correct result will be given from the official point of view (to show off), and then from the personal point of view (for that sudden realization).

From an official (or academic) perspective, first introduce three concepts:

Original code: Treat a number as a positive number (remove the negative sign if it is negative); its binary representation is called the original code.

Inverse code (ones' complement): Turn every 0 in the original code into 1 and every 1 into 0 (that is, swap 0 and 1); the result is called the inverse code.

Complement code (two's complement): Add 1 to the inverse code; the result is called the complement code.

(These are academic terms. Don't worry about why, just remember them.)

Take -1 as an example to make the following derivation:

Treat -1 as 1, the original code is 00000001

Reverse 0 and 1, the inverse code is 11111110

Then add 1, the complement is 11111111

So the complement of -1 is 11111111. Use a utility class from the class library to print the binary form of -1 and you will find that it is also all 1s. This is not a coincidence, because:

In a computer, the binary of a negative number is represented by its complement.
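
The three steps above can be checked with a small Java sketch. The class name and the toBits8 and negateByComplement helpers are invented for this illustration; toBits8 only shows the lowest 8 bits of an int.

```java
class TwosComplementDemo {
    // Hypothetical helper: the low 8 bits of a value as a zero-padded binary string.
    static String toBits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Invert every bit of the original code, then add 1.
    static int negateByComplement(int original) {
        return ~original + 1;
    }

    public static void main(String[] args) {
        int original = 1;
        System.out.println(toBits8(original));                      // 00000001 (original code)
        System.out.println(toBits8(~original));                     // 11111110 (inverse code)
        System.out.println(toBits8(negateByComplement(original)));  // 11111111 (complement code)
        System.out.println(toBits8(-1));                            // 11111111 (what Java stores)
    }
}
```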

That is the official explanation, which always likes to confuse everyone with jargon.

From a personal perspective, let's reveal the secret in the most "down-to-earth" way.

First of all, the binary format of -1 is 11111111, which is really not easy to accept at once.

On the contrary, it is more acceptable to assume that the binary of -1 is 10000001, because the binary of 1 corresponding to it is 00000001.

In this way, in terms of magnitude (that is, the absolute value) both are 1, and in terms of sign one is 1 and the other is 0, which means exactly one negative and one positive. It is simply "perfect".

So why is the form of this assumption wrong?

Because from a decimal point of view, 1 + (-1) = 0.

Then convert them to binary under this assumption:

00000001 + 10000001 = 10000010

Under the assumption, the value of this result is -2.

One calculation gives 0 and the other gives -2, which is obviously wrong. Different bases are used, but the result should be the same.

Clearly the binary calculation is the wrong one, and the reason is that the binary form of -1 cannot be what we assumed.

What logic should be used to calculate the binary value of -1? I believe you have guessed it.

Because -1 = 0 - 1,

-1 = 00000000 - 00000001 = 11111111

Therefore, the binary of -1 is 11111111. Thus,

-1 + 1 = 11111111 + 00000001 = 00000000 = 0

(Only 8 bits are kept, so the borrow or carry out of the highest bit is simply discarded.)

Does this make it immediately clear why the binary of -1 is all 1s? Because this form meets the needs of numerical calculation.

In the same way, the binary of -2 can be calculated,

-2 = -1 - 1 = 11111111 - 00000001 = 11111110
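
Here is a small Java sketch of this arithmetic, assuming we model an 8-bit register by masking an int with 0xFF. The add8 and sub8 helpers are hypothetical names for this illustration only.

```java
class EightBitArithmetic {
    // Hypothetical helpers: add or subtract, then keep only the low 8 bits.
    static int add8(int a, int b) { return (a + b) & 0xFF; }
    static int sub8(int a, int b) { return (a - b) & 0xFF; }

    static String toBits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        // The rejected guess: flip only the highest bit of 1.
        System.out.println(toBits8(add8(0b00000001, 0b10000001)));  // 10000010, not zero

        // 0 - 1 and -1 - 1; the borrow out of the eighth bit is dropped.
        System.out.println(toBits8(sub8(0b00000000, 0b00000001)));  // 11111111
        System.out.println(toBits8(sub8(0b11111111, 0b00000001)));  // 11111110

        // -1 + 1; the carry out of the eighth bit is dropped.
        System.out.println(toBits8(add8(0b11111111, 0b00000001)));  // 00000000
    }
}
```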

In fact, the conversion between original code, inverse code, and complement code is also designed so that a positive number and its negative sum to zero. Think it through carefully and you will see.

So the official perspective and the personal perspective are the same in essence; one is just highbrow and the other folksy.

This reminds me of elegance and vulgarity. Many people advertise the pursuit of elegance, but what they need is vulgarity.

Here are some examples of positive numbers and corresponding negative numbers:

2 -> 00000010
-2 -> 11111110

5 -> 00000101
-5 -> 11111011

127 -> 01111111
-127 -> 10000001

You can see that each pair of decimal numbers sums to 0, and the corresponding binary numbers also sum to 0.

This is the correct binary representation of a negative number, even though it looks different from what intuition suggests.

In decimal, when the number of digits is fixed and every position is 9, the value reaches its maximum. For example, the largest four-digit number is 9999.

The same is true for binary. Apart from the highest bit, which is 0 for a positive number, the value reaches its maximum when all remaining positions are 1. For example, the largest eight-bit number is 01111111, which is 127 in decimal.

A byte is 8 bits long, so the largest positive number a byte can represent is 127, that is, a 0 followed by seven 1s. This is the positive boundary value.

Looking at negative numbers, apart from the highest bit, which is 1 for a negative number, the minimum should be reached when the following 7 bits are all 0, that is, a 1 followed by seven 0s. The corresponding decimal number is -128, which is the negative boundary value.

Have you noticed that the positive and negative boundary values are related? Add 1 to the positive boundary value and negate it, and you get the negative boundary value.
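
The boundary values can be confirmed with Java's own constants. The negativeBoundary helper below is a made-up name that simply encodes the relationship just described.

```java
class ByteBoundsDemo {
    // Hypothetical helper: add 1 to the positive boundary, then negate.
    static int negativeBoundary(int positiveBoundary) {
        return -(positiveBoundary + 1);
    }

    public static void main(String[] args) {
        System.out.println(Byte.MAX_VALUE);                    // 127
        System.out.println(Byte.MIN_VALUE);                    // -128
        System.out.println(negativeBoundary(Byte.MAX_VALUE));  // -128

        // One step past the positive boundary wraps around to the negative boundary.
        byte wrapped = (byte) (Byte.MAX_VALUE + 1);
        System.out.println(wrapped);                           // -128
    }
}
```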

Binary regular operations

This material should be very familiar, so a quick glance is enough.

Bit manipulation

And:

1 & 1 -> 1
0 & 1 -> 0
1 & 0 -> 0
0 & 0 -> 0

Or (or):

0 | 0 -> 0
0 | 1 -> 1
1 | 0 -> 1
1 | 1 -> 1

Not (not):

~0 -> 1
~1 -> 0

Exclusive OR (xor):

0 ^ 1 -> 1
1 ^ 0 -> 1
0 ^ 0 -> 0
1 ^ 1 -> 0
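
The same rules apply to every bit position at once. A minimal Java sketch, where toBits8 is a hypothetical helper that prints only the low 8 bits of an int:

```java
class BitwiseDemo {
    // Hypothetical helper: the low 8 bits of a value as a zero-padded binary string.
    static String toBits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = 0b00001100;
        int b = 0b00001010;
        System.out.println(toBits8(a & b));  // 00001000
        System.out.println(toBits8(a | b));  // 00001110
        System.out.println(toBits8(a ^ b));  // 00000110
        System.out.println(toBits8(~a));     // 11110011
    }
}
```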

Shift operation

Move left (<<):

The bits shifted out on the left are discarded (the sign bit included), and the right is filled with 0.

After shifting, the number is positive if the highest bit is 0 and negative if it is 1.

Shifting one bit to the left is equivalent to multiplying by 2, two bits to multiplying by 4, and so on, as long as no significant bits are shifted out.

A shift distance equal to the bit length of the value is called a period. In Java, byte and short operands are promoted to int before shifting, so the period is 32 for int and 64 for long.

Shifting left by exactly one period returns to the origin, which is equivalent to not shifting at all.

For a shift of more than one period, drop the whole periods and shift by the remainder.
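
A short Java sketch of these left-shift rules on int values. The shiftLeft wrapper is a hypothetical name, used only so the behavior can be called with a run-time shift distance.

```java
class LeftShiftDemo {
    // Hypothetical wrapper around the << operator.
    static int shiftLeft(int value, int distance) {
        return value << distance;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(3, 1));           // 6
        System.out.println(shiftLeft(3, 2));           // 12
        System.out.println(shiftLeft(-3, 1));          // -6

        // The sign bit is not protected: the new highest bit decides the sign.
        System.out.println(shiftLeft(0x40000000, 1));  // -2147483648
        System.out.println(shiftLeft(0x80000000, 1));  // 0

        // For int, the shift distance is taken modulo 32.
        System.out.println(shiftLeft(5, 32));          // 5
        System.out.println(shiftLeft(5, 33));          // 10
    }
}
```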

Move right (>>):

The right is discarded, positive numbers are filled with 0 on the left, and negative numbers are filled with 1 on the left.

Shifting one bit to the right is equivalent to dividing by 2, two digits is equivalent to dividing by 4, and so on.

When the division is not exact, the result is rounded down: a positive number drops the remainder, and a negative number moves further away from zero. For example, 5 >> 1 is 2 and -5 >> 1 is -3.

Shifting a positive number to the right gives 0 once all of its 1 bits have been discarded, because the bits filled in from the left are all 0. When a full period is reached, it returns to the origin, that is, to the original value, which is equivalent to not shifting.

Shifting a negative number to the right gives -1 once all of its 0 bits have been discarded, because the bits filled in from the left are all 1. When a full period is reached, it returns to the origin, that is, to the original value, which is equivalent to not shifting.

For a shift of more than one period, drop the whole periods and shift by the remainder.
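
These right-shift rules can be tried out on int values as follows. The shiftRight wrapper is a hypothetical name for this sketch.

```java
class RightShiftDemo {
    // Hypothetical wrapper around the >> operator.
    static int shiftRight(int value, int distance) {
        return value >> distance;
    }

    public static void main(String[] args) {
        System.out.println(shiftRight(5, 1));    // 2
        System.out.println(shiftRight(-5, 1));   // -3 (rounded down)
        System.out.println(-5 / 2);              // -2 (division truncates toward zero instead)

        System.out.println(shiftRight(5, 3));    // 0, every 1 bit has been discarded
        System.out.println(shiftRight(-5, 3));   // -1, only the filled-in 1s remain

        // For int, the shift distance is taken modulo 32.
        System.out.println(shiftRight(5, 32));   // 5
        System.out.println(shiftRight(-5, 33));  // -3
    }
}
```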

Unsigned right shift (>>>):

The right side is discarded, and the left side of the positive or negative number is filled with 0.

Therefore, for positive numbers there is no difference from the ordinary right shift (>>).

A negative number becomes a positive number: its complement form is kept as is, the right side is discarded, and the result is treated as a positive number.
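
A minimal Java comparison of >> and >>> on int values. The unsignedShiftRight wrapper is a hypothetical name for this sketch.

```java
class UnsignedShiftDemo {
    // Hypothetical wrapper around the >>> operator.
    static int unsignedShiftRight(int value, int distance) {
        return value >>> distance;
    }

    public static void main(String[] args) {
        // Positive numbers: >> and >>> agree.
        System.out.println(8 >> 1);                     // 4
        System.out.println(unsignedShiftRight(8, 1));   // 4

        // Negative numbers: >> fills with 1, >>> fills with 0.
        System.out.println(-8 >> 1);                    // -4
        System.out.println(unsignedShiftRight(-8, 1));  // 2147483644

        // Only the four highest bits of -8 survive a shift of 28.
        System.out.println(Integer.toBinaryString(unsignedShiftRight(-8, 28)));  // 1111
    }
}
```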

Why is there no unsigned shift left?

Because a left shift fills in 0s on the right while the sign bit is at the far left, so what is filled in on the right cannot affect it.

Some people may think that once a full period has been reached, shifting further would affect it. Haha, no: the shift distance resets to zero after each full period.

Binary expansion/contraction

The following assumes that the high-order byte is written before (to the left of) the low-order byte.

Stretch:

If one byte is stretched to two bytes, the high byte needs to be filled. (Equivalent to assigning byte type to short type)

In fact, this byte remains unchanged, and another byte is added to its left.

At this time, both the sign and the value remain unchanged.

The sign bit of a positive number is 0, and the high byte is filled with 0 when it is stretched.

00000110 -> 00000000,00000110

The sign bit of the negative number is 1, and the high byte is filled with 1 when it is stretched.

11111010 -> 11111111,11111010
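
In Java this stretching happens automatically when a byte is assigned to a short. A small sketch, where widen and toBits16 are hypothetical helper names:

```java
class WidenDemo {
    // Assigning a byte to a short sign-extends it.
    static short widen(byte b) {
        return b;
    }

    // Hypothetical helper: the low 16 bits of a value as a zero-padded binary string.
    static String toBits16(int value) {
        String s = Integer.toBinaryString(value & 0xFFFF);
        return String.format("%16s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte positive = 0b00000110;          // 6
        byte negative = (byte) 0b11111010;   // -6

        System.out.println(toBits16(widen(positive)));  // 0000000000000110
        System.out.println(toBits16(widen(negative)));  // 1111111111111010
        System.out.println(widen(negative));            // -6, sign and value unchanged
    }
}
```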

Shrink:

To compress two bytes into one byte, the high byte needs to be truncated. (Equivalent to forcing the short type to the byte type)

In fact, the left byte is directly discarded, and the right byte remains unchanged.

At this time, both the sign and the value may change.

If the compressed bytes can still fit this number, the sign and value size remain unchanged.

Specifically, this is the case when the high byte of a positive number is all 0s and the highest bit of the low byte is also 0, or when the high byte of a negative number is all 1s and the highest bit of the low byte is also 1. Truncating the high byte then does not affect the number.

00000000,00001100 -> 00001100

11111111,11110011 -> 11110011

If the compressed byte cannot hold the number, the value necessarily changes.

Specifically, if the high byte of a positive number is not all 0s or the high byte of a negative number is not all 1s, or if the highest bit of the low byte differs from the original sign bit, truncating the high byte will definitely change the value.

Whether the sign changes depends on whether the original sign bit is the same as the sign bit after compression.

For example, in the following cases the value changes after compression but the sign stays the same:

00001000,00000011 is compressed to 00000011, still positive.
11011111,11111101 is compressed to 11111101, still negative.

In the following cases both the value and the sign change after compression:

00001000,10000011 is compressed to 10000011, and a positive number becomes negative.
11011111,01111101 is compressed to 01111101, and a negative number becomes positive.
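
The six examples above can be reproduced with a cast from short to byte. The narrow helper is a hypothetical name, and the hexadecimal literals are just a shorter spelling of the binary values listed.

```java
class NarrowDemo {
    // Casting a short to a byte keeps only the low byte.
    static byte narrow(short s) {
        return (byte) s;
    }

    public static void main(String[] args) {
        // The number still fits: sign and value are unchanged.
        System.out.println(narrow((short) 0x000C));  // 12
        System.out.println(narrow((short) 0xFFF3));  // -13

        // The value changes, the sign stays the same.
        System.out.println(narrow((short) 0x0803));  // 3
        System.out.println(narrow((short) 0xDFFD));  // -3

        // Both the value and the sign change.
        System.out.println(narrow((short) 0x0883));  // -125
        System.out.println(narrow((short) 0xDF7D));  // 125
    }
}
```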

Serialization and deserialization of integers

Generally speaking, an int type is composed of four bytes. When serializing, these four bytes need to be disassembled one by one and put into a byte array in order.

When deserializing, take these four bytes from the byte array, connect them together in order, and reinterpret them as an int type number. The result should remain unchanged.

In serialization, shift and compression are mainly used.

First shift the byte to be extracted to the lowest position (that is, the rightmost), then cast the result to the byte type.

If there is an int type number as follows:

11111001,11001100,10100000,10111001

The first step is to shift 24 bits to the right and keep only the lowest eight bits.

byte b3 = (byte)(i >> 24);

11111111,11111111,11111111,11111001

11111001

The second step is to shift 16 bits to the right and retain only the lowest eight bits.

byte b2 = (byte)(i >> 16);

11111111,11111111,11111001,11001100

11001100

The third step is to shift 8 bits to the right and keep only the lowest eight bits.

byte b1 = (byte)(i >> 8);

11111111,11111001,11001100,10100000

10100000

The fourth step is to shift 0 bits to the right and retain only the lowest eight bits.

byte b0 = (byte)(i >> 0);

11111001,11001100,10100000,10111001

10111001

In this way, four bytes are generated, just put them into the byte array.

byte[] bytes = new byte[]{b3, b2, b1, b0};
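
Putting the four shifts together gives a complete method. This is a sketch, with IntToBytes and toBytes as made-up names; it is cross-checked against java.nio.ByteBuffer, whose default byte order also puts the high-order byte first.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class IntToBytes {
    // Hypothetical helper: splits an int into four bytes, high-order byte first.
    static byte[] toBytes(int i) {
        byte b3 = (byte) (i >> 24);
        byte b2 = (byte) (i >> 16);
        byte b1 = (byte) (i >> 8);
        byte b0 = (byte) (i >> 0);
        return new byte[]{b3, b2, b1, b0};
    }

    public static void main(String[] args) {
        // 11111001,11001100,10100000,10111001 written in hexadecimal.
        int i = 0xF9CCA0B9;
        System.out.println(Arrays.toString(toBytes(i)));  // [-7, -52, -96, -71]

        // The library produces the same four bytes.
        byte[] expected = ByteBuffer.allocate(4).putInt(i).array();
        System.out.println(Arrays.equals(toBytes(i), expected));  // true
    }
}
```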

Deserialization mainly uses stretching and shifting.

First take a byte from the byte array, convert it to the int type, deal with the sign, and then shift it left to the proper position.

The first step:

Take out the first byte,

11111001

Then stretch to int,

11111111,11111111,11111111,11111001

Because its sign bit represents the sign bit of the original integer, there is no need to deal with the sign, and it is directly shifted to the left by 24 bits.

11111001,00000000,00000000,00000000

The second step:

Take out the second byte,

11001100

Then stretch to int,

11111111,11111111,11111111,11001100

Because its sign bit sits in the middle of the original integer, it represents value rather than sign. The sign extension therefore has to be cleared with an AND operation.

ANDing the first two lines below gives the third line:

11111111,11111111,11111111,11001100
00000000,00000000,00000000,11111111

00000000,00000000,00000000,11001100

Then shift left 16 bits

00000000,11001100,00000000,00000000

The third step:

Take out the third byte,

10100000

Then stretch to int,

11111111,11111111,11111111,10100000

Then process the sign bit,

00000000,00000000,00000000,10100000

Then shift left by 8 bits,

00000000,00000000,10100000,00000000

The fourth step:

Take out the fourth byte,

10111001

Then stretch to int,

11111111,11111111,11111111,10111001

Then process the sign bit,

00000000,00000000,00000000,10111001

Then shift to the left by 0 bits,

00000000,00000000,00000000,10111001

These four steps produce four results, as follows:

11111001,00000000,00000000,00000000
00000000,11001100,00000000,00000000
00000000,00000000,10100000,00000000
00000000,00000000,00000000,10111001

You can see that the four bytes are already in the position where they should be.

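Finally, adding them together is enough. Because no two results have a 1 in the same position, an OR operation works just as well. Calling the four results i3, i2, i1 and i0, in the same order as above:

i = i3 + i2 + i1 + i0

i = i3 | i2 | i1 | i0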

In this way, we synthesize the four bytes in the byte array into an int type number.
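
The four steps can be collected into one method. This is a sketch with invented names (BytesToInt, toInt); the AND with 0xFF is the "process the sign bit" step, and ByteBuffer is used only as a cross-check.

```java
import java.nio.ByteBuffer;

class BytesToInt {
    // Hypothetical helper: rebuilds an int from four bytes, high-order byte first.
    static int toInt(byte[] bytes) {
        int i3 = bytes[0] << 24;            // its sign bit is the real sign, no mask needed
        int i2 = (bytes[1] & 0xFF) << 16;   // clear the sign extension, then shift
        int i1 = (bytes[2] & 0xFF) << 8;
        int i0 = (bytes[3] & 0xFF);
        return i3 | i2 | i1 | i0;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xF9, (byte) 0xCC, (byte) 0xA0, (byte) 0xB9};
        int i = toInt(bytes);
        System.out.println(Integer.toBinaryString(i));
        // 11111001110011001010000010111001

        // The library reads the same value.
        System.out.println(i == ByteBuffer.wrap(bytes).getInt());  // true
    }
}
```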

Simulate unsigned numbers

An unsigned number is one whose highest bit is a value bit rather than a sign bit.

Some languages such as Java have no unsigned versions of byte, short, int, or long, so unsigned numbers have to be simulated with signed ones.

Because an unsigned type has a larger upper bound than the signed type of the same size, a longer type is used to store the unsigned values of a shorter type.

Take the byte type, which is one byte: its range is -128 to 127 as a signed number and 0 to 255 as an unsigned number, so at least the two-byte short type is needed to store it.

The processing method is very simple, just two steps, extending and processing the sign bit.

If a byte is 10101011, this is a negative number of byte type.

The first step, stretch, now becomes two bytes, but it is still a negative number

11111111,10101011

The second step is to process the sign, that is, to perform an AND operation

11111111,10101011
00000000,11111111

00000000,10101011

This has been processed, from a negative number of one byte to a positive number of two bytes.

In fact, it is to connect a byte with all 0s in front of the original byte (that is, to the left).
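
The two steps look like this in Java. The toUnsigned helper is a hypothetical name; since Java 8 the standard library also offers Byte.toUnsignedInt, which returns the same value as an int.

```java
class UnsignedByteDemo {
    // Hypothetical helper: stretch the byte, then clear the sign extension with AND.
    static short toUnsigned(byte b) {
        return (short) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0b10101011;

        short stretched = b;                        // step one: still a negative number
        System.out.println(stretched);              // -85

        System.out.println(toUnsigned(b));          // 171
        System.out.println(Byte.toUnsignedInt(b));  // 171
    }
}
```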

When byte is used as an unsigned number and reaches the maximum value of 255, the binary is like this

00000000,11111111

At this point, the low byte has just been used up.

Therefore, when a longer type is used to represent the unsigned values of a shorter type, at most 50% of the longer type's bytes are put to use.

In this case, only the lower half of the bytes needs to be written during serialization.

When deserializing, first, a longer type must be used to receive the value, and second, every byte must have its sign processed so that it is treated as an unsigned number.
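
As a sketch of both points, the following reads four bytes as an unsigned 32-bit number into a long. The class and method names are made up for illustration and are not taken from the article's repository.

```java
class UnsignedReadDemo {
    // Hypothetical helper: four bytes, high-order byte first, read as an unsigned value.
    static long readUnsignedInt(byte[] bytes) {
        // Every byte is masked, including the first, so no sign survives.
        // The first byte is widened to long before shifting so bit 31 stays a value bit.
        return ((long) (bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readUnsignedInt(bytes));         // 4294967295
        System.out.println(Integer.toUnsignedLong(-1));     // 4294967295, the library equivalent
    }
}
```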

PS: This is an earnest review of the fundamentals from my university courses ten years ago.

In fact, when I was writing the "Product Spring" series of articles, I found that it is best to be familiar with the internal structure of Java bytecode (.class) files.

When trying to parse the bytecode file, I found that it stores unsigned numbers, so I need to write a tool that deserializes byte arrays to unsigned numbers.

While writing the tool, I read some of the relevant JDK source code, and I also wrote some code to test the basic binary knowledge and operations.

So I compiled this article, ha ha.

All code examples in the article:
https://github.com/coding-new-talking/java-code-demo.git

(END)

Origin blog.51cto.com/15049788/2562061