Let's revisit the Varint algorithm from the previous article. Its purpose is to make an integer occupy fewer bytes. For example, a number no greater than 127 occupies only one byte, and a number less than 16384 occupies at most two bytes. The algorithm is as follows:

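while (true) {
  if ((value & ~0x7F) == 0) {
    // The remaining value fits in 7 bits: write it with the high bit clear and stop.
    buffer[position++] = (byte) value;
    return;
  } else {
    // Write the low 7 bits and set the high bit to signal that more bytes follow.
    buffer[position++] = (byte) ((value & 0x7F) | 0x80);
    // Unsigned shift: drop the 7 bits that were just written.
    value >>>= 7;
  }
}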

Let's look at the concrete numbers. Any value below 2097152 (2^21) occupies at most 3 bytes, fewer than the 4 bytes of a fixed-width int. That advantage is obvious, but there is a cost too: a value of 268435456 (2^28) or more occupies 5 bytes. Since numbers that large rarely come up in most applications, the scheme still saves space overall.
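
To check these byte counts, here is a minimal sketch that wraps the same loop in a helper and measures the encoded length at each boundary. The class name VarintSizeDemo and the encode method are my own illustrative names, not code from the previous article, and the sketch writes to a ByteArrayOutputStream instead of the original buffer.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class: the names VarintSizeDemo and encode are
// illustrative assumptions, not code from the previous article.
final class VarintSizeDemo {

    // Encodes value as a Varint: 7 payload bits per byte, high bit = "more follows".
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation bit
            value >>>= 7;                     // logical shift drops the bits just written
        }
        out.write(value);                     // final group, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {127, 128, 16383, 16384, 2097151, 2097152, 268435455, 268435456};
        for (int v : samples) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

Each additional byte adds 7 payload bits, so the length steps up at 2^7, 2^14, 2^21, and 2^28. A negative int always takes 5 bytes in this scheme because the unsigned shift treats it as a large 32-bit value.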

Working through this implementation, I noticed that well-engineered algorithms lean heavily on bit operations, yet bit operations rarely come up in day-to-day work. Bit operations are faster than ordinary integer arithmetic, especially integer multiplication, which can take 10 or more clock cycles, whereas a shift takes only one or two. But because we normally do arithmetic in decimal, I am not fluent in binary bit operations, and code that uses them takes more effort to read. So, using the Varint implementation as a starting point, let's analyze the advantages and applications of bit operations in order to understand binary better. The code above touches on shift operations, bitwise operations, and byte order; we will go through them one by one.

Number bases

The information a computer stores and processes is binary. We still write numbers in decimal when programming, but the machine processes them in binary when it executes the program. For people with 10 fingers, decimal comes naturally; when you teach children mathematics, they always start by counting on their fingers. If we had only 2 fingers, would binary feel more natural to us?

Convert other bases to base 10

Most textbooks teach how to convert between binary and decimal, but mostly as rote memorization without explaining what is really going on, so the moment the base changes you are stuck again. Let's go back to the most fundamental way of counting, starting from base 10. Take the number 1001. Its decimal expansion is 1*10^3+0*10^2+0*10^1+1*10^0. Reading from right to left means going from the low digits to the high digits: each step up adds 1 to the exponent, and the base of each power is 10 because the number is decimal. The expansion is then evaluated with decimal arithmetic (whichever base you do the arithmetic in is the base you are converting to). This method works for every base conversion, and once it is understood, the conversions that follow are easy.

Binary is the simple case, so we skip it and apply the method to base 3 instead. Take the same digits, 1001, and write the expansion: 1*3^3+0*3^2+0*3^1+1*3^0=28. Only the base of the powers changed: the number is in base 3, so the base is 3. Because we evaluated the expansion with decimal arithmetic, the result is the decimal value: 1001 in base 3 is 28 in decimal.
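
The expansion can be written as a short loop. This is only a sketch: the class BaseExpansionDemo and its toDecimal method are names I made up for illustration. It walks the digits from right to left and multiplies each one by the matching power of the base, exactly as in the formula.

```java
// Hypothetical demo class; BaseExpansionDemo and toDecimal are illustrative names.
final class BaseExpansionDemo {

    // Evaluates d(n)*base^n + ... + d1*base^1 + d0*base^0 using ordinary int arithmetic.
    static int toDecimal(String digits, int base) {
        int result = 0;
        int weight = 1; // base^0 for the rightmost digit
        for (int i = digits.length() - 1; i >= 0; i--) {
            int digit = Character.digit(digits.charAt(i), base);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-" + base + " digit: " + digits.charAt(i));
            }
            result += digit * weight;
            weight *= base; // one position higher means the exponent goes up by 1
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("1001", 10)); // 1001
        System.out.println(toDecimal("1001", 3));  // 28
        System.out.println(toDecimal("1001", 2));  // 9
    }
}
```

The standard library does the same job with Integer.parseInt("1001", 3), which also returns 28.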

Then why not do the arithmetic in some other base? Because you would need to know that base's multiplication table. In decimal we have memorized the 9x9 multiplication table, but for any other base you would have to work the table out yourself. That goes against our habits and makes the conversion slower, so in practice we convert to or from decimal, or use decimal as an intermediate step.

Convert decimal to other bases

With the method above we can already do every base conversion, including decimal to base 3. For example, to convert decimal 28 we expand it as 2*10^1+8*10^0 and evaluate the expansion in base-3 arithmetic: 28=2*101+22=1001 (101 is ten written in base 3, and 22 is eight). That is very laborious, though, so to convert from decimal to another base we usually use short division:

Divide the current number by 3 repeatedly, and each remainder becomes the next digit, from lowest to highest. 28 divided by 3 is 9 with remainder 1, so 1 is the lowest ("ones") digit. 9 divided by 3 is 3 with remainder 0, so 0 is the second ("tens") digit. 3 divided by 3 is 1 with remainder 0, so 0 is the third ("hundreds") digit, and the final quotient 1 is the fourth ("thousands") digit, giving 1001. If you look back at the base-3-to-decimal conversion, you will see that short division is simply its inverse. The conversion from base 3 to decimal was 1*3^3+0*3^2+0*3^1+1*3^0, which can be rewritten as ((1*3+0)*3+0)*3+1, so converting decimal to base 3 just undoes those steps in reverse order.
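
Short division translates almost directly into code. The sketch below uses a made-up class ShortDivisionDemo and method fromDecimal, and assumes a non-negative value and a base between 2 and 36. It collects the remainders from lowest digit to highest and reverses them at the end.

```java
// Hypothetical demo class; ShortDivisionDemo and fromDecimal are illustrative names.
final class ShortDivisionDemo {

    // Converts a non-negative decimal value to the given base (2..36) by repeated division.
    static String fromDecimal(int value, int base) {
        if (value == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        while (value > 0) {
            digits.append(Character.forDigit(value % base, base)); // remainder = next digit, lowest first
            value /= base;                                         // quotient carries on to the next step
        }
        return digits.reverse().toString(); // remainders were collected lowest digit first
    }

    public static void main(String[] args) {
        System.out.println(fromDecimal(28, 3)); // 1001
        System.out.println(fromDecimal(9, 2));  // 1001
    }
}
```

For everyday use, Integer.toString(28, 3) gives the same result.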

Fractions

Once the above is understood, fractions are easy to understand as well. Let's start with base 10. Take decimal 12.34 and look at the part after the point. The tenths digit 3 means that 1 is divided into 10 parts and 3 of them are taken; the hundredths digit 4 (.04) means that 1 is divided into 100 parts and 4 of them are taken. Written as a formula:

12.34=1*10^1+2+3*(1/10)+4*(1/100)
     =1*10^1+2+3*10^-1+4*10^-2

The fractional part uses the same base, but its exponents are negative. Digits to the left of the point have non-negative exponents, and digits to the right of the point have negative exponents. Once this is clear, fractional parts in other bases follow the same pattern, such as binary 1001.101:

1001.101=1*2^3+0*2^2+0*2^1+1*2^0+1*2^-1+0*2^-2+1*2^-3
        =8+1+0.5+0.125
        =9.625
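
The same idea can be checked with a small sketch. FractionDemo and toDecimal are names I chose for illustration, and the method assumes every character other than the point is a valid digit of the base. Digits before the point get non-negative exponents, digits after it get negative ones.

```java
// Hypothetical demo class; FractionDemo and toDecimal are illustrative names.
final class FractionDemo {

    // Evaluates a string such as "1001.101" in the given base; assumes valid digits.
    static double toDecimal(String text, int base) {
        int point = text.indexOf('.');
        String intPart = point < 0 ? text : text.substring(0, point);
        String fracPart = point < 0 ? "" : text.substring(point + 1);

        double result = 0;
        for (char c : intPart.toCharArray()) {
            result = result * base + Character.digit(c, base); // exponents 0 and up
        }
        double weight = 1.0 / base; // base^-1 for the first digit after the point
        for (char c : fracPart.toCharArray()) {
            result += Character.digit(c, base) * weight;
            weight /= base;         // base^-2, base^-3, ...
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("1001.101", 2)); // 9.625
        System.out.println(toDecimal("12.34", 10));   // approximately 12.34
    }
}
```

The binary example comes out exactly because 0.5 and 0.125 are powers of 2; the decimal example is only approximate in a double, for the same reason 0.1 cannot be stored exactly.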

With this understanding, floating-point numbers become easier to follow. The IEEE floating-point format represents numbers in the same way, with a specification layered on top; we will look at it in more detail later.

Shift operations

There are three common shift operations: left shift, logical right shift, and arithmetic right shift.

Shift operations on two example bytes

Operation	Value
Argument x	[01100011] [10010101]
x<<4	[00110000] [01010000]
x>>4 (logical right shift)	[00000110] [00001001]
x>>4 (arithmetic right shift)	[00000110] [11111001]

Shift left

Shifting x left by k bits discards the highest k bits and appends k zeros at the right end. This is often described as multiplying the current value by 2 to the power of k (as long as no significant bits are discarded). Why 2 to the power of k? In decimal, multiplying a number by 10 means appending a 0. In the same way, shifting a binary number left by one bit (appending a 0) is equivalent to multiplying by 2. The rule holds in every base: appending a 0 to a base-k number multiplies it by k.

Shifting left by one bit adds 1 to every exponent of the base expansion, so it multiplies the current number by 2: (1*2^5+1*2^1+1*2^0)*2^1=1*2^6+1*2^2+1*2^1.
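
A quick sketch of left shift in Java follows. LeftShiftDemo, bits8, and shiftLeft8 are hypothetical helper names. Java promotes a byte to a 32-bit int before shifting, so the sketch masks the result with 0xFF to imitate an 8-bit register that discards the high bits.

```java
// Hypothetical demo class; LeftShiftDemo, bits8 and shiftLeft8 are illustrative names.
final class LeftShiftDemo {

    // Formats the low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        return String.format("%8s", Integer.toBinaryString(v & 0xFF)).replace(' ', '0');
    }

    // Left shift within 8 bits: bits pushed past bit 7 are discarded by the mask.
    static int shiftLeft8(int x, int k) {
        return (x << k) & 0xFF;
    }

    public static void main(String[] args) {
        int x = 0b100011;                                   // 1*2^5 + 1*2^1 + 1*2^0 = 35
        System.out.println(x << 1);                         // 70, the same as 35 * 2
        System.out.println((x << 3) == x * 8);              // true: shifting by k multiplies by 2^k
        System.out.println(bits8(shiftLeft8(0b01100011, 4))); // 00110000
        System.out.println(bits8(shiftLeft8(0b10010101, 4))); // 01010000
    }
}
```

The last two lines show the high 4 bits being thrown away, which is why the "multiply by 2^k" reading only holds while the result still fits.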

Shift right

Once left shift is understood, right shift works the same way. Shifting right by k bits subtracts k from every exponent of the base expansion, that is, it divides the current number by the base raised to the power k (2^k for binary) and discards the bits shifted out. The only difference is that right shift comes in two kinds: logical and arithmetic.

Logical right shift is an unsigned shift: after shifting right by k bits, k zeros are added at the left end. For example, the Varint code above shifts right by 7 bits each time with >>>=, so the 7 highest bits of the current number are filled with zeros.

Arithmetic right shift is a signed shift. Unlike logical right shift, it fills the k bits at the left end with copies of the most significant bit. This looks a bit strange, but it is very useful for signed integers. In a signed number the first bit (the most significant bit) indicates the sign, and negative numbers are stored in two's complement form. For example, [11100110] is -26 in decimal, and after an arithmetic right shift by 1 bit it becomes [11110011], which is -13. If the left end were filled with 0 instead, as in a logical shift, the result [01110011] would be the positive number 115.
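
In Java, >>> is the logical right shift and >> is the arithmetic one. The sketch below reproduces the 8-bit examples; RightShiftDemo and its helpers are names I invented for illustration, and the masking and casting are there only because Java shifts operate on 32-bit ints rather than on bytes.

```java
// Hypothetical demo class; RightShiftDemo and its helpers are illustrative names.
final class RightShiftDemo {

    // Formats the low 8 bits of v as a zero-padded binary string.
    static String bits8(int v) {
        return String.format("%8s", Integer.toBinaryString(v & 0xFF)).replace(' ', '0');
    }

    // Logical shift on an 8-bit value: the left end is filled with zeros.
    static int logicalShiftRight8(int x, int k) {
        return (x & 0xFF) >>> k;
    }

    // Arithmetic shift on an 8-bit value: the cast sign-extends bit 7, then >> copies it in.
    static int arithmeticShiftRight8(int x, int k) {
        return ((byte) x >> k) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(bits8(logicalShiftRight8(0b10010101, 4)));    // 00001001
        System.out.println(bits8(arithmeticShiftRight8(0b10010101, 4))); // 11111001
        System.out.println((byte) 0b11100110);                           // -26
        System.out.println((byte) 0b11100110 >> 1);                      // -13
        System.out.println(logicalShiftRight8(0b11100110, 1));           // 115
    }
}
```

For a value whose top bit is 0, such as [01100011], both shifts give the same result, [00000110].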

Endianness

A single byte has no byte-order problem. The question only arises when a value needs several bytes: what is the address of the value, and how are its bytes arranged in memory? Is the most significant byte stored at the highest address, or at the lowest address?

A multi-byte value is stored in consecutive bytes, and its address is the smallest address among them. For example, if a variable of type int occupies memory locations 0x101, 0x102, 0x103, and 0x104, its address is 0x101. Suppose the value is a w-bit integer with bit representation [x(w-1),x(w-2),...,x1,x0], where x(w-1) is the most significant bit and x0 is the least significant bit. If w is a multiple of 8, the bits can be grouped into bytes: the most significant byte is [x(w-1),...,x(w-8)] and the least significant byte is [x7,x6,...,x0]. Writing the most significant byte first is the logical order, and it matches how people normally expect values to be written: in decimal, too, the high-order digits (hundreds, tens) come before the ones.

Little endian

If the physical order of the bytes in memory is the opposite of the logical order, that is, the least significant byte [x7,x6,...,x0] comes first (at the lowest address) and the most significant byte [x(w-1),...,x(w-8)] comes last, the layout is called little endian. Most Intel-compatible machines use this rule.

Big endian

If the physical order of the bytes in memory is the same as the logical order, that is, the most significant byte [x(w-1),...,x(w-8)] comes first (at the lowest address) and the least significant byte [x7,x6,...,x0] comes last, the layout is called big endian. Most IBM and Sun Microsystems machines follow this rule.

For example, take the hexadecimal number 0x01234567 and see where its bytes land in memory under big endian and little endian, listing the lowest address first:

Big endian	01 23 45 67
Little endian	67 45 23 01

We can see that the big-endian layout is closer to our habits, with the high-order byte in front and the low-order byte at the back.
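
Both layouts can be observed from Java with java.nio.ByteBuffer, which lets you pick the byte order explicitly. In this sketch the class EndianDemo and the helpers toBytes and hex are my own illustrative names; ByteBuffer and ByteOrder are the standard library classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; EndianDemo, toBytes and hex are illustrative names.
final class EndianDemo {

    // Writes a 4-byte int into a buffer using the requested byte order.
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Formats bytes as two-digit hex values, index 0 (lowest address) first.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toBytes(0x01234567, ByteOrder.BIG_ENDIAN)));    // 01 23 45 67
        System.out.println(hex(toBytes(0x01234567, ByteOrder.LITTLE_ENDIAN))); // 67 45 23 01
    }
}
```

ByteOrder.nativeOrder() reports which of the two the underlying machine uses.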

The Varint algorithm above stores its data in a little-endian style:

 buffer[position++] = (byte) ((value & 0x7F) | 0x80);

Each time, the lowest 7 bits of the current value are taken and stored in the buffer, so the low-order groups are placed at the front of the buffer byte array.
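
To make the ordering visible, the sketch below encodes 300 and then reads it back. The decode method is my own addition for illustration and does not appear in the original code; VarintOrderDemo is likewise a made-up class name. Because the low-order group is written first, the decoder shifts each later group 7 bits further to the left.

```java
import java.util.Arrays;

// Hypothetical demo class; VarintOrderDemo and decode are illustrative additions.
final class VarintOrderDemo {

    // Same loop as the article's algorithm, writing into a local buffer.
    static byte[] encode(int value) {
        byte[] buffer = new byte[5]; // an int needs at most 5 Varint bytes
        int position = 0;
        while ((value & ~0x7F) != 0) {
            buffer[position++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buffer[position++] = (byte) value;
        return Arrays.copyOf(buffer, position);
    }

    // Rebuilds the value: byte 0 holds bits 0..6, byte 1 holds bits 7..13, and so on.
    static int decode(byte[] buffer) {
        int result = 0;
        int shift = 0;
        for (byte b : buffer) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last group
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(300);                    // 300 = 0b10_0101100
        for (byte b : encoded) {
            System.out.printf("%02x ", b & 0xFF);        // ac 02
        }
        System.out.println();
        System.out.println(decode(encoded));             // 300
    }
}
```

The first byte, 0xAC, carries the low 7 bits 0101100 with the continuation bit set, and the second byte, 0x02, carries the remaining high bits.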

