Reverse Bits

This is a written test question from a company. The problem is very simple:

Given a byte (8 bits), reverse the order of its bits.
In other words, if the eight bits of the input byte are "abcdefgh", the output should be "hgfedcba". Since this is an interview or written-test question, it naturally carries an implied requirement: the solution should be as efficient as possible.
There is also an extended version of this problem, perhaps more common on the Internet: instead of a byte, the input is a 32-bit integer (DWORD), and its bits must be reversed.

To be honest, this kind of problem is sometimes a bit of hair-splitting. Is it really a good test of a programmer's ability? The reason it is labeled a "clever trick" is to make a point: this kind of technique is not widely used; on the contrary, most of the time it is not needed at all. Optimizing code efficiency is the most obvious reason to use it, and applying these tricks in core, bottleneck code is understandable, but they should not be used casually elsewhere. Donald Knuth (author of TAOCP) has a famous saying: "Premature optimization is the root of all evil." Still, studying them does little harm: at the very least it broadens your thinking, exercises your brain, and teaches you some optimization techniques. And if you are lucky, it may even help you with a written test or an interview question.

A trivial solution is as follows (here the input is a UINT; for the byte version, change the type accordingly):

typedef unsigned int UINT;
UINT reverse_bits(UINT input) {
    const UINT BITS_OF_BYTE = 8; // number of bits per byte
    UINT result = 0;             // the result is accumulated here
    // Process one bit per iteration
    for (UINT i = 0; i < sizeof(input) * BITS_OF_BYTE; i++) {
        // Shift the result left and append the lowest bit of the input
        result = (result << 1) | (input & 1);
        input >>= 1; // shift right to discard the bit just used
    }
    return result;
}
However, this solution is obviously inefficient. First, an N-bit integer needs N iterations of the loop. Each iteration executes 4 instructions in the loop body plus 2 for updating the loop variable and the conditional jump, so the total is 6N instructions. (Assignments can be ignored: these variables fit in registers, so the compiler can keep them there.) For the "simple task" of reversing one byte, that is 48 instructions, which is rather verbose.

Is there a solution with fewer instructions? Of course! But these solutions are not as straightforward and easy to understand as the ordinary one.

One answer I have seen on the Internet goes like this:

// Swap every two adjacent bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// In every group of four bits, swap the first two bits with the last two
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// In every group of eight bits, swap the first four bits with the last four
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// Swap adjacent bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// Swap the two 16-bit halves
v = (v >> 16) | (v << 16);
The code above handles 32-bit integers (v must be an unsigned 32-bit type). If the input is a byte, only three similar lines are needed:

// Swap every two adjacent bits
v = ((v >> 1) & 0x55) | ((v & 0x55) << 1); // abcdefgh -> badcfehg
// In every group of four bits, swap the first two bits with the last two
v = ((v >> 2) & 0x33) | ((v & 0x33) << 2); // badcfehg -> dcbahgfe
// Swap the first four bits with the last four
v = (v >> 4) | (v << 4);                   // dcbahgfe -> hgfedcba
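(Here v is an unsigned char, so bits shifted above bit 7 are discarded when the result is assigned back.) Longer inputs are no problem either: the pattern extends naturally to 64 bits, 128 bits, and so on.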

The beauty of this code is that when we swap two bits of a byte (for example, a and h), no other bit is affected, so it is natural to perform many such swaps "in parallel". That is where the solution above comes from. The central idea is to divide the bits into groups and swap every pair of adjacent groups at once; then, by changing the group size, every bit eventually reaches its destination. This solution grows the group size from small to large; it works just as well from large to small, and interested readers can try that for themselves.

The instruction count of this group-swapping solution is 5·log2(N) - 2, a completely different order of magnitude from the 6N of the ordinary solution. For N = 32 that is 23 instructions versus 192, more than 8 times fewer, which is already a great improvement. However, programmers who love to dig deeper are still not satisfied. For N = 8, that is, reversing a single byte, this solution uses 13 instructions. Can we do with even fewer?
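The count 5·log2(N) - 2 follows from the code itself: of the log2(N) swap lines, all but the last use two shifts, two ANDs and one OR, while the last line needs no mask, only two shifts and one OR. Worked out:

$$
\underbrace{5\,(\log_2 N - 1)}_{\text{masked rounds}} + \underbrace{3}_{\text{last round}} = 5\log_2 N - 2
$$

$$
N = 32:\; 5 \cdot 5 - 2 = 23 \ \text{vs.}\ 6 \cdot 32 = 192, \qquad N = 8:\; 5 \cdot 3 - 2 = 13 \ \text{vs.}\ 6 \cdot 8 = 48
$$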

Now look at the following magical solution (using 64-bit arithmetic):

unsigned char b; // the byte to be reversed
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
Although I have seen this solution many times, I am still deeply impressed by the ingenuity in it. It uses only three instructions! Let me try to explain how it works. First, the multiplication makes 5 copies of the original byte, placed end to end in a 64-bit integer; then the & operation picks out specific bits. As a result of these two operations, the 8 bits of the original byte sit in their correct positions (that is, their positions after reversal) across 5 "10-bit groups". Finally, a single "% 1023" adds these 5 "10-bit groups" together and produces the final result! The detailed calculation below makes this clearer:

For readability, the bits of the original byte are written as capital letters, and every "0" in the calculation is replaced by ".". I hope this makes it clearer.
    .............1.......1.......1.......1.......1.   // 0x0202020202
*   .......................................ABCDEFGH
---------------------------------------------------
    .............H.......H.......H.......H.......H.   // note the 0 at the tail
    ............G.......G.......G.......G.......G..
    ...........F.......F.......F.......F.......F...
    ..........E.......E.......E.......E.......E....
    .........D.......D.......D.......D.......D.....
    ........C.......C.......C.......C.......C......
    .......B.......B.......B.......B.......B.......
    ......A.......A.......A.......A.......A........
---------------------------------------------------
    ......ABCDEFGHABCDEFGHABCDEFGHABCDEFGHABCDEFGH.
&   ......1....1...1....1...1....1...1........1....   // 0x010884422010
---------------------------------------------------
    ......A....F...B....G...C....H...D........E....   (*) see note below
%   .....................................1111111111   // 1023
---------------------------------------------------
    .......................................HGFEDCBA

(*) With everything run together it is hard to see, so let's split the value into 10-bit groups:
    .........A
    ....F...B.
    ...G...C..
    ..H...D...
    .....E....
    ----------
    ..HGFEDCBA
Look: with this grouping, every bit of the original byte sits in its correct position (the top two bits of each group are zero), so adding the groups gives the reversed byte.
The calculation diagram above is adapted from Log4think. Thanks to its author, Simon, for the hard and meticulous work!

Readers who really like to dig deeper will say that the calculation is now more or less understandable, but a few questions remain unexplained:

Why make 5 copies instead of 6 or 4?
Why is there a 0 at the tail?
Why add up groups of 10 bits instead of some other size?
Why does computing % 1023 add up the 10-bit groups?
Well, let me try to answer them:

Why make 5 copies instead of 6 or 4?
The answer is straightforward: 4 copies are not enough, and no matter what you do you cannot make it work (don't believe it? try it yourself). 6 copies, on the other hand, are more than needed.
A proof of this claim comes later.

Why is there a 0 at the tail?
I guess the author of this solution first tried 0x0101010101 as the multiplier, and then found that this wastes the copy in the lowest byte (all 8 bits stay where they are, and none of them is in its correct position). So the multiplier was shifted left by one bit, which lets the copy in the lowest byte contribute at least one bit, E, in its correct position. In fact it can contribute at most one bit, which is easy to verify.

Why add up groups of 10 bits instead of some other size?
First, groups smaller than 8 bits obviously cannot work: there is no room for all 8 bits. 8-bit groups do not work either: every group looks the same, so only the same bits can be selected from each. What about 9-bit groups? You will find that 5 copies are not enough. So 10 bits is the smallest workable group size.
Would groups larger than 10 bits be better? Note that no matter whether the copies are shifted left by one bit or what multiplier is used, the lowest group can select at most one bit, and each of the remaining groups at most two bits [1]. So selecting 8 bits requires at least 5 groups (strictly speaking, the highest, 5th group may be incomplete, so at least 4 groups plus 1 bit are needed).
Since 10 bits is the smallest group size and it needs only 5 groups, this solution is already optimal.

1. This conclusion can be proved. Put simply: take a sequence of bit indices in reversed order (for example 87654321...) and one in normal order (for example 12345678...), each as long as one group, and pair them position by position (8-1, 7-2, ...). If the first match occurs at indices (i, j) (that is, i ≡ j mod 8), then, because one index decreases while the other increases, the next match again requires congruence mod 8, which happens at reversed index i-4 with normal index j+4, then i-8 with j+8, and so on. Since each group contributes only its lower 8 bits to the sum, whatever i and j are, at most two positions, (i, j) and (i-4, j+4), can be selected.
In fact, since at least 4 groups plus 1 bit are required, the 64-bit limit allows groups of at most 15 bits. It is easy to verify that 10-bit and 14-bit groups are the only two feasible group sizes.


Why does computing % 1023 add up the 10-bit groups?
This follows from a general principle: the remainder $x \bmod (2^N - 1)$ is congruent to the sum of the digits of $x$ written in base $2^N$, and each base-$2^N$ digit is exactly one $N$-bit group. So % $(2^N - 1)$ effectively adds up the $N$-bit groups. In particular, % 1023 adds up the digits in base $1024 = 2^{10}$, that is, the 10-bit groups. (Strictly speaking, the remainder is only congruent to the sum; here the sum is at most 255 < 1023 and the selected bits never share a position, so the remainder equals the sum exactly.)
In fact, this principle does not depend on the base being $2^N$; a stronger statement holds. In any base $X$, the sum of the digits of an integer written in base $X$ is congruent to that integer modulo $X - 1$. As a formula:

Consider the integer-coefficient polynomial $p(X) = aX^n + bX^{n-1} + \cdots + z$. Then
$$p(X) \equiv a + b + \cdots + z \pmod{X-1}$$
The proof is very straightforward: substitute $Y = X - 1$ into the formula above. To save space I will not write out the details; interested readers can look them up here. Thanks to Simon for writing the formula.
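The skipped step is short enough to write out. Since $X \equiv 1 \pmod{X-1}$, every power of $X$ is also $\equiv 1$; with the substitution $Y = X - 1$, the binomial theorem shows it term by term:

$$
X^k = (Y + 1)^k = 1 + \sum_{i=1}^{k} \binom{k}{i} Y^i \equiv 1 \pmod{Y}
$$

$$
p(X) = aX^n + bX^{n-1} + \cdots + z \equiv a + b + \cdots + z \pmod{X - 1}
$$

Applied with $X = 2^{10}$, this is exactly why % 1023 adds up the 10-bit groups.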

In particular, if $X = 10$ and $a, \ldots, z$ are all integers in $[0, 9]$, then $p(X)$ is just a decimal number written out in powers of 10, which gives the following quick tricks:

a. $N \bmod 9 \equiv$ (sum of the digits of $N$) mod 9 $\equiv$ (sum of the digits of the digit sum of $N$) mod 9 $\equiv$ ... and so on.
b. It follows immediately that "$N$ is divisible by 9" is equivalent to "the sum of the digits of $N$ is divisible by 9".
c. In particular, since $10 = 3^2 + 1$, the two tricks above also hold for 3; for example, the digit sum of a multiple of 3 is also a multiple of 3.
I believe everyone learned this in elementary school. Isn't it more familiar than base 1024? :)


Finally, all the questions have been answered. Once again, I highly recommend the byte-reversal solution we have just seen: only three instructions! If it has any drawback at all, it is probably that it needs a division (remainder) and a 64-bit environment.

What if there is no division? What if only 32-bit arithmetic is available?
Of course, there are other clever solutions for these conditions. In fact, several of the algorithms in this article come from the page below (in English), which collects many kinds of bit-manipulation tricks. Interested readers can visit it themselves.

http://graphics.stanford.edu/~seander/bithacks.html

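// The same group swap with both halves masked explicitly (n: 32-bit unsigned)
n = (n & 0x55555555) << 1 | (n & 0xAAAAAAAA) >> 1;    // swap adjacent bits
n = (n & 0x33333333) << 2 | (n & 0xCCCCCCCC) >> 2;    // swap adjacent 2-bit pairs
n = (n & 0x0F0F0F0F) << 4 | (n & 0xF0F0F0F0) >> 4;    // swap adjacent nibbles
n = (n & 0x00FF00FF) << 8 | (n & 0xFF00FF00) >> 8;    // swap adjacent bytes
n = (n & 0x0000FFFF) << 16 | (n & 0xFFFF0000) >> 16;  // swap the 16-bit halves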
————————————————
Copyright notice: This is an original article by CSDN blogger "maojudong", licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when reprinting.
Original link: https://blog.csdn.net/maojudong/article/details/6235274


Origin blog.csdn.net/ayang1986/article/details/104373026