Foreword
Given a sequential file containing up to four billion 32-bit integers in random order, find a 32-bit integer that is not in the file. (There must be at least one such number; why?) How would you solve this problem with ample memory? How would you solve it if several external "temporary" files are available but only a few hundred bytes of memory?
Analysis
This is another problem from "Programming Pearls". Earlier we discussed the "bitmap method", and we can try it here too. There are 2^32 = 4,294,967,296 distinct 32-bit integers, which is more than 4,000,000,000, so at least one of them must be missing. With enough memory, a bitmap of 2^32 bits, that is 536,870,912 bytes (512 MB), covers every possible integer: set the bit at each integer's position, then traverse the bitmap and output the first position whose bit is 0. But how do we handle the case where only a few "temporary" files and a few hundred bytes of memory are available?
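A minimal sketch of the bitmap method, assuming the integers are already in memory as an array. The function name `find_missing_bitmap` and the `bits` parameter (which lets the same code run on small widths) are hypothetical; with `bits == 32` the bitmap is the 512 MB described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Bitmap method (sketch): returns the smallest value in [0, 2^bits) that does
 * not occur in data, or -1 if every value occurs or the bitmap cannot be
 * allocated. With bits == 32 the bitmap takes 2^32 / 8 = 536,870,912 bytes.
 */
int64_t find_missing_bitmap(const uint32_t *data, size_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint64_t universe = (uint64_t)1 << bits;   /* number of possible values */
    unsigned char *map = calloc((size_t)(universe / 8 + 1), 1);
    if (map == NULL)
        return -1;

    /* Pass 1: set the bit for every value read (high bits are masked off). */
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)(data[i] & (universe - 1));
        map[v >> 3] |= (unsigned char)(1u << (v & 7));
    }

    /* Pass 2: report the first position whose bit is still 0. */
    int64_t missing = -1;
    for (uint64_t v = 0; v < universe; v++) {
        if ((map[v >> 3] & (1u << (v & 7))) == 0) {
            missing = (int64_t)v;
            break;
        }
    }
    free(map);
    return missing;
}
```

For example, for the 3-bit values {0, 1, 2, 4, 5, 7} it returns 3. The memory cost grows with the size of the value range, not with the number of inputs, which is exactly what rules it out when only a few hundred bytes are available.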
Can we use binary search? The 4 billion integers are in random order, so an ordinary binary search cannot find a number that does not exist. But we can borrow the idea behind binary search.
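Each bit of a 32-bit integer is either 0 or 1, so each bit can split the data into two piles. Start from the most significant bit and move down one bit at a time, always working on the pile chosen in the previous round:
- Put the numbers whose current bit is 0 into one pile and the numbers whose current bit is 1 into the other.
- If the two piles are the same size, choose either one; for example, choose the 0 pile, so this bit of the result is 0.
- If they differ, continue with the smaller pile; for example, if the 1 pile is smaller, this bit of the result is 1.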
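The steps above can be sketched in C as follows. This is a minimal in-memory illustration, assuming the data fits in an array that stands in for the files; the function name `find_missing_by_bits` and the `bits` parameter (so the 4-bit case can be tried as well) are hypothetical. Each round partitions the current pile by one bit and keeps the smaller half.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Bit-partition search (in-memory sketch). The array stands in for the
 * current file; each round splits it into a 0-pile and a 1-pile by one bit,
 * from the most significant bit down, and keeps the smaller pile.
 * Requires n < 2^bits and every value < 2^bits. Reorders data in place.
 */
uint32_t find_missing_by_bits(uint32_t *data, size_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    assert((uint64_t)n < ((uint64_t)1 << bits));
    uint32_t result = 0;
    for (int b = (int)bits - 1; b >= 0; b--) {
        if (n == 0)
            break;  /* empty pile: the remaining bits can stay 0 */

        /* Partition data[0..n) so that values with bit b == 0 come first. */
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++) {
            if (((data[i] >> b) & 1u) == 0) {
                uint32_t tmp = data[zeros];
                data[zeros] = data[i];
                data[i] = tmp;
                zeros++;
            }
        }
        size_t ones = n - zeros;

        if (zeros <= ones) {
            /* 0-pile is smaller or equal: this bit of the answer is 0. */
            n = zeros;                      /* keep data[0..zeros) */
        } else {
            /* 1-pile is smaller: record a 1 and keep data[zeros..n). */
            result |= (uint32_t)1 << b;
            data += zeros;
            n = ones;
        }
    }
    return result;
}
```

For example, with the 4-bit values 0 through 15 except 9, the function returns 9. In the low-memory setting each pile would be written to a scratch file instead of being rearranged in an array, but the per-bit decision is the same.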
Some explanation is needed:
- Among all 2^32 integers, exactly half (2^31) have a 0 in any given bit and half have a 1. If, among the 4 billion integers, equally many have this bit 0 and 1, each pile holds at most 2 billion < 2^31 numbers, so both piles are missing some numbers and either pile can be chosen.
- If more integers have this bit 1 than 0, then the 0 pile definitely lacks some numbers, while the 1 pile may or may not. Therefore we choose the smaller pile, that is, the pile whose bit is 0.
- At each choice, record whether 0 or 1 was selected. After at most 32 choices, the recorded bits form an integer that is not among the 4 billion numbers (if a pile becomes empty earlier, any values for the remaining bits will do).
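The argument above can be written as a short induction. Here n_k (notation introduced for this sketch) is the number of values left in the chosen pile after k choices; they all share the k-bit prefix chosen so far, which covers 2^(32-k) possible values, and z_k, o_k are the sizes of that pile's 0-part and 1-part in the next round:

$$
\begin{aligned}
n_0 &\le 4\times 10^9 < 2^{32} = 4\,294\,967\,296,\\
n_{k+1} &= \min(z_k, o_k) \le \frac{z_k + o_k}{2} = \frac{n_k}{2} < \frac{2^{32-k}}{2} = 2^{32-(k+1)},\\
n_{32} &< 2^{0} = 1 \quad\Longrightarrow\quad n_{32} = 0.
\end{aligned}
$$

So at every step the chosen pile has fewer values than its prefix allows, and after 32 choices it is empty: the 32 recorded bits form an integer that does not occur. The argument only counts values, so duplicates in the file do not affect it.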
Example
Since 32-bit integers span a huge range that is inconvenient to illustrate, we demonstrate the idea above with 4-bit data. 4 bits can represent at most 16 numbers.
Consider the following source data:
Their binary forms are as follows (negative numbers are stored in memory in two's-complement form):
1. Process the first bit, dividing the data into two parts:
- First bit is 0:
- First bit is 1:
It can be seen that 5 numbers have a 1 in the first bit, fewer than those with a 0, so we select the numbers whose first bit is 1 and continue processing them. The first bit of the result is 1.
2. Process the second bit, again dividing the data into two parts:
- Second bit is 0:
- Second bit is 1:
It can be seen that 3 numbers have a 1 in the second bit, more than those with a 0, so we select the numbers whose second bit is 0 and continue. The second bit of the result is 0.
3. Process the third bit, again dividing the data into two parts:
- Third bit is 0:
- Third bit is 1:
No number has a 0 in the third bit, so we select the bit-0 pile and the third bit of the result is 0. At this point the pile is empty, so there is no need to continue.
We finally obtain the first three bits 100, so at least the numbers 1000 and 1001, i.e. -8 and -7, do not exist in the data.
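To double-check the conversion from bit patterns to values, here is a small helper that reads the low `width` bits of a pattern as a two's-complement number. The name `sign_extend` is hypothetical and not from the original code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interpret the low `width` bits of `pattern` as a two's-complement number,
 * e.g. the 4-bit pattern 1000 -> -8.
 */
int32_t sign_extend(uint32_t pattern, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint64_t range = (uint64_t)1 << width;      /* 2^width */
    uint64_t value = pattern & (range - 1);     /* keep the low width bits */
    if (value & (range >> 1))                   /* sign bit set: subtract 2^width */
        return (int32_t)((int64_t)value - (int64_t)range);
    return (int32_t)value;
}
```

`sign_extend(0x8, 4)` returns -8 and `sign_extend(0x9, 4)` returns -7, the two completions of the prefix 100.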
Code
C language:
Code Description:
- The splitByBit function splits the data into two parts according to a given bit.
- closeAllFile closes the file descriptors.
- The findNum function loops over the 32 bits, processing one bit per iteration, and finally yields an integer that does not exist in the file.
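As a rough, hypothetical sketch of the same structure (not the original listing), the following uses standard C `tmpfile()` scratch files so that memory use stays at a few counters and file handles. The names `split_by_bit` and `find_missing_in_file` are illustrative, and the data file is assumed to hold raw `uint32_t` values in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Copy each uint32_t from `in` to `zero` or `one` by bit b; count each pile. */
static int split_by_bit(FILE *in, FILE *zero, FILE *one, int b,
                        uint64_t *zeros, uint64_t *ones)
{
    uint32_t v;
    *zeros = *ones = 0;
    rewind(in);
    while (fread(&v, sizeof v, 1, in) == 1) {
        FILE *dst = ((v >> b) & 1u) ? one : zero;
        if (fwrite(&v, sizeof v, 1, dst) != 1)
            return -1;
        if (dst == one)
            (*ones)++;
        else
            (*zeros)++;
    }
    return ferror(in) ? -1 : 0;
}

/*
 * Find a 32-bit value missing from `in` (which must hold fewer than 2^32
 * values). Each round's piles go to anonymous scratch files; only the
 * smaller pile is kept. Returns 0 on success, -1 on an I/O error.
 */
int find_missing_in_file(FILE *in, uint32_t *missing)
{
    assert(in != NULL && missing != NULL);
    FILE *cur = in;
    uint32_t result = 0;
    for (int b = 31; b >= 0; b--) {
        FILE *zero = tmpfile(), *one = tmpfile();
        uint64_t zeros, ones;
        if (zero == NULL || one == NULL ||
            split_by_bit(cur, zero, one, b, &zeros, &ones) != 0) {
            if (zero) fclose(zero);
            if (one) fclose(one);
            if (cur != in) fclose(cur);
            return -1;
        }
        if (cur != in)
            fclose(cur);                 /* previous round's scratch file */
        uint64_t kept = zeros <= ones ? zeros : ones;
        if (zeros <= ones) {             /* keep the smaller pile */
            fclose(one);
            cur = zero;
        } else {
            result |= (uint32_t)1 << b;
            fclose(zero);
            cur = one;
        }
        if (kept == 0)
            break;                       /* empty pile: remaining bits may be 0 */
    }
    if (cur != in)
        fclose(cur);
    *missing = result;
    return 0;
}
```

A caller would open the data file with `fopen(path, "rb")` and pass it in. Because the kept pile is at most half of the previous one, the total data read across all rounds is at most about twice the input size.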
About 20 million integers were generated with a script:
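The generating script is not shown. A hypothetical stand-in in C could write pseudo-random 32-bit values with a small xorshift32 generator; the function name, the seed handling, and the raw native-byte-order file format are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* xorshift32: a small PRNG producing full 32-bit values; state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/*
 * Write `count` pseudo-random 32-bit integers to `out` as raw uint32_t values
 * in native byte order. Returns the number of values actually written.
 */
size_t write_random_ints(FILE *out, size_t count, uint32_t seed)
{
    assert(out != NULL);
    uint32_t state = seed ? seed : 1u;   /* xorshift must not start at 0 */
    size_t written = 0;
    for (; written < count; written++) {
        uint32_t v = xorshift32(&state);
        if (fwrite(&v, sizeof v, 1, out) != 1)
            break;
    }
    return written;
}
```

For about 20 million integers, calling `write_random_ints(f, 20000000, 42)` on a file opened with `fopen("data.bin", "wb")` would produce 80,000,000 bytes.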
Compile and run:
Most of the program's running time goes to reading and writing files, and it uses very little memory.
Summary
This article approaches the problem from a particular angle, using the most common technique, binary search: after at most 32 splits, an integer that does not exist can be found. If you have better ideas or optimizations, feel free to leave a comment.
Original: Large column, "Finding a non-existent integer among 4 billion integers"