Interview question from a certain "goose" company: there are 250 million numbers, exactly one of which appears twice while all the others appear once. How do you find the repeated number when memory is tight?

Problem & Analysis:
1. 250 million numbers: Feature > huge amount of data
2. Memory is tight: Feature > 1. Consider trading time for memory
   2. Consider trading disk space for memory space
   3. "Tight" is not quantified: we do not know whether the available space is 1 GB or 10 MB …
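To put rough numbers on the analysis above, here is a small back-of-the-envelope sketch. It assumes the numbers are 32-bit `int` values, which the question does not actually state; the class and method names are only illustrative. The group sizes of 1 million and 10 million are the ones used under Idea 1.

```java
final class MemoryEstimate {
    static final long TOTAL_NUMBERS = 250_000_000L;

    /** Bytes needed to hold `count` values in a primitive int[] (array header ignored). */
    static long bytesForInts(long count) {
        return count * Integer.BYTES; // assumption: each number fits in a 4-byte int
    }

    /** Number of groups needed when each group holds at most `groupSize` values. */
    static long groupsNeeded(long count, long groupSize) {
        return (count + groupSize - 1) / groupSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(bytesForInts(TOTAL_NUMBERS));              // 1000000000 bytes, about 1 GB
        System.out.println(bytesForInts(1_000_000L));                 // 4000000 bytes, about 4 MB
        System.out.println(groupsNeeded(TOTAL_NUMBERS, 1_000_000L));  // 250 groups
        System.out.println(groupsNeeded(TOTAL_NUMBERS, 10_000_000L)); // 25 groups
    }
}
```

Under that assumption, loading everything at once needs about 1 GB, while a single group of 1 million numbers needs only about 4 MB, which is why processing in groups is attractive when the memory budget is unknown.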
Thinking: divide and conquer (process the data in groups, reading it in one group at a time).
This can be split into two ideas.
Idea 1:
Split the data into multiple ordered files.
We store the numbers in ascending order (file names ascending, data inside each file ascending) using equal-width value ranges, assuming each group holds roughly 1 million or 10 million numbers.
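The equal-width range storage can be sketched as a mapping from a value to the index of the file it belongs in. This is only an illustration: it assumes the values are `int`s spread over the whole `int` range, and the `part-%05d.txt` naming scheme and the bucket width are made up for the example.

```java
import java.util.Locale;

final class RangePartition {
    /**
     * Index of the equal-width range that contains `value`.
     * Bucket 0 starts at Integer.MIN_VALUE, so indices ascend with the values.
     */
    static int bucketIndex(int value, int bucketWidth) {
        long offset = (long) value - Integer.MIN_VALUE; // 0 .. 2^32 - 1, computed in long to avoid overflow
        return (int) (offset / bucketWidth);
    }

    /** Hypothetical file name for a bucket; zero padding keeps names in ascending order. */
    static String fileName(int bucketIndex) {
        return String.format(Locale.ROOT, "part-%05d.txt", bucketIndex);
    }

    public static void main(String[] args) {
        int width = 1 << 24; // 2^24 values per range gives 256 files for the full int range
        System.out.println(fileName(bucketIndex(Integer.MIN_VALUE, width))); // part-00000.txt
        System.out.println(fileName(bucketIndex(0, width)));                 // part-00128.txt
        System.out.println(fileName(bucketIndex(Integer.MAX_VALUE, width))); // part-00255.txt
    }
}
```

With this kind of partitioning, two equal values always land in the same file. Note that equal-width ranges only produce equally sized groups if the values are spread evenly, so the 1 million / 10 million figure is an estimate rather than a guarantee.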
Before a group is written out, we can sort it with a parallel (concurrent) stream to speed up the sorting;
here we sort with a comparator.
When the comparator returns 0, the two values being compared are equal, which means we have found the duplicate: assign it directly to an AtomicInteger variable and return to end the search.
If the group contains no duplicate, save it and move on to a new file (file name + 1).
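A minimal sketch of the comparator step for one in-memory group might look like this. It uses `Arrays.parallelSort` as the parallel sort, which is my choice rather than something the post specifies, and it lets the sort run to completion instead of aborting at the first hit. An `AtomicBoolean` flag is added because any `int` could be a legitimate value, so the `AtomicInteger` alone cannot signal "nothing found". The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class GroupSorter {
    /** Sorts one group in place and reports a value that occurs twice in it, if any. */
    static OptionalInt sortAndFindDuplicate(Integer[] group) {
        AtomicInteger duplicate = new AtomicInteger();
        AtomicBoolean found = new AtomicBoolean(false);

        // The comparator may be called from several worker threads, hence the atomics.
        Arrays.parallelSort(group, (a, b) -> {
            int c = Integer.compare(a, b);
            if (c == 0) {          // two equal values were compared: this is the duplicate
                duplicate.set(a);
                found.set(true);
            }
            return c;
        });

        return found.get() ? OptionalInt.of(duplicate.get()) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        Integer[] group = {5, 3, 9, 3, 1};
        System.out.println(sortAndFindDuplicate(group)); // OptionalInt[3]
        System.out.println(Arrays.toString(group));      // [1, 3, 3, 5, 9]
    }
}
```

A comparator needs boxed `Integer` objects, which cost more memory than an `int[]`. If that matters, a simpler alternative is to sort a primitive array and then scan once for two equal neighbours.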
Finally, take the data of two files at a time. First compare their minimum and maximum values, which either yields the repeated value directly or gives an offset range; then binary search the intermediate values within that range until a duplicate is found. If no duplicate can be found, continue by comparing the next two files.
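The final pairwise step is described only loosely, so the following is one possible reading rather than the author's exact procedure: given two sorted, duplicate-free groups, first compare their minimum and maximum values to get the overlapping range, then binary search the values inside that range. It applies when the two copies of the repeated number may have ended up in different files. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.OptionalInt;

final class PairCheck {
    /** Looks for a value present in both arrays; each must be sorted ascending and duplicate-free. */
    static OptionalInt findCommon(int[] a, int[] b) {
        if (a.length == 0 || b.length == 0) {
            return OptionalInt.empty();
        }
        // Step 1: compare minimum and maximum values; disjoint ranges cannot share a value.
        int lo = Math.max(a[0], b[0]);
        int hi = Math.min(a[a.length - 1], b[b.length - 1]);
        if (lo > hi) {
            return OptionalInt.empty();
        }
        // Step 2: only values of a inside [lo, hi] can occur in b; binary search each one in b.
        for (int i = lowerBound(a, lo); i < a.length && a[i] <= hi; i++) {
            if (Arrays.binarySearch(b, a[i]) >= 0) {
                return OptionalInt.of(a[i]);
            }
        }
        return OptionalInt.empty();
    }

    /** First index whose element is >= key in a sorted, duplicate-free array. */
    static int lowerBound(int[] a, int key) {
        int pos = Arrays.binarySearch(a, key);
        return pos >= 0 ? pos : -pos - 1; // a negative result encodes the insertion point
    }

    public static void main(String[] args) {
        System.out.println(findCommon(new int[] {1, 4, 7, 10}, new int[] {2, 7, 9})); // OptionalInt[7]
        System.out.println(findCommon(new int[] {1, 2, 3}, new int[] {10, 20}));      // OptionalInt.empty
    }
}
```

If the files were produced by strict value-range partitioning, their ranges never overlap, and the min/max comparison in step 1 rules out every pair immediately.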

Origin blog.csdn.net/weixin_43158695/article/details/113406009