[Interview] Grilled in an interview: how do you find the most frequently occurring number among 2 billion, 4 billion, or 8 billion integers with only 2GB of memory?

Xiaoqiu has been going to interviews these days. Recently he has read quite a few articles on bit manipulation and related algorithms, for example:

[Interview] How to determine whether a number is among 4 billion integers?

[Algorithm] A show-off guide to bit manipulation tricks

So he felt fairly confident about algorithm questions, and the following conversation took place.

2 billion level

Interviewer: If I give you 2GB of memory and 2 billion int integers, and ask you to find the number that occurs most frequently, how would you do it?

Xiaoqiu: (Huh? This feels a bit like that earlier problem of deciding whether a number exists among 4 billion integers. But a bitmap can only tell whether a number appears or not; it can't easily count how many times it appears.) I can use a hash table to do the counting: the number itself is the key, and the number of times it appears is the value. Then I traverse the hash table and find the key with the largest value.
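
A minimal sketch of this counting idea in Java is below. The iterator is just a stand-in for however the 2 billion numbers are actually read (a file, a stream, and so on); it is not code from the article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PrimitiveIterator;

public class HashCount {
    // Sketch of the plain hash-table approach: key = the number,
    // value = how many times it has appeared so far.
    static int mostFrequent(PrimitiveIterator.OfInt numbers) {
        Map<Integer, Integer> counts = new HashMap<>();
        while (numbers.hasNext()) {
            counts.merge(numbers.nextInt(), 1, Integer::sum);
        }
        // Traverse the table and keep the key with the largest count.
        int best = 0;
        int bestCount = -1;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

As the next exchange shows, the problem with this version is not correctness but memory.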

Interviewer: Can you estimate how much memory this method would need?

Xiaoqiu: Both the key and the value are of type int, and an int occupies 4B of memory, so one record in the hash table takes 8B. In the worst case, all 2 billion numbers are different, which would take about 16GB of memory.

Interviewer: Your analysis is correct, but I'm only giving you 2GB of memory.

Xiaoqiu: (This question feels familiar somehow, but I can't think of anything... awkward.) Then I don't have a better way.

Interviewer: With your method, you can hold at most about 200 million distinct records, and 200 million distinct records take roughly 1.6GB of memory.

Xiaoqiu: (Huh? Is the interviewer giving me a hint?) After thinking for a moment: I can distribute the 2 billion numbers into different files and then process them one file at a time.

Interviewer: Can you explain that in more detail?

Xiaoqiu: You just said that my method can hold at most about 200 million distinct records, so I can map the 2 billion numbers into different files by value range. For example, values between 0 and 200 million go into file 1, values between 200 million and 400 million go into file 2, and so on. Since the int type has roughly 4.2 billion distinct values, I can map them into 21 files, as sketched below.

Obviously, identical numbers always land in the same file. At that point I can apply the method I just described to each file: find the number with the highest count in each file, and then pick the one with the largest count among those per-file winners. That's it.
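
As a rough illustration (again, not code from the article), the split by value range might look like the sketch below. The file names part-<k>.bin are made up; note that the full int range of about 4.29 billion values divided into 200-million-wide windows actually yields 22 files, while the article's round figure of 21 comes from 4.2 billion / 200 million.

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.PrimitiveIterator;

public class RangePartition {
    // Each file covers a window of 200 million consecutive int values.
    static final long BUCKET_WIDTH = 200_000_000L;

    // Shift the signed int range [-2^31, 2^31 - 1] up to [0, ~4.29 billion),
    // then divide by the window width to pick a file.
    static int bucketOf(int value) {
        long shifted = (long) value - Integer.MIN_VALUE;
        return (int) (shifted / BUCKET_WIDTH);
    }

    static void partition(PrimitiveIterator.OfInt numbers) throws IOException {
        int bucketCount = bucketOf(Integer.MAX_VALUE) + 1;
        DataOutputStream[] parts = new DataOutputStream[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            // Hypothetical file names, purely for illustration.
            parts[i] = new DataOutputStream(new FileOutputStream("part-" + i + ".bin"));
        }
        while (numbers.hasNext()) {
            int n = numbers.nextInt();
            parts[bucketOf(n)].writeInt(n); // equal values always go to the same file
        }
        for (DataOutputStream out : parts) {
            out.close();
        }
    }
}
```

Each file is then small enough to count with the hash-table method above, and the per-file winners are compared at the end.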

Interviewer: Hmm, this method does work. But what if the values of these 2 billion numbers are fairly concentrated, say all within the range of 1 to 20 million? Then you would map them all into the same file. Do you have any optimization in mind?

Xiaoqiu: I can first run each number through a hash function and store it in the file corresponding to its hash value. If the hash function is well designed, the numbers will be distributed fairly evenly across the files. (I won't go into how to design the hash function here; I'm just offering the idea.)
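
One way this could look is sketched below. The mixing function is an illustrative Murmur3-style finalizer chosen by me, since the article deliberately leaves the choice of hash function open.

```java
public class HashPartition {
    static final int FILE_COUNT = 21;

    // An illustrative mixing function; any reasonably uniform hash would do.
    static int mix(int x) {
        int h = x;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // The same value always hashes to the same file, so per-file counting
    // still finds the true per-file winners, but a narrow value range such
    // as 1..20 million no longer piles up in a single file.
    static int fileOf(int value) {
        return Math.floorMod(mix(value), FILE_COUNT);
    }
}
```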

4 billion level

Interviewer: What if I increase the 2 billion numbers to 4 billion?

Xiaoqiu: (That's easy, just map them to 42 files.) I can increase the number of files.

Interviewer: What if all 4 billion numbers have the same value? Then in your hash table, the value stored for that key would need to reach 4 billion, but the maximum value of an int is only about 2.1 billion, so it would overflow. What would you do?

Xiaoqiu: (Couldn't I just use long instead of int? Although that takes more memory. Or I could split the data into a few more files. But that's probably not the answer the interviewer wants.) I can initialize the value to negative 2.1 billion, so that if the stored value reaches 2.1 billion, it means the key has appeared about 4.2 billion times.

Note that 21 files are still enough here, because with 21 files the number of distinct values in each file is kept to about 200 million, which means the hash table never stores more than about 200 million records.
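
A sketch of this bias trick follows; the class and method names are invented for illustration. The long alternative Xiaoqiu mentions in passing would also work, at the cost of 8B per value.

```java
import java.util.HashMap;
import java.util.Map;

public class BiasedCount {
    // Counts start at Integer.MIN_VALUE (about -2.1 billion), so a stored
    // value near +2.1 billion corresponds to roughly 4.2 billion real
    // occurrences: enough for the 4 billion case without switching to long.
    private final Map<Integer, Integer> counts = new HashMap<>();

    void add(int number) {
        // First occurrence stores MIN_VALUE + 1; each later occurrence adds 1.
        counts.merge(number, Integer.MIN_VALUE + 1, (stored, unused) -> stored + 1);
    }

    long countOf(int number) {
        Integer stored = counts.get(number);
        // Undo the bias to recover the real count.
        return stored == null ? 0L : (long) stored - Integer.MIN_VALUE;
    }
}
```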

8 billion level

Interviewer: Ha, quick reaction. What if I increase the 4 billion to 8 billion?

Xiaoqiu: (Wow, this is getting out of hand...) I've got it. I can check while I traverse: if during counting I find that some key has already appeared more than 4 billion times, then no other key can possibly appear more often than it, so I can return that key directly and be done.
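
A sketch of that early-exit check, again assuming the numbers arrive through an iterator. Plain long counters are used here only to keep the example short; in the memory-constrained version the same check would sit inside the per-file loop with the biased int counters described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;
import java.util.PrimitiveIterator;

public class EarlyExit {
    static final long HALF = 4_000_000_000L; // half of the 8 billion inputs

    // Once some key's running count passes 4 billion, no other key can ever
    // overtake it, so it can be returned without reading the rest of the input.
    static OptionalInt findMajority(PrimitiveIterator.OfInt numbers) {
        Map<Integer, Long> counts = new HashMap<>();
        while (numbers.hasNext()) {
            int n = numbers.nextInt();
            long c = counts.merge(n, 1L, Long::sum);
            if (c > HALF) {
                return OptionalInt.of(n);
            }
        }
        // No single number exceeded half; fall back to the file-split method.
        return OptionalInt.empty();
    }
}
```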

Interviewer: OK, that's the end of this interview. Go home and wait for our notification.

Summary

Today's article covered some problems related to processing large amounts of data. Later I may dig up some similar problems that call for different approaches.

If you found this content helpful or inspiring, then to let more people see this article, I hope you will:

1. Give it a like, so more people can see this content (bookmarking without liking is just bullying -_-)

2. Follow me and my column, so we can keep in touch long term

3. Follow my public account "hard to force the code farmers", which mainly publishes articles on algorithms and computer science fundamentals and already has more than 100 original articles

Most of the data structure and algorithm articles there have been reprinted by various other public accounts; I believe you will get something out of them.

I also share plenty of resources there: videos, books, and development tools. You are welcome to follow it and read my articles as soon as they are published.
