Cloud computing knowledge summary: interview questions and interview experience explained

Cloud computing job interviews are actually not as complicated as many people think. They are mostly telephone interviews, and relatively few candidates make it to an on-site round where some technical questions are asked. In the first round, business-oriented questions come up: the three service levels of cloud computing, the current state of the cloud computing business, and how to conduct business docking effectively. The second round mainly asks what projects you have done and how you did them. Below are a few practical cloud computing interview questions to share.

1. Given massive log data, extract the IP address that visited Baidu the most times in one day.

An IP address is 32 bits, so there are at most 2^32 distinct IPs. Use a hash/modulo mapping: for example, take each IP modulo 1000 to split the one large file into 1000 small files. Then, for each small file, find the IP with the highest frequency of occurrence (a hash_map can be used for the frequency counting, then pick the few most frequent) together with that frequency. Finally, among the 1000 per-file maxima, the IP with the highest frequency is the answer.
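The partition-then-count idea above can be sketched as follows. This is a minimal in-memory version: Python lists stand in for the 1000 small files on disk, and the function name is illustrative, not from the original.

```python
from collections import Counter

def top_ip(ip_lines, num_buckets=1000):
    """Hash-partition IPs into buckets (on disk these would be small
    files), count each bucket with a hash map, keep the per-bucket
    maximum, and return the overall winner."""
    # Phase 1: map every IP to a bucket via hash(ip) % num_buckets,
    # so all copies of one IP land in the same bucket.
    buckets = [[] for _ in range(num_buckets)]
    for ip in ip_lines:
        buckets[hash(ip) % num_buckets].append(ip)

    # Phase 2: each bucket is small enough to count in memory.
    best_ip, best_count = None, 0
    for bucket in buckets:
        if not bucket:
            continue
        ip, count = Counter(bucket).most_common(1)[0]
        if count > best_count:
            best_ip, best_count = ip, count
    return best_ip, best_count
```

Because the partition is by hash of the whole IP, every occurrence of a given IP falls into exactly one bucket, so the per-bucket counts are complete and comparing per-bucket maxima is valid.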

2. A search engine logs every query string a user submits; each query string is 1-255 bytes long.

Assume there are 10 million records (the repetition rate of these query strings is fairly high: although the total is 10 million, there are no more than 3 million after removing duplicates. The more repeated a query string is, the more users queried it, i.e. the more popular it is). Find the 10 hottest query strings, using no more than 1 GB of memory.

Step 1: preprocessing with hash statistics. Preprocess the mass of data by maintaining a HashMap(Query, Value) whose Key is the query string and whose Value is the number of times that query appears. For each Query read, if the string is not in the table, add it and set its value to 1; if the string is already in the table, add one to its count. In the end the hash table is built in O(N) time (N is 10 million, since we traverse the whole record set once to count each query's occurrences).

Step 2: find the 10 most popular query strings with heap sort, in O(N' * logK) time. Maintain a min-heap of size K (K = 10 here), then traverse the 3 million distinct queries, comparing each one's count against the root element's value; whenever a count exceeds the root, replace the root and re-heapify. At the end the heap holds the 10 (Query, Value) pairs with the largest values.

The final time complexity is O(N) + O(N' * logK), where N = 10,000,000 and N' = 3,000,000.

Alternatively, use a trie: store the query strings in the trie, with each node's count field recording how many times the string ending there appears (0 means it does not appear). Finally, use a min-heap of 10 elements to rank the strings by appearance frequency.
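The hash-map-plus-min-heap approach from the two steps above can be sketched like this (a minimal illustration; the function name is mine, not from the original):

```python
import heapq
from collections import Counter

def top_k_queries(queries, k=10):
    """Step 1: O(N) frequency counting with a hash map.
    Step 2: O(N' log K) pass with a size-K min-heap; the smallest
    root is evicted whenever a hotter query appears."""
    freq = Counter(queries)  # hash statistics over all records
    heap = []                # min-heap of (count, query), size <= k
    for query, count in freq.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, query))
        elif count > heap[0][0]:
            # New query beats the current minimum: replace the root.
            heapq.heapreplace(heap, (count, query))
    return sorted(heap, reverse=True)  # hottest first
```

The heap never holds more than K entries, so memory stays tiny regardless of how many distinct queries exist, which is the whole point of the 1 GB constraint.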

3. There is a 1 GB file in which each line is a word; each word is no more than 16 bytes, and the memory limit is 1 MB. Return the 100 words with the highest frequency.

Step 1: divide and conquer with hash mapping. Read the file sequentially; for each word x, compute hash(x) % 5000 and save the word into the small file with that value (denoted x0, x1, ..., x4999). Each file is then roughly 200 KB. If any file exceeds 1 MB, keep splitting it the same way until no small file is larger than 1 MB.

Step 2: hash statistics per small file. For each small file, count each word and its frequency of occurrence (a trie or hash_map can be employed), then take the 100 words with the highest frequency (a min-heap of 100 nodes may be used) and store those 100 words with their frequencies in a result file. This yields 5000 result files.

Step 3: heap/merge sort. Merge these 5000 result files (heapsort can also be used). If memory allows, merge all elements of the 5000 files together and use a heap to obtain the top 100.
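Steps 2 and 3 can be sketched as below. This is an in-memory illustration under the assumption (guaranteed by the hash partition in step 1) that each word appears in exactly one chunk, so merging per-chunk top lists is valid; the helper names are mine.

```python
import heapq
from collections import Counter

def top_words_per_chunk(words, k=100):
    """Per small file: count word frequencies, keep the top-k
    candidates with a size-k min-heap of (count, word) pairs."""
    heap = []
    for word, count in Counter(words).items():
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, word))
    return heap

def merge_top_words(per_chunk_tops, k=100):
    """Merge the per-chunk candidate lists and take the global top-k.
    Correct because the hash partition keeps each word's full count
    inside exactly one chunk."""
    merged = [pair for top in per_chunk_tops for pair in top]
    return heapq.nlargest(k, merged)
```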

4. Given two files a and b, each storing 5 billion URLs with each URL taking 64 bytes, and a memory limit of 4 GB, find the URLs common to files a and b.

We can estimate each file's size as 5 billion x 64 bytes = 320 GB, far greater than the 4 GB memory limit, so the files cannot be loaded into memory and processed whole. Consider a divide-and-conquer approach.

Traverse file a; for each URL, compute hash(url) % 1000 and, based on the value obtained, store the URL into one of 1000 small files (denoted a0, a1, ..., a999). Each small file is then approximately 300 MB.

Traverse file b and store its URLs into 1000 small files in the same manner (denoted b0, b1, ..., b999). After this treatment, any identical URLs can only be in corresponding pairs of small files (a0 vs b0, a1 vs b1, ..., a999 vs b999); non-corresponding files cannot contain the same URL. We then only need to find the common URLs within each of the 1000 small-file pairs.

To find the common URLs within one pair of small files, load the URLs of one small file into a hash_set. Then, for each URL in the other small file, check whether it is in the hash_set just built; if so, it is a common URL, so save it to an output file.
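The partition-and-intersect logic above can be sketched as follows. This is an in-memory illustration: sets stand in for the 1000 small files on disk, and the function name is mine.

```python
def common_urls(urls_a, urls_b, num_buckets=1000):
    """Partition both URL streams with the same hash so any shared URL
    lands in the same bucket pair, then intersect bucket by bucket."""
    buckets_a = [set() for _ in range(num_buckets)]
    buckets_b = [set() for _ in range(num_buckets)]
    for url in urls_a:
        buckets_a[hash(url) % num_buckets].add(url)
    for url in urls_b:
        buckets_b[hash(url) % num_buckets].add(url)

    # Only corresponding bucket pairs can share a URL.
    common = set()
    for set_a, set_b in zip(buckets_a, buckets_b):
        common |= set_a & set_b
    return common
```

The key invariant is that both files use the same hash function, so a URL present in both is guaranteed to land in buckets with the same index.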

5. Tencent interview question: given 4 billion non-repeating unsigned ints, unsorted, and then another number, how do you quickly determine whether that number is among the 4 billion?

Scheme 1: allocate 512 MB of memory (2^32 / 8 bytes = 512 MB), with one bit representing one unsigned int value. Read the 4 billion numbers and set the corresponding bit for each; then read the number to be queried and check its corresponding bit: 1 indicates presence, 0 means it does not exist.
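Scheme 1 can be sketched with a bitmap class like the one below (a minimal sketch; the class is illustrative, and `num_bits` is parameterized so a small bitmap can be demonstrated, whereas the interview answer would use the full 2^32 bits = 512 MB):

```python
class Bitmap:
    """One bit per unsigned integer value: set(n) marks n as present,
    test(n) checks presence in O(1)."""
    def __init__(self, num_bits):
        self.bits = bytearray(num_bits // 8)  # 8 flags per byte

    def set(self, n):
        self.bits[n >> 3] |= 1 << (n & 7)    # byte n//8, bit n%8

    def test(self, n):
        return bool(self.bits[n >> 3] & (1 << (n & 7)))
```

With `num_bits = 2**32` the bytearray occupies exactly 512 MB, matching the scheme's memory budget.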

Scheme 2: since 2^32 is more than 4 billion, the given number may or may not be among them. Represent each of the 4 billion numbers as a 32-bit binary number, assuming all 4 billion numbers start out in one file.

Then split these 4 billion numbers into two categories: (1) those whose most significant bit is 0; (2) those whose most significant bit is 1.

Write the two categories to two files, one of which holds <= 2 billion numbers while the other holds >= 2 billion (this bisects the range); compare the query number's most significant bit, then enter the corresponding file and keep searching.

Then split that file again into two categories: (1) those whose second most significant bit is 0; (2) those whose second most significant bit is 1.

Write the two categories to two files, one of which holds <= 1 billion numbers while the other holds >= 1 billion; compare the query number's second most significant bit, then enter the corresponding file and keep searching, and so on. The number can thus be found (or shown to be absent), and the time complexity is O(log n).
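The bit-by-bit bisection of Scheme 2 can be sketched as below. This is an in-memory illustration: lists stand in for the files written at each split, and the function name is mine.

```python
def contains(numbers, target, bit=31):
    """Recursively keep only the half of the numbers whose current bit
    (highest first) matches the target's bit; on disk each split would
    write two files and descend into one of them."""
    if not numbers:
        return False          # the matching file emptied out: absent
    if bit < 0:
        return target in numbers  # all 32 bits matched
    want = (target >> bit) & 1
    half = [n for n in numbers if (n >> bit) & 1 == want]
    return contains(half, target, bit - 1)
```

Each level halves the candidate range by one bit, giving the O(log n) depth described above (32 levels for 32-bit integers).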

Origin blog.51cto.com/14214237/2407423