2020 Meituan Autumn Recruitment C++ Selected Interview Questions and Answers (Part 2)

  1. Bloom filter algorithm
    A Bloom filter uses a bit vector combined with several hash functions to record whether a piece of data already exists in a set.
    The Bloom filter works as follows:
    First allocate a bit set containing SIZE bits and set all bits to 0.
    Then apply k different hash functions to the target data to obtain k hash values (each taken modulo SIZE so that it does not exceed the size of the bit set), and set the bit at each of those k positions to 1. Since k hash functions are used, recording one piece of data sets k bits in the bit set to 1.
    Because hash functions are deterministic, two identical pieces of data always map to exactly the same k bit positions. To check whether a piece of data has been recorded, it is therefore enough to check whether all k positions computed from that data are marked as 1; conversely, if even one of the corresponding bit positions is not 1, the data has definitely not been recorded.
    An example of using a Bloom filter is as follows:


The general process is:
First initialize a binary array of length L (8 in this example), with all values set to 0.
When writing a piece of data A1=1000, H hash function calculations are performed (here H=2); similar to a HashMap, each hash code is taken modulo L, which here locates positions 0 and 2, and the values at those positions are set to 1.
Likewise, A2=2000 sets positions 4 and 7 to 1.
When B1=1000 needs to be checked for existence, the same two hash operations locate positions 0 and 2. Both values are 1, so B1=1000 is considered to exist in the set.
The same applies to B2=3000. The first hash locates index=4, where the array value is 1, so the second hash is performed; it locates index=5, where the value is 0, so B2=3000 is considered not to exist in the set.
Advantages and disadvantages
Advantages
Insertion and lookup are fast, requiring only k hash computations per item, and a Bloom filter does not need to store the elements themselves; it uses only a bit array and therefore occupies very little space.
Disadvantages
A Bloom filter can determine with certainty that a piece of data does not exist in a set, but when it reports that data exists it may produce false positives (two different pieces of data may happen to produce the same k hash positions). However, by controlling the size of the bit set (SIZE) and the number of hash functions, the probability of such collisions can be kept very small, or the collision problem can be eliminated entirely by maintaining an additional whitelist.
The false positive rate is approximately (1 - e^(-nk/SIZE))^k, where n is the number of pieces of data, SIZE is the size of the bit set, and k is the number of hash functions used. For example, with 10 million pieces of data, a bit set of size 2^30 (about 1 billion bits, occupying 128 MB of memory), and 9 different hash functions, the probability that two different pieces of data collide on all 9 positions is about 2.6e-10, roughly one in 3.8 billion.
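
As a rough illustration of the process above, here is a minimal C++ sketch of a Bloom filter. The bit-set size SIZE, the number of hash functions K, and the way the k hashes are derived (hashing the data together with a seed) are illustrative assumptions, not values fixed by the question.

#include <bitset>
#include <functional>
#include <iostream>
#include <string>

constexpr size_t SIZE = 1 << 20;  // number of bits in the filter (illustrative)
constexpr int K = 3;              // number of hash functions (illustrative)

class BloomFilter {
public:
    void add(const std::string& data) {
        for (int i = 0; i < K; ++i)
            bits_.set(position(data, i));          // set the k bits to 1
    }
    // true  -> the data may have been recorded (possible false positive)
    // false -> the data has definitely not been recorded
    bool possiblyContains(const std::string& data) const {
        for (int i = 0; i < K; ++i)
            if (!bits_.test(position(data, i)))
                return false;
        return true;
    }
private:
    // Derive k different hash values by hashing the data together with a seed.
    static size_t position(const std::string& data, int seed) {
        return std::hash<std::string>{}(data + '#' + std::to_string(seed)) % SIZE;
    }
    std::bitset<SIZE> bits_;  // the bit set, initially all 0
};

int main() {
    BloomFilter bf;
    bf.add("1000");
    bf.add("2000");
    std::cout << bf.possiblyContains("1000") << '\n';  // 1: may exist
    std::cout << bf.possiblyContains("3000") << '\n';  // almost certainly 0: not recorded
    return 0;
}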

2. The process of a C++ source file from text to executable file

For a C++ source file, four stages are generally required to go from text to an executable file:
Preprocessing stage: file-inclusion relationships (header files) and precompiled directives (macro definitions) in the source file are analyzed and replaced, producing a preprocessed file.
Compilation stage: the preprocessed file is translated into specific assembly code, producing an assembly file.
Assembly stage: the assembly file produced in the compilation stage is converted into machine code, producing a relocatable object file.
Linking stage: the object files and the required libraries are linked into the final executable object file.

3. What APIs are related to shared memory?

Linux allows different processes to access the same block of logical memory and provides a set of APIs for this; the header file is sys/shm.h.
1) Create new shared memory: shmget
int shmget(key_t key, size_t size, int shmflg);
key: Shared memory key value, which can be understood as a unique mark of shared memory.
size: shared memory size
shmflg: read and write permission flags for the creating process and other processes.
Return value: corresponding shared memory identifier, return -1 on failure
2) Attach shared memory to the address space of the current process: shmat
void *shmat(int shm_id,const void *shm_addr,int shmflg);
shm_id: shared memory identifier
shm_addr: Specify the address where the shared memory is connected to the current process, usually 0, which means that the system chooses.
shmflg: Flag bit
Return value: pointer to the first byte of the shared memory; returns (void *)-1 on failure
3) Detach shared memory from the current process: shmdt
int shmdt(const void *shmaddr);
4) Control shared memory: shmctl
Similar in function to semctl for semaphores, it controls the shared memory:
int shmctl(int shm_id,int command,struct shmid_ds *buf);
shm_id: shared memory identifier
command: there are three values
IPC_STAT: get the state of the shared memory, copying the shared memory's shmid_ds structure into buf.
IPC_SET: set the state of the shared memory, copying buf into the shared memory's shmid_ds structure.
IPC_RMID: delete shared memory
buf: shared memory management structure.
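
A minimal usage sketch tying these calls together (the key value 0x1234, the 4 KB size, and the single-process round trip are illustrative assumptions; in practice a second process would call shmget with the same key and shmat to read the same memory):

#include <sys/shm.h>
#include <cstdio>
#include <cstring>

int main()
{
    // 1) Create (or open) a shared memory segment.
    int shm_id = shmget((key_t)0x1234, 4096, 0666 | IPC_CREAT);
    if (shm_id == -1) { std::perror("shmget"); return 1; }

    // 2) Attach it to this process's address space; a null address lets the system choose.
    void *addr = shmat(shm_id, nullptr, 0);
    if (addr == (void *)-1) { std::perror("shmat"); return 1; }

    // Use it like ordinary memory.
    std::strcpy(static_cast<char *>(addr), "hello shared memory");
    std::printf("%s\n", static_cast<char *>(addr));

    // 3) Detach it from this process.
    if (shmdt(addr) == -1) { std::perror("shmdt"); return 1; }

    // 4) Remove the segment once no process needs it any more.
    if (shmctl(shm_id, IPC_RMID, nullptr) == -1) { std::perror("shmctl"); return 1; }
    return 0;
}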

4. What are the components of the Reactor model?

In the Reactor model, the main thread is only responsible for monitoring whether events occur on file descriptors; if one does, it immediately notifies a worker thread of the event. The main thread does no other substantive work: reading and writing data, accepting new connections, and handling client requests are all done in the worker threads. The model consists of the following components:

1) Handle: a handle in the operating system, an abstraction of an operating-system resource. It can be an open file, a connection (socket), a timer, and so on. Since the Reactor pattern is generally used in network programming, a Handle usually refers to a socket handle, i.e. a network connection.
2) Synchronous Event Demultiplexer: blocks waiting for events to arrive on a set of Handles. When the blocking wait returns, the returned event type can be handled on the returned Handle without blocking. This module is generally implemented with the operating system's select.
3) Initiation Dispatcher: manages the Event Handlers; it is the container of EventHandlers and is used to register and remove them. It also serves as the entry point of the Reactor pattern: it calls the Synchronous Event Demultiplexer's select to block waiting for events, and when the wait returns it dispatches to the corresponding Event Handler according to the event's Handle, i.e. it calls back the handle_event() method of the EventHandler.
4) Event Handler: defines the event-handling method handle_event(), which the Initiation Dispatcher calls back.
5) Concrete Event Handler: implements the EventHandler interface with the specific event-processing logic.
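
Below is a minimal C++ skeleton of these components, using select() as the Synchronous Event Demultiplexer. The class and method names follow the description above, but the echo handler and the overall structure are only an illustrative sketch, not a complete Reactor implementation.

#include <sys/select.h>
#include <unistd.h>
#include <map>

// Event Handler: declares the callback invoked by the dispatcher.
class EventHandler {
public:
    virtual ~EventHandler() = default;
    virtual void handle_event(int handle) = 0;   // called when the handle has an event
};

// Concrete Event Handler: implements the specific processing logic.
class EchoHandler : public EventHandler {
public:
    void handle_event(int handle) override {
        char buf[1024];
        ssize_t n = read(handle, buf, sizeof(buf));   // read the data for this event
        if (n > 0) write(handle, buf, n);             // echo it back
    }
};

// Initiation Dispatcher: registers Event Handlers and dispatches events to them.
class InitiationDispatcher {
public:
    void register_handler(int handle, EventHandler *h) { handlers_[handle] = h; }
    void remove_handler(int handle) { handlers_.erase(handle); }

    // Block in the Synchronous Event Demultiplexer (select), then dispatch.
    void handle_events() {
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = -1;
        for (const auto &kv : handlers_) {
            FD_SET(kv.first, &readfds);
            if (kv.first > maxfd) maxfd = kv.first;
        }
        if (select(maxfd + 1, &readfds, nullptr, nullptr, nullptr) > 0) {
            for (const auto &kv : handlers_)
                if (FD_ISSET(kv.first, &readfds))
                    kv.second->handle_event(kv.first);   // callback into the Event Handler
        }
    }

private:
    std::map<int, EventHandler *> handlers_;   // Handle -> Event Handler
};
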
5. What context does a thread need to save when switching? What are the functions of the SP, PC, and EAX registers?

During a switch, a thread needs to save the current thread ID, thread state, stack, register state, and other information. The registers mainly include SP, PC, EAX, and others; their main functions are as follows:
SP: stack pointer, pointing to the top address of the current stack
PC: program counter, which stores the address of the next instruction to be executed
EAX: accumulator register, the default register used for addition and multiplication operations

6. There are 10 million short messages, with repetitions, saved in a text file, one message per line. Please find the 10 most frequently repeated messages within 5 minutes.

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// Custom string hash (RS hash), as in the original answer. The original used
// __gnu_cxx::hash_map from <ext/hash_map>; std::unordered_map is the standard
// replacement and accepts the same kind of hash functor.
struct StrHash
{
    size_t operator()(const string& str) const
    {
        uint32_t b = 378551;
        uint32_t a = 63689;
        uint64_t hash = 0;
        for (size_t i = 0; i < str.size(); i++)
        {
            hash = hash * a + str[i];
            a = a * b;
        }
        return static_cast<size_t>(hash);
    }
};

struct NameNum
{
    string name;
    int num;
    NameNum() : name(""), num(0) {}
};

int main()
{
    unordered_map<string, int, StrHash> names;            // message -> occurrence count
    unordered_map<string, int, StrHash>::iterator it;
    NameNum namenum[10];                                   // current top-10 candidates
    string l;

    // First pass: count every line.
    while (getline(cin, l))
    {
        it = names.find(l);
        if (it != names.end())
            names[l]++;
        else
            names[l] = 1;
    }

    // Second pass: keep the 10 largest counts in namenum[],
    // tracking the smallest of those 10 (min / minpos).
    int i = 0;
    int min = 0;
    int minpos = 0;
    for (it = names.begin(); it != names.end(); ++it)
    {
        if (i < 10)
        {
            namenum[i].name = it->first;
            namenum[i].num = it->second;
            if (i == 0 || it->second < min)
            {
                min = it->second;
                minpos = i;
            }
        }
        else if (it->second > min)
        {
            // Evict the current smallest of the top 10, then rescan for the new minimum.
            namenum[minpos].name = it->first;
            namenum[minpos].num = it->second;
            min = namenum[0].num;
            minpos = 0;
            for (int k = 1; k < 10; k++)
            {
                if (namenum[k].num < min)
                {
                    min = namenum[k].num;
                    minpos = k;
                }
            }
        }
        i++;
    }

    cout << "top 10 (string,num): " << endl;
    for (i = 0; i < 10; i++)
        cout << "(" << namenum[i].name << "," << namenum[i].num << ")" << endl;
    return 0;
}
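
A shorter, arguably more idiomatic variant of the same top-K idea (not part of the original answer) counts the lines with std::unordered_map and keeps the 10 most frequent messages in a size-bounded min-heap built on std::priority_queue:

#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    std::unordered_map<std::string, int> counts;
    std::string line;
    while (std::getline(std::cin, line))
        ++counts[line];                       // count every message

    // Min-heap ordered by count; never let it grow beyond 10 entries.
    using Entry = std::pair<int, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > 10)
            heap.pop();                       // evict the least frequent of the 11
    }

    while (!heap.empty()) {                   // printed from least to most frequent
        std::cout << "(" << heap.top().second << "," << heap.top().first << ")\n";
        heap.pop();
    }
    return 0;
}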

7. What is the principle of malloc? What are the functions of the brk and mmap system calls?

The malloc function dynamically allocates memory. To reduce memory fragmentation and the overhead of system calls, malloc uses a memory-pool approach: it first requests a large block of memory to serve as the heap area, and then divides the heap into multiple memory blocks, using the block as the basic unit of memory management. When the user requests memory, a suitable free block is allocated directly from the heap area. malloc uses an implicit linked-list structure that divides the heap into contiguous blocks of different sizes, both allocated and unallocated; at the same time, malloc uses an explicit linked-list structure to manage all free blocks, i.e. a doubly linked list chains the free blocks together, and each free block records a contiguous, unallocated address range.
When allocating memory, malloc traverses the blocks through the implicit linked list and selects a free block that meets the request; when freeing memory, malloc uses boundary tags to determine whether the adjacent blocks are allocated and decides accordingly whether to merge them.

When malloc requests memory from the kernel, it usually does so through the brk or mmap system call. When the requested size is less than 128 KB, brk is used to allocate from the heap area; when the requested size is greater than 128 KB, mmap is used to allocate from the memory-mapping area.
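
A minimal sketch of this behaviour, assuming glibc's default mmap threshold of roughly 128 KB (the exact cutoff is an implementation detail, tunable via M_MMAP_THRESHOLD, and may differ); printing the addresses only gives a rough hint that the two allocations come from different regions:

#include <cstdio>
#include <cstdlib>

int main() {
    // A small request (well under the ~128 KB threshold) is typically served
    // from the heap area, which malloc grows with brk when needed.
    void *small_block = std::malloc(64 * 1024);

    // A large request (over the threshold) is typically served by mmap in the
    // memory-mapping area, so its address usually lies far away from the heap.
    void *large_block = std::malloc(1024 * 1024);

    std::printf("small block at %p\n", small_block);
    std::printf("large block at %p\n", large_block);

    std::free(small_block);   // goes back to malloc's free lists; the heap is not shrunk immediately
    std::free(large_block);   // an mmap'd block is returned to the kernel with munmap

    return 0;
}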

8. Please describe the Linux virtual address space.

Virtual memory is used to prevent different processes that run in physical memory at the same time from competing for, and trampling on, each other's physical memory.
With virtual memory, every running process sees itself as exclusively occupying the system's 4 GB address space (on a 32-bit system). All processes share the same physical memory, and each process only maps the part of its virtual memory space it currently needs into physical memory. In fact, when a process is created and loaded, the kernel merely "creates" the virtual memory layout for the process, i.e. it initializes the memory-related lists in the process control structure; it does not immediately copy the program data and code (such as the .text and .data sections) corresponding to the virtual memory locations into physical memory, but only establishes the mapping between the virtual memory and the disk file (called memory mapping). Only when the corresponding program actually runs does a page fault trigger the copying of the data. Likewise, memory allocated dynamically during execution, for example with malloc, is at first only virtual memory: the corresponding page table entries are set up, and only when the process actually accesses the data does a page fault occur and physical memory get assigned.

Demand paging, demand segmentation, and demand segmented-paging systems are all built on virtual memory; they exchange information between memory and external storage on demand.
The benefits of virtual memory:
1. It enlarges the address space.
2. Memory protection: each process runs in its own virtual address space and cannot interfere with the others. Virtual memory also provides write protection for specific memory addresses, which helps prevent malicious tampering with code or data.
3. Fair memory allocation: with virtual memory, each process effectively gets a virtual address space of the same size.
4. Inter-process communication can be implemented through shared virtual memory.
5. When different processes use the same code, such as code in a library file, only one copy of that code needs to be kept in physical memory; each process simply maps it into its own virtual memory, which saves memory.
6. Virtual memory is well suited to multiprogramming systems, where fragments of many programs reside in memory at the same time. While one program waits for part of itself to be read into memory, the CPU can be handed to another process. Multiple processes can be kept in memory, which improves system concurrency.
7. When a program needs to allocate contiguous memory, it only needs contiguous space in the virtual address space, not in physical memory, so physical memory fragments can still be used.

The cost of virtual memory:
1. Managing virtual memory requires building many data structures, which take up additional memory.
2. Translating virtual addresses to physical addresses increases instruction execution time.
3. Swapping pages in and out requires disk I/O, which is time-consuming.
4. If a page contains only a small amount of valid data, memory is wasted.


Origin blog.csdn.net/lingshengxueyuan/article/details/108603022