"Interview" Little Red Book Tour

Author | Existence of L

Source | I am a programmer Xiaojian (ID: Lanj1995Q)

In the previous article, I shared my interview journey at Bilibili (station B). The response was good, and readers kept urging me to post the next one until my hands shook, so I decided to cover the remaining companies. Today, let's take a look at the server/backend interview at Xiaohongshu. No other reason: I just want to meet the lovely HR lady and start work.

Outline

Round one

The interviewer looked about twenty-seven or twenty-eight and was very gentle, with a full head of flowing hair (so much for the coder stereotype, okay?). We started with my projects. Since everyone's projects are different, I will only talk about the questions I often encounter in interviews.

  • Describe the project

You can't get fat in one bite. Before describing the project, take a breath and check whether what you are about to say can fool yourself; only then can it fool the interviewer.

  • Your role in the project

For most of us, the role is developer. The role should match the position you are applying for. Explaining clearly what you did gets the interviewer on the hook, and keeps them there for a hundred years.

  • What difficulties you encountered in the project

You can answer these three questions with your toes, unless you didn't actually do the project, hahahahaha. Don't panic: even if we didn't build it ourselves, there is nothing to fear. There is a website called GitHub, and it is full of experts. We are not experts, but can't we learn from others? Clone a project, set up the environment and run it, start debugging and modifying it, then split it into modules and modify it further. Isn't it your project now? Of course, I don't recommend doing this.

Once the project questions were nearly done, we moved on to fundamentals, the usual four: computer networks, databases, operating systems, and data structures (come on, always be prepared; really, I'm not bragging).

I see your resume mentions network traffic restoration, so you should be familiar with computer networks? (Attention: you must know exactly what is written on your resume.) Then let's talk about computer networks.

  • Talk about the three-way handshake of TCP in computer networks.

First, the Client sends a SYN packet to the Server, the Server receives the SYN and replies with SYN+ACK, and then the client replies with ACK to indicate receipt.

An answer like this will definitely not satisfy the interviewer, so add some seasoning. How can a dish taste good without it? Let's go through it in detail.

First, the client's protocol stack sends a SYN packet to the server, telling the server that its current sequence number is X, and the client enters the SYN_SENT state;

After the server's protocol stack receives this packet, it responds with an ACK whose acknowledgment number is X+1, confirming the client's SYN. At the same time, the server sends its own SYN to tell the client that its current sequence number is Y, and the server enters the SYN_RCVD state;

After the client's protocol stack receives the SYN+ACK, the application's connect call returns, indicating that the client-to-server direction is established, and the client enters the ESTABLISHED state. The client's protocol stack also acknowledges the server's SYN with acknowledgment number Y+1;

The server receives the ACK from the client, and its blocking accept call returns. At this point the server-to-client direction is also established, and the server enters the ESTABLISHED state.
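To make the numbers stick, here is a minimal Python sketch of the exchange described above. It only tracks sequence numbers and states; the function name and the dictionary layout are my own illustration, not any real socket API.

```python
def three_way_handshake(x, y):
    """Simulate the handshake for client sequence number x and server sequence number y."""
    segments = []
    states = []  # (client_state, server_state) after each step

    # 1. Client sends SYN carrying its sequence number x.
    segments.append({"from": "client", "flags": "SYN", "seq": x})
    states.append(("SYN_SENT", "LISTEN"))

    # 2. Server acknowledges x + 1 and sends its own SYN with sequence number y.
    segments.append({"from": "server", "flags": "SYN+ACK", "seq": y, "ack": x + 1})
    states.append(("SYN_SENT", "SYN_RCVD"))

    # 3. Client acknowledges y + 1; connect() returns on the client.
    segments.append({"from": "client", "flags": "ACK", "seq": x + 1, "ack": y + 1})
    states.append(("ESTABLISHED", "SYN_RCVD"))

    # Server receives the final ACK; accept() returns on the server.
    states.append(("ESTABLISHED", "ESTABLISHED"))
    return segments, states
```

Calling `three_way_handshake(100, 300)` yields acknowledgment numbers 101 and 301, matching the X+1 and Y+1 in the text.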

Doesn't this sound a bit more impressive, and more vivid? To deepen everyone's impression of the process, let me give another example:

The first handshake: Xiaolan confessed to a girl, saying "I like you", and then waited foolishly for a response;

The second handshake: The girl took one look at my face, responded within seconds, and naturally said yes, replying that she liked me too;

The third handshake: I received the girl's response and replied: "Let's go eat hot pot, watch a movie, and get a massage tonight";

That's how it goes. So what comes next? Do we have to scroll down to the four waves (confirmed scumbag)? No, we are still in love, okay? The interviewer will keep asking about the three-way handshake.

The interviewer said: "Then let me ask you: what happens if the SYN sent by the client is lost, or the server cannot process it for some other reason?"

This scenario is very common; nothing is foolproof. TCP provides reliable transmission: if the SYN packet is lost in transit, the client triggers the retransmission mechanism, but it does not retransmit blindly. The number of retransmissions is limited and is determined by the tcp_syn_retries configuration item.

If tcp_syn_retries is set to 3, the process is as follows:

TCP retransmission

After the client sends the SYN, if it has not received a response from the server after 1s, it retransmits for the first time. If there is still no response after another 2s, it retransmits a second time, and so on, doubling the wait each time until tcp_syn_retries retransmissions have been made.

Three retransmissions here means that after the SYN is first sent, the client waits a total of (1 + 2 + 4 + 8) = 15 seconds. If there is still no response, connect returns with the ETIMEDOUT error.
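The arithmetic can be checked with a few lines of Python. This is only a sketch of the doubling schedule described above, assuming a 1-second initial timeout as in the text; the function name is made up for illustration.

```python
def syn_timeout_schedule(tcp_syn_retries, initial_timeout=1):
    """Return the wait before each timeout and the total wait before connect() fails.

    The first SYN plus tcp_syn_retries retransmissions gives
    tcp_syn_retries + 1 waits, each twice as long as the previous one.
    """
    waits = [initial_timeout * 2 ** attempt for attempt in range(tcp_syn_retries + 1)]
    return waits, sum(waits)
```

With `tcp_syn_retries = 3` the waits are 1, 2, 4 and 8 seconds, 15 seconds in total.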

Now tell me about the four-way wave. Hey, poor humble Xiaolan

First wave: The girl felt this guy wasn't right for her, but being a kind person she proposed breaking up and waited for the guy's response;

Second wave: The guy knows how to play along too, and simply said: "Fine, let's break up";

Third wave: After a while, the guy felt it wasn't right: "I'm a grown man, I should be the one to propose the breakup", so he said to the girl: let's break up;

Fourth wave: The girl saw this message: are you "frank" or "mad"?

You understand TIME_WAIT, right? What should be done when there are too many TIME_WAIT connections, and what causes them?

The way to answer is nothing more than: what it is, why it appears, and how to solve it.

During TCP's four-way wave, the side that initiates the close enters the TIME_WAIT state. A TCP service is usually exposed to the outside through a port. Under high concurrency, each connection occupies a port, and ports are limited, so they may be exhausted, which leads to the situation where "the service is sometimes fine and sometimes not".

As shown in the figure below, in the four-way wave, when host 1 is ready to terminate the TCP connection it sends a FIN. Host 2 enters the CLOSE_WAIT state and sends an ACK. Host 2 later sends its own FIN; after acknowledging it, host 1 stays in TIME_WAIT for 2MSL.

Why not enter the CLOSED state directly instead of waiting 2MSL first? What happens during this time?

The first reason is to ensure that the final ACK is received so that the connection shuts down cleanly. When TCP was designed, it was assumed that segments can be lost and must be retransmitted. If host 1's final ACK is not delivered, host 2 retransmits its FIN. If host 1 were no longer in TIME_WAIT at that point, it would have lost the connection context and would reply with an RST, causing an error on the passively closing side.

Wave four times

The second reason is to let duplicate segments from old connections expire naturally in the network.

A network communication may pass through countless routers and switches, and there is no telling which hop will go wrong. To identify a connection we use a four-tuple: [source IP, source port, destination IP, destination port]. Suppose there are two connections, A and B.

Connection A is interrupted midway, and connection B is then created with the same four-tuple as A. If a delayed segment from A reaches the destination some time later, it is likely to be treated as part of connection B, which causes confusion.

Therefore TCP uses the 2MSL wait in TIME_WAIT so that old packets in both directions are discarded before the four-tuple can be reused.

So what are the harms of TIME_WAIT?

Too many connections inevitably waste memory!

Port occupation. The ports that can be opened are 32768~61000.

Have you optimized TCP?

No joke, I had reviewed this, so ask away, I'm not afraid. There are many optimization points; instead of listing them all, pick one and describe it in depth, such as which parameters to adjust under which conditions.

We all know the half-open connection: a connection for which the server has received the SYN and replied with SYN+ACK, but has not yet received the final ACK. Each time the server receives a new SYN packet, it creates a half-open connection and adds it to the half-open connection queue (SYN queue). The length of the SYN queue is limited and can be configured through tcp_max_syn_backlog. When the number of half-open connections in the queue exceeds the configured value, new SYN packets are dropped.

A server may receive a lot of new connections in an instant, so this value can be increased to prevent SYN packets from being dropped and clients from never receiving a SYN+ACK.

Doesn't this make the interviewer feel that this young man knows his stuff? So how is it configured?

Configure syn queue
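On Linux the setting is exposed as the sysctl `net.ipv4.tcp_max_syn_backlog`, which can be changed with `sysctl -w net.ipv4.tcp_max_syn_backlog=<value>` as root. Before tuning anything it helps to see the current value, so here is a small Python sketch that reads a sysctl through `/proc/sys`. The helper names are mine, and the `root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file, e.g. net.ipv4.x -> /proc/sys/net/ipv4/x."""
    return Path(root).joinpath(*name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # Not on Linux, or the setting does not exist on this kernel.
        return None
```

For example, `read_sysctl("net.ipv4.tcp_max_syn_backlog")` returns the configured SYN queue length on a Linux machine and `None` elsewhere.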

Do you think the interviewer is a fool? Of course not. What if the interviewer asks you: are there other reasons for a backlog of half-open connections?

Hahaha, this shows that the interviewer took the bait. Come on, let's see what else could be wrong!

It may also be because a malicious Client is carrying out a SYN Flood attack.

What is the process of SYN Flood attack?

First, the client sends SYN packets at a high rate, and the source IP of these packets keeps changing. To the server, each one looks like a new connection, so it allocates a half-open connection for it.

The server sends its SYN+ACK to the source IP found in the SYN packet, which is not the attacker's real IP, so the server never receives the client's ACK and the connection cannot be established. Naturally, the server's half-open connection queue fills up and it can no longer respond to normal SYN packets.

Is there any solution to this problem?

Of course there is. After all, in an interview we want to steer the interviewer toward what we know. The Linux kernel provides the SYN Cookies mechanism. What does this mechanism do?

First, when the server receives a SYN packet, it does not allocate resources to store the client's information. Instead it computes a cookie value from the SYN, puts the cookie in the SYN+ACK, and sends it out;

For a legitimate client, the cookie value comes back in the client's ACK;

The server checks the validity of the ACK based on this cookie, and creates the connection if it is valid;
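The three steps above can be sketched in Python. This is a deliberately simplified illustration and not the Linux kernel's actual cookie encoding: I assume the cookie is a keyed hash of the four-tuple, the client's sequence number and a time counter, truncated to 32 bits and used as the server's sequence number. `SECRET` and all function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key; a real server would use a random one

def make_cookie(src_ip, src_port, dst_ip, dst_port, client_seq, counter):
    """Derive a 32-bit cookie from the SYN without storing any per-connection state."""
    message = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_seq}|{counter}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # sent as the sequence number of the SYN+ACK

def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_seq, counter, ack_number):
    """Recompute the cookie from the ACK's fields and compare with ack_number - 1."""
    expected = make_cookie(src_ip, src_port, dst_ip, dst_port, client_seq, counter)
    return ack_number == (expected + 1) % 2 ** 32
```

A spoofed source never sees the SYN+ACK, so it cannot produce the matching acknowledgment number, and the server has spent no queue slot on it.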

So how to turn on SYN Cookies?

SYN Cookies

That was about it for the networking questions. Pretty good: the interviewer played exactly into my hands. Next up, the operating system!

  • What is huge page memory?

Wow, I almost didn't catch it. I heard "Uncle Memory" (in Chinese, "huge page" sounds like "uncle"). Remember, it is huge page memory.

We know that the operating system manages memory through multi-level page tables and paging. The default size of each page is 4KB.

Suppose the current process uses 1GB of memory. Then 1GB/4KB = 262144 page table entries are needed, but the TLB can hold far fewer entries than that.

So when several memory-intensive applications access memory, there are too many TLB misses. To reduce the number of misses, one feasible way is to increase the size of each page.

The default huge page size supported by the operating system is 2MB. With 1GB of memory, only 512 page table entries are needed, which greatly improves the TLB hit rate and therefore performance.

In addition, note that huge page memory is backed by physical memory and is not swapped out, so there are no page faults caused by swapping and no delay from accessing the disk.
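The page counts are easy to verify. A tiny Python sketch, assuming nothing beyond the sizes quoted above (the function name is just for illustration):

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

def page_table_entries(memory_bytes, page_bytes):
    """Number of pages (and so page table entries) needed to map memory_bytes."""
    return -(-memory_bytes // page_bytes)  # ceiling division
```

`page_table_entries(1 * GB, 4 * KB)` is 262144, while `page_table_entries(1 * GB, 2 * MB)` is 512, a 512-fold reduction in the entries competing for the TLB.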

Okay, time is almost up. Write some simple code to find the longest substring without repeating characters.

Idea: use a sliding window and make sure the letters in each window are unique!

  • Use a vector<int> m to record the new position i should be moved to when a letter is repeated

  • So on every update, j + 1 is saved, which is the position right after the letter

  • j is the index of the last letter of the substring, and the length of the substring is j - i + 1

The longest substring without repeated characters
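Here is one way to write the idea above in Python. It is a sketch rather than the original answer: a dict `m` stands in for the vector, and the function name is mine.

```python
def length_of_longest_substring(s):
    """Length of the longest substring of s with no repeated character."""
    m = {}    # character -> index just past its latest occurrence (j + 1)
    i = 0     # left edge of the current window
    best = 0
    for j, ch in enumerate(s):
        # If ch was seen inside the window, jump i past that occurrence.
        i = max(i, m.get(ch, 0))
        best = max(best, j - i + 1)
        m[ch] = j + 1
    return best
```

Taking `max(i, ...)` matters: for "abba" the stored position of the first "a" lies to the left of the window, and i must not move backwards.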

Round two

Round one felt pretty good, and sure enough, round two arrived. The HR lady called to say the second interview would be on Wednesday. OK: warm Xiaolan is never late and must be punctual. I held my tea and waited until 2:30. As for why I hold tea, it is a habit of mine: I drink a cup before the interview and wait for the interviewer to show up (the interviewers are actually very gentle).

I waited patiently, but the interviewer still hadn't come at the appointed time. I couldn't wait any longer and called HR. HR said sorry, I would have to wait a few more minutes. Okay, for that sweet voice I could bear it. But after ten more minutes there was still no news, and I had a written test that afternoon, so I had no choice but to tell HR that I had something on in the afternoon. Should we reschedule the interview?

Before I knew what was going on, HR said: I'll get you another interviewer right away. Wow, that is a thing? Does a nobody like me really get this kind of treatment? Am I that outstanding? Either way, I love Xiaohongshu.

"Stay with me" rang out; that is my phone ringtone.

"Hello"

"Hello, is it XX?"

"Hmm, hello interviewer"

"I am your second interviewer, introduce yourself first"

My name is XX, I come from XX University, undergraduate in XX, master's in XXX, and I worked on XX during that time. Thank you, interviewer. There is no need for a fancy self-introduction; pick the key points. The interviewer doesn't care how many times you dated as an undergraduate, and doesn't care about XXXX either. Keep it simple and clear, and round two begins.

  • You should have learned C. How would you implement polymorphism in C?

I was quite surprised by this question. Why suddenly ask about C? Thinking about it, it was probably still testing my understanding of polymorphism and inheritance in object-oriented programming.

Polymorphism comes in two kinds: compile-time polymorphism and runtime polymorphism. Compile-time polymorphism means overloading, and runtime polymorphism means overriding. To implement overloading in C, you need the variadic macro facility, __VA_ARGS__.

Implementing overloading in C

Once the method above is understood, polymorphism is easier to implement:

Implementing polymorphism in C

It felt like there was nothing more to ask, so: first write a merge sort.

Haha, it reminds me of the lyrics "On the left, draw a dragon with me; on your right, draw a rainbow" (picture it yourself).

Stop!! This is one of the frequently tested algorithms I have mentioned before. The core idea is divide and conquer: keep splitting through recursion. The recursion ends when no further split is possible, that is, when a piece is down to one element.

Starting from single elements, treat each piece as a sorted array. As with the two-pointer technique, set a pointer at the head of each of the two arrays, compare the values, and insert the smaller one into a new array. Here is the code!

Merge sort
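A minimal Python sketch of the approach just described, with a two-pointer merge and a recursive split that stops at one element. The function names are my own.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list using two pointers."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps equal elements in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

def merge_sort(values):
    """Return a new sorted list; the recursion stops at lists of length 0 or 1."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))
```

This version returns a new list instead of sorting in place, which keeps the merge step easy to read at the cost of extra memory.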

Do you understand inverted indexes?

Suppose I have dozens of documents here, each with a different title. If I give you the title of a document, you can probably find the corresponding document quickly. But if I ask you to find the documents that contain the two words "warm" and "blue", you might just give me two slaps.

That is because this is hard to do quickly. In slightly more professional terms, the former is a forward index and the latter is an inverted index.

Let's first look at a simple forward index. Give each document a unique ID, then use a hash table with the document ID as the key and the document content as the value.

This way we can complete a lookup by key in O(1) time. This is exactly the forward index:

Forward index

Here, traversing the hash table costs O(n). For each document, we have to scan every character to determine whether it contains the two words. If the average document length is k, scanning one document costs O(k), so the whole search costs O(n·k).

Is there any way to optimize this?

In fact, the above are two different tasks: one finds the content from the title, and the other finds the title from a keyword. They are exact opposites, so let's try turning the index around.

We treat the keyword as the key and the list of documents containing that keyword as the value. With a hash table built this way, the list of documents containing a keyword can be found in O(1) time.

This inverted index structure is built on the content or fields of the documents.

How to create an inverted index?

  • First, give each document a unique ID, then traverse the documents in ID order;

  • Parse the keywords of each document and generate <keyword, document ID, keyword position> records. The keyword position is mainly used to display the text before and after the keyword in search results;

  • Insert the keyword into the hash table as the key. If the key already exists in the hash table, add a node to the corresponding posting list and record the document ID. If the hash table does not have the corresponding key, insert the key and create a posting list with the corresponding node;

  • Repeat steps 2 and 3 to process all documents.

Create an inverted index
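The steps above can be sketched in Python. I assume here that "keywords" are simply whitespace-separated, lower-cased words (real systems use a proper tokenizer), and that a Python list plays the role of the posting list. The function name is hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """documents: dict of doc_id -> text. Returns keyword -> [(doc_id, position), ...]."""
    index = defaultdict(list)
    for doc_id in sorted(documents):                 # step 1: traverse in ID order
        words = documents[doc_id].lower().split()    # step 2: parse keywords
        for position, word in enumerate(words):
            index[word].append((doc_id, position))   # step 3: append to the posting list
    return dict(index)
```

Because documents are visited in ID order, every posting list comes out sorted by document ID, which is what makes the two-pointer intersection possible later.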

What if the query contains both "warm" and "blue" keys?

Following the same thread, look up both keys in the inverted index to get two lists, A and B. The documents in A all contain the word "warm", and the documents in B all contain the word "blue".

A document that contains both "warm" and "blue" must appear in both lists. So we just need to find the common elements of A and B.

How do we find the common elements of the two linked lists A and B? I hope everyone thinks about it; it is often asked in hand-written algorithm rounds.

  • First use two pointers p1 and p2 to point to the first elements of the sorted linked lists A and B;

  • Then compare the nodes pointed to by the two pointers; there are three possible cases;

  • If the two IDs are the same, it is a common element: add it to the result, then move both p1 and p2 forward;

  • If the p1 element is smaller than the p2 element, move p1 to the next element in list A;

  • If the p1 element is greater than the p2 element, move p2 to the next element in list B;

  • Repeat the second step until p1 or p2 reaches the end of its list.

Linked list common elements
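A short Python sketch of the two-pointer procedure above. Python lists stand in for the sorted linked lists, with indices p1 and p2 playing the role of the node pointers; the function name is mine.

```python
def intersect_sorted(a, b):
    """Return the elements common to two sorted lists of document IDs."""
    p1 = p2 = 0
    common = []
    while p1 < len(a) and p2 < len(b):
        if a[p1] == b[p2]:        # same ID: a common element, advance both
            common.append(a[p1])
            p1 += 1
            p2 += 1
        elif a[p1] < b[p2]:       # A is behind: advance p1
            p1 += 1
        else:                     # B is behind: advance p2
            p2 += 1
    return common
```

Each step advances at least one pointer, so the cost is O(len(a) + len(b)).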

You said that you have used Kafka. So when using a message queue, how do you ensure that a message is consumed only once?

First, message queues such as Kafka are introduced to shave peaks in write traffic and to decouple different systems.

For example, suppose we have developed an e-commerce system. One of its features is to send a red envelope when a user buys 5 units of product A, to encourage users to spend. But if the message is lost during delivery, the user will probably be unhappy about not receiving the red envelope, and may even cancel the order. How do you ensure that the message is consumed exactly once?

Let's look at the stages a message goes through from being written to the queue to being consumed. The message is written from the producer to the queue, it is stored in the message queue, and it is consumed by the consumer. It may be lost at any of these stages. Let's look at them one by one.

Three possibilities of loss

The first stage: message production

Messages are usually produced by the business server. The business server and the independently deployed message queue server communicate over the internal network, and a message may well be lost due to network jitter. A retransmission mechanism can be used to ensure delivery, but it easily leads to duplicate consumption: the user is happy to receive two red envelopes, but...

The second stage: lost in the queue

To reduce random disk IO when storing messages, Kafka flushes messages to disk asynchronously.
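Coming back to the duplicate red envelopes caused by retransmission in the first stage: one common answer is to make the consumer idempotent. The Python sketch below is only an illustration under my own assumptions. It is not Kafka's API, it assumes every message carries a unique `id`, and the class and field names are hypothetical.

```python
class RedEnvelopeConsumer:
    """Consumer that ignores messages it has already processed."""

    def __init__(self):
        # Assumption: in production this set would live in durable storage.
        self.processed_ids = set()
        self.envelopes_sent = []

    def handle(self, message):
        """Process a message once; return False if it was a duplicate."""
        if message["id"] in self.processed_ids:
            return False                              # retransmitted copy: skip it
        self.envelopes_sent.append(message["user"])   # the business action
        self.processed_ids.add(message["id"])
        return True
```

With this in place the producer can retry freely: a message delivered twice still results in a single red envelope.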

I see from your resume that you have competed in ACM contests. Tell me about your strategy or experience.

Hahaha, finally it is my turn to brag. Being low-key is the best way to show off!

Write a regular expression to validate an email address

I couldn't write it at the time. I really can't remember it; I look it up every time I use it. Who expects to run into this in an interview, or to hand-write KMP? Here is an answer for everyone, and I will cover regular expressions in detail later:

A regular expression for validating an email address
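Since the original answer is not shown here, the following Python sketch is my own stand-in. It uses a deliberately simplified pattern (a local part, an "@", and a dotted domain ending in at least two letters), which is good enough for an interview but does not cover everything the email RFCs allow.

```python
import re

# local part @ domain labels . top-level domain of at least two letters
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

def is_valid_email(address):
    """Return True if the whole string matches the simplified email pattern."""
    return EMAIL_RE.fullmatch(address) is not None
```

Using `fullmatch` rather than `search` avoids accepting strings that merely contain a valid address somewhere in the middle.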

Do you know memory mapping? Tell me as much as you can.

Since you asked for everything, I won't hold back. I answered from what memory is, to how to view server memory, and finally to how to make better use of memory.

First of all, memory is used to store the instructions and data of the system and of applications. Linux manages memory through memory mapping.

The memory capacity we usually talk about refers to physical memory, also called main memory. Only the kernel can access it directly. So the question is: what does a process do when it wants to access memory?

The Linux kernel provides each process with its own virtual address space, and the addresses in this space are contiguous. This makes it very convenient for a process to access virtual memory.

The virtual address space is divided into kernel space and user space, and the address range differs with the processor's word length. Let's look at the 32-bit and 64-bit virtual address spaces respectively:

Kernel space and user space

The figure makes it clear that the kernel space of a 32-bit system is 1G, while on a 64-bit system the kernel space and the user space are 128T each.

Memory mapping means mapping virtual memory addresses to physical memory addresses. To record the relationship between the two, the kernel maintains a page table for each process.

Conversion of virtual address to physical address

When a process accesses a virtual address that is not found in the page table, a page fault exception is raised. The process enters kernel space, the kernel allocates physical memory and updates the process's page table, and execution finally returns to user space.
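To make the translation and the page fault path concrete, here is a toy Python model. It is a single-level page table with 4KB pages and a trivial frame allocator, far simpler than the multi-level tables a real kernel uses; the class and attribute names are made up.

```python
PAGE_SIZE = 4096  # 4KB pages

class AddressSpace:
    """Toy per-process address space with a single-level page table."""

    def __init__(self):
        self.page_table = {}   # virtual page number -> physical frame number
        self.next_frame = 0    # simplistic allocator: hand out frames in order
        self.page_faults = 0

    def translate(self, virtual_address):
        """Return the physical address, allocating a frame on a page fault."""
        page_number, offset = divmod(virtual_address, PAGE_SIZE)
        if page_number not in self.page_table:
            # Page fault: allocate physical memory and update the page table.
            self.page_faults += 1
            self.page_table[page_number] = self.next_frame
            self.next_frame += 1
        return self.page_table[page_number] * PAGE_SIZE + offset
```

The first access to a page triggers a fault and fills in the mapping; later accesses to the same page go straight through the table.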

When it comes to virtual memory, I have to talk about the various segments of user space:

User space segments

I couldn't help quietly asking HR about the second interviewer's evaluation of me: fundamentals and coding ability are good, but the project description was not clear.

  • I may not have understood the more essential aspects of the project clearly.

  • We work in different areas, so we understand some technical terms differently.


Round three

The round-three interviewer: "bald" doesn't even begin to describe him. I felt dazzled for a full minute. What can I say, on with the interview!

What thread locks are there? When I got to read-write locks, he interrupted me and asked what problems a read-write lock can have. It is nothing more than writer starvation. I said that I had not read the kernel source code, and he asked how I would avoid the problem if I had to implement the lock myself.

Distributed hash table: what happens when it is expanded (which takes a long time)? I said that P must be guaranteed and only one of C and A can be chosen, but we can use weak consistency to preserve availability.

Given multiple random requests with different weights, perform random sampling so that a request with a higher weight is more likely to be drawn.
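The original answer to the sampling question is not recorded, so the following Python sketch is one standard approach rather than what was said in the interview: build cumulative weights, draw a uniform number below the total, and binary-search for the slot it falls into. The function name is mine.

```python
import bisect
import itertools
import random

def weighted_pick(requests, weights, rng=random):
    """Pick one request with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))   # e.g. [1, 9] -> [1, 10]
    r = rng.random() * cumulative[-1]                  # uniform in [0, total)
    # First slot whose cumulative weight exceeds r; hi guards the last index.
    index = bisect.bisect_right(cumulative, r, 0, len(cumulative) - 1)
    return requests[index]
```

The standard library's `random.choices(requests, weights=weights)` does the same job; writing it out shows the interviewer you know why it works.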

Know RPC?

RPC stands for remote procedure call. It hides the details of the network so that calling a remote method feels the same as calling a local one. For example, without a bridge we have to take a boat or make a detour to cross the river; with a bridge, we can reach the destination as easily as walking down a road.

What is the communication process of RPC?

I just said that RPC hides the network details, which means it takes care of the network part. To ensure reliability, it uses TCP by default. Data transmitted over the network is binary, but the parameters of a call are objects, so the objects must be converted to binary, which requires serialization.

When the service provider receives the data, it does not know where a request ends, so some boundary markers are needed to identify where the request data begins and where it ends, just like the signposts on a highway that guide us forward. This format convention is called the "protocol".

Following the format specified by the protocol, the request can be extracted correctly, and the binary message body can be restored to the request object according to the request type and serialization type. This is called deserialization.

The service provider uses the deserialized object to find the corresponding implementation class and completes the actual method call. That is one RPC call. Here is a picture to deepen the impression:

RPC process
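The serialize, frame, deserialize, dispatch sequence can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than any real RPC framework's wire format: JSON is the serialization, a 4-byte big-endian length header is the "protocol", and `HANDLERS` is a hypothetical service registry.

```python
import json
import struct

def encode_request(method, params):
    """Serialize a call and prefix it with a 4-byte length so the receiver can find its end."""
    body = json.dumps({"method": method, "params": params}).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def decode_requests(stream):
    """Split received bytes into complete requests; return them plus any leftover bytes."""
    requests, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        if offset + 4 + length > len(stream):
            break                                   # message not fully received yet
        body = stream[offset + 4: offset + 4 + length]
        requests.append(json.loads(body.decode("utf-8")))   # deserialization
        offset += 4 + length
    return requests, stream[offset:]

HANDLERS = {"add": lambda a, b: a + b}   # hypothetical service implementation

def dispatch(request):
    """Find the implementation for the requested method and call it."""
    return HANDLERS[request["method"]](*request["params"])
```

The length header is what lets two back-to-back requests in one TCP byte stream be separated correctly, which is exactly the boundary problem the protocol exists to solve.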

The interviewer also asked some questions that had already come up in the earlier rounds, so I won't write them out again. There were a few database questions too, all very standard, and I covered them in the previous article. This one is already long and tiring to read. How about a like, a share and a bookmark? See you next time. Muah


Summary

Please note the following points:

  • The company hires you to do the work, and will not lower its standards because of what you did before.

  • Code written in an IDE is completely different from code written by hand.

  • Cherish every interview opportunity and learn to review afterwards.

  • For fresh graduates, the main thing being assessed is mastery of computer science fundamentals. The requirements for projects are not that high. If you built the project yourself, dig into the details and test it. Only then will you know what problems and difficulties you will run into and how to solve them, and you will be able to talk about it freely.

  • Don't be afraid if you don't have a computer science degree. If you are afraid, you have already lost! Be sure to try more.



Origin blog.csdn.net/csdnsevenn/article/details/109233554