First-round interview experience --- cloud computing direction (with analysis of the knowledge points behind the questions)

1. Overview of the first-round interview questions

1. Please introduce yourself first.
2. Do you know the Go language?
3. Which project gave you the greatest sense of accomplishment?
4. After the protocol header was introduced, how much did the packet size increase? (asked about the project)
5. Redis caching: how did you build the cache? (asked about the project)
6. What is the update mechanism of the cache middleware? (asked about the project)
7. Why is the cache entry deleted rather than written when updating data? (asked about the project)
8. With Redis caching hotspot data, in what scenarios is dirty data produced?
9. In CRUD, does your backend service touch the database first or the cache (Redis) first? How is each of the four operations handled?
10. In a cache scheme that first deletes the cached data, then updates the database, and finally synchronizes (writes back) the cache, what problems arise under high concurrency? Is data consistency a problem? (In other words, what happens when concurrent reads and concurrent writes occur at the same time?)
11. Do you know any other cache update patterns?
12. How do you detect memory leaks?
13. If a service's handles (file handles or socket handles) keep increasing and never decrease, how do you analyze it? What is the problem?
14. Which command raises the system's limit on the number of handles? ulimit.
15. There are many socket handles: how do you analyze and solve it? Under what circumstances is this phenomenon normal, and under what circumstances abnormal?
16. There are many services and many interfaces, and you don't know where memory or socket handles are leaking. How do you analyze it?
17. What is zero copy?
18. Which technologies can achieve zero copy?
19. What is direct I/O?
20. At the operating-system level, what is the relationship between coroutines, threads, and processes? How do they differ in resource usage?
21. What is the difference in resources between parent and child processes? (Can be explained via fork's copy-on-write.)
22. Why doesn't the C++ map use a height-balanced binary tree underneath? (https://blog.csdn.net/weixin_61207303/article/details/123402294)
23. How do you understand program development on Windows?
24. What is the most troublesome problem in developing a Windows client?
25. What Windows user-interface libraries are there? (Qt, DuiLib)
26. What is the value of a message queue? What problems does it mainly solve?
27. What problems does introducing a message queue bring?
28. Talk about the advantages and disadvantages of message queues.

2. Practical questions

1. Programming by Google and by Stack Overflow: draw a flowchart.

1) What should you do if Google cannot find the code?
2) What should you do if the code Google finds fails final verification (it is not what you want, or not appropriate)?

2. Algorithm question: given the array root[] = {6,3,5,3,6,7,8,-1,-1,5,6}, build a binary tree, then print its in-order traversal.

The array, read in level order (-1 = null child), builds this tree:

        6
      /   \
     3     5
    / \   / \
   3   6 7   8
      / \
     5   6

Summary:

  1. The way of answering needs improvement: expand on the knowledge point being asked instead of giving bare one-line answers to each question.
  2. Answers need to be clear and coherent, with as few pauses as possible.
  3. Speak calmly and logically.

3. Reviewing the questions and organizing answers (filling in the gaps)

3.1. A brief understanding of the Go language

The high-concurrency backend programs/services of major tech companies are mostly developed in Go, so you need a basic understanding of the language.

Golang was born for high concurrency and was officially released by Google in 2009. It supports concurrency at the language level, running concurrent programs through lightweight coroutines (goroutines).
Go is a strongly typed, compiled language launched by Google, with native concurrency support and garbage collection (GC). Its syntax is similar to C, but it adds memory safety, GC, structural typing, and CSP-style concurrency.

The lightness of goroutines is mainly reflected in:

  1. Cheap context switching. A goroutine context switch only involves modifying three registers (the program counter PC, the stack pointer SP, and the data register DX), whereas a thread context switch involves a mode switch (from user mode to kernel mode) and the refreshing of more than 16 registers.
  2. Small memory footprint. A thread's stack is 2 MB, while a goroutine's minimum stack is only 2 KB.

A Go program can easily support on the order of 100,000 goroutines, whereas with threads, memory usage already reaches 2 GB at 1,000 threads (1,000 × 2 MB stacks).

The Go scheduler model is usually called the GPM model, which involves four important structures: G, P, M, and Sched. A Go program schedules goroutines onto kernel threads through the scheduler, but a G (goroutine) is not bound directly to an OS thread M (machine); instead, a P (processor, a logical processor) in the goroutine scheduler acts as the intermediary through which goroutines acquire kernel-thread resources.
The Go scheduler has two kinds of run queues: the global run queue (GRQ) and the local run queues (LRQ). Each P has an LRQ that manages the goroutines assigned to execute in that P's context; these goroutines are context-switched in turn by the M bound to that P. The GRQ holds goroutines that have not yet been assigned to a P.
(Figure: the GPM model)

(1) G: goroutine. Each goroutine corresponds to a G structure, which stores the goroutine's running stack, state, and task function; G structures can be reused. A G is not an execution body: it must be bound to a P before it can be scheduled.
(2) P: processor, a logical processor. To a G, a P is like a CPU core: a G can only be scheduled after binding to a P. To an M, a P provides the execution environment (context), such as memory-allocation state and the task queue. The number of Ps determines the maximum number of Gs that can run in parallel (provided the number of physical CPU cores >= the number of Ps). The number of Ps is set by the user via GOMAXPROCS, but no matter how large GOMAXPROCS is set, the maximum number of Ps is 256.
(3) M: machine, the abstraction of an OS kernel thread, representing the resource that actually performs computation. After binding a valid P, an M enters the scheduling loop, which roughly takes Gs from the global queue, the P's local queue, and the wait queue. The number of Ms is variable and adjusted by the Go runtime; to prevent so many OS threads being created that the system cannot schedule, the default maximum is currently 10,000. An M does not retain G state; this is the basis on which a G can be scheduled across Ms.
(4) Sched: the Go scheduler. It maintains the queues storing Ms and Gs plus some scheduler state. The scheduling loop roughly: obtain a G from one of the queues or a P's local queue, switch to the G's execution stack and run its function, call goexit to clean up, return to the M, and repeat.

A Go program can use a small number of kernel threads to support a large number of concurrent goroutines. Multiple goroutines share the computing resources of a kernel thread M through user-level context switching, without the performance loss of OS thread context switches.

To make full use of thread computing resources, the Go scheduler adopts the following scheduling strategies:
(1) Work stealing. When G tasks are unbalanced across Ps, the scheduler allows Gs to be taken from the GRQ or stolen from another P's LRQ.

(2) Reducing blocking. Blocking in Go mainly falls into four scenarios:

  1. Goroutines blocked by atomic, mutex, or channel operations: the scheduler switches out the blocked goroutine and schedules another goroutine from the LRQ.
  2. Goroutines blocked by network requests and I/O operations: Go provides a network poller (NetPoller), backed by I/O multiplexing, to handle these. By making network system calls through the NetPoller, the scheduler keeps goroutines from blocking the M while those calls are in flight; the M can keep executing other goroutines in the P's LRQ without creating new Ms, which reduces the scheduling load on the operating system. This enables the simple goroutine-per-connection network programming model (although a huge number of goroutines brings problems of its own, such as increased stack memory and a heavier scheduler load).
  3. Blocking system calls that cannot go through the NetPoller: the goroutine making the call blocks its M. The scheduler detaches M1 from the P, taking the blocked G1 with it, and brings in a new M2 to serve the P; a G2 can then be selected from the LRQ and context-switched onto M2. When the blocking syscall completes, G1 moves back to the LRQ and is run by the P again; M1 is set aside for future reuse.
  4. A sleep performed in a goroutine blocks its M. The Go runtime's background monitor thread, sysmon, watches long-running G tasks and marks them preemptible so that other goroutines can run instead. The next time such a goroutine makes a function call, it is preempted, its state is saved, and it is put back into the P's local queue to wait for its next turn.

3.2. The greatest sense of accomplishment, or the biggest challenge, in the project

If the project had no real challenge, consider the risks of the technology stack used in the project and present a solid solution to those risks. If you have a strong project, tell it at the level of system design.

For example, in my project — designing and developing an efficient backend server program for a LAN — the greatest achievement was designing an efficient network transport protocol that provides fast responses under concentrated concurrency (or: the biggest challenge was how to design an efficient network model and communication protocol to achieve highly concurrent responses). In the project, the client only knows the port opened by the server. In the first version, TCP was used to transmit data to guarantee reliability; but since the client does not know the server's IP, it first sends UDP broadcasts to discover the server, then switches to TCP once it has obtained the server's IP. Verification later showed that most of the transmitted data is small and the probability of UDP loss in a LAN is extremely low, so the responses were moved mainly to UDP: UDP has better real-time behavior and works more efficiently than TCP, and response and data-transfer efficiency roughly doubled.

This process reflects not only your ability to troubleshoot and solve problems, but also your accumulated experience and independent thinking.

3.3. Project question — after the protocol header is introduced, how much does the packet size increase?

In my project, a protocol header like this is introduced:

 1 byte    1 byte                        6 bytes
+--------+--------+-----------------------------------------------------+
| opcode | pl_len |        src_ip                                       |
+-----------------------------------------------------------------------+
|                          src_ip                                       |
+-----------------+-----------------------------------------------------+
|    src_ip       |       src_mac                                       |
+-----------------+-----------------+-----------------------------------+
|    src_mac      |       src_port  |             des_ip                |
+-----------------------------------------------------------------------+
|                          des_ip                                       |
+-----------------------------------+-----------------------------------+
|    des_ip                         |            des_mac                |
+-----------------------------------+-----------------+-----------------+
|    des_mac                        |      des_port   |     preload     |
+-----------------------------------------------------------------------+
|                              ...                                      |
|                            preload                                    |
|                              ...                                      |
+-----------------------------------------------------------------------+

Each row of the diagram is 8 bytes, so the full header is 7*8-2 = 54 bytes: the packet grows by at most 54 bytes, and by at least 22 bytes (opcode, length, MAC addresses, and ports only). One byte holds reserved bits for possible fragmentation.
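
For illustration, the header can be written down as a packed struct. This is only a sketch under assumptions read off the diagram — 16-byte IP fields (IPv6-capable) and 8-byte MAC fields (6-byte MAC plus 2 reserved bytes) — not the project's actual definition:

#include <cstdint>

// Sketch of the protocol header, assuming the field widths implied by the
// diagram: 16-byte IPs and 8-byte MAC fields (6-byte MAC + 2 reserved bytes).
// Hypothetical layout for illustration, not the project's real code.
#pragma pack(push, 1)
struct ProtoHeader {
    uint8_t  opcode;       // operation code
    uint8_t  pl_len;       // payload length
    uint8_t  src_ip[16];   // source IP
    uint8_t  src_mac[8];   // source MAC + reserved bytes
    uint16_t src_port;     // source port
    uint8_t  des_ip[16];   // destination IP
    uint8_t  des_mac[8];   // destination MAC + reserved bytes
    uint16_t des_port;     // destination port
    // payload (preload) follows the header
};
#pragma pack(pop)

static_assert(sizeof(ProtoHeader) == 54, "full header adds 54 bytes");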

3.4. How to build the cache

Use the Redis NoSQL database as a cache for the MySQL database. On a lookup, search the Redis cache first and return the result if found; if Redis has nothing, query MySQL: if found, return the result and update Redis; if not found, return empty. For writes, write directly to MySQL, and let MySQL propagate the changed content to Redis automatically through the trigger + UDF mechanism.

Read flow: client → server → look up Redis → hit: return the data directly; miss: query MySQL → found: return the data and update Redis at the same time; not found: return empty.
Write flow: client → server → write MySQL → the cache is then updated via trigger + UDF, kafka, canal, go-mysql-transfer, or similar mechanisms.
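
As a minimal sketch of this read path, with hypothetical helper names (redis_get/redis_set/mysql_query, stubbed here with in-memory maps so the example runs) rather than any real client API:

#include <iostream>
#include <map>
#include <optional>
#include <string>
using namespace std;

// Hypothetical stand-ins for real Redis/MySQL clients, stubbed with
// in-memory maps so the sketch compiles and runs.
map<string, string> redis_store, mysql_store;

optional<string> redis_get(const string& k) {
    auto it = redis_store.find(k);
    return it == redis_store.end() ? nullopt : optional<string>(it->second);
}
void redis_set(const string& k, const string& v) { redis_store[k] = v; }
optional<string> mysql_query(const string& k) {
    auto it = mysql_store.find(k);
    return it == mysql_store.end() ? nullopt : optional<string>(it->second);
}

// Read path: Redis first; on a miss fall back to MySQL and refill the cache.
optional<string> cached_read(const string& key) {
    if (auto v = redis_get(key)) return v;          // cache hit
    if (auto v = mysql_query(key)) {                // miss: query the primary
        redis_set(key, *v);                         // refill for later readers
        return v;
    }
    return nullopt;                                 // found nowhere: empty
}

int main() {
    mysql_store["user:1"] = "alice";
    cout << cached_read("user:1").value_or("<empty>") << endl; // from MySQL
    cout << cached_read("user:1").value_or("<empty>") << endl; // from cache
    return 0;
}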

3.5. Cache middleware update mechanism

Synchronization schemes include:
(1) Masquerading as a database replica, i.e. subscribing to the binlog the way a slave does: for example Alibaba's open-source canal, kafka-based pipelines, go-mysql-transfer, etc.
(2) MySQL trigger + UDF. UDF stands for user-defined function, MySQL's mechanism for extension code. A UDF has no transaction and cannot be rolled back, and it is less efficient.

3.6. Write strategy for the Redis cache

MySQL is the source of truth: even if the cache becomes unavailable, the system as a whole must keep working normally; if MySQL becomes unavailable, the system stops and no longer serves requests.
(1) Safety first: delete the cache entry first, then write MySQL, leaving the subsequent synchronization to middleware such as go-mysql-transfer. Deleting the cache first prevents other services from reading stale data, and also tells the system that this entry is no longer current and that the data should be fetched from MySQL.
(2) Efficiency first: write the cache first with an expiration time (e.g. 200 ms), then write MySQL, leaving synchronization to other middleware. The expiration time is an estimate — roughly the MySQL-to-cache synchronization latency. If MySQL stops serving during the write, or the data never reaches MySQL, dirty data is served for up to 200 ms; that is, the efficiency-first write strategy also has a safety problem, but it only affects a 200 ms window.

3.7. With Redis caching hotspot data, in what scenarios is dirty data produced?

Dirty data means the primary database and the cache database are inconsistent. This generally arises under the efficiency-first write strategy, but because an expiration time is set, it only affects a bounded period of time.

3.8. How CRUD operations handle the primary database and the cache

(1) C, create: safety first — write MySQL first, then update the Redis cache through middleware such as go-mysql-transfer; efficiency first — write Redis first with an expiration time, then write MySQL.
(2) R, read: query Redis first and return directly on a hit; on a miss, query MySQL — if MySQL has nothing, return empty; if it has the data, return it and update the Redis cache through the middleware.
(3) U, update: delete the Redis entry first, then write the database, then update the Redis cache through middleware such as go-mysql-transfer (see the sketch below).
(4) D, delete: delete the Redis entry first, then delete the MySQL data.
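
Continuing the hypothetical stub helpers from the sketch in 3.4, the safety-first update path looks roughly like this (redis_del/mysql_update are again illustrative names, not a real client API):

// Safety-first update: invalidate the cache before writing MySQL, so that
// concurrent readers fall through to the primary store instead of reading
// a stale entry. Uses the redis_store/mysql_store stubs from the 3.4 sketch.
void redis_del(const string& key) { redis_store.erase(key); }
bool mysql_update(const string& key, const string& value) {
    mysql_store[key] = value;
    return true;
}

bool cached_update(const string& key, const string& value) {
    redis_del(key);                   // step 1: invalidate the cache entry
    return mysql_update(key, value);  // step 2: write the source of truth
    // step 3 (async): middleware (go-mysql-transfer, canal, ...) consumes
    // the binlog and synchronizes MySQL -> Redis.
}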

3.9. How to detect memory leaks?

In debug builds, memory leaks can be analyzed with the VLD library or the CRT debug facilities; but many server problems only appear online under concurrent load, so debug-build techniques are of limited value here.

For release builds:
(1) Static detection. Use tools such as cppcheck or BEAM to scan the source files for potential leaks.
(2) Intrusive detection. Implement a leak-detection component and embed it in the code: hook the allocation functions, trace allocations with mtrace, or count objects; in C++ you can also overload the allocation and deallocation functions new and delete (sketched below). The drawback is that the source must be modified, and leaks in third-party libraries, STL containers, or scripts cannot be located because their code cannot be changed.
(3) For server programs already online, tools such as Valgrind or AddressSanitizer can pinpoint the exact location of the leak.
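
A minimal sketch of the new/delete overloading approach from (2): keep a live-allocation counter, and a nonzero count at a quiescent checkpoint hints at a leak. Purely illustrative — it ignores the array and aligned overloads and records no call sites:

#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <new>

// Live-allocation counter maintained by overloaded global new/delete.
static std::atomic<long> g_live_allocs{0};

void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc{};
    g_live_allocs.fetch_add(1, std::memory_order_relaxed);
    return p;
}

void operator delete(void* p) noexcept {
    if (p) {
        g_live_allocs.fetch_sub(1, std::memory_order_relaxed);
        std::free(p);
    }
}

int main() {
    int* leaked = new int(42);   // allocated, never freed
    int* ok = new int(7);
    delete ok;
    // A nonzero count at a quiescent checkpoint suggests a leak.
    std::printf("live allocations: %ld\n", g_live_allocs.load());
    (void)leaked;
    return 0;
}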

3.10. There are many socket handles: how to analyze and solve it?

(1) Check the current socket usage.

# Method 1
cat /proc/net/sockstat
# Method 2
ss -s
# Count connections by TCP state
ss -ant | awk 'NR>1 {print $1}' | sort | uniq -c

(2) If they are mostly in FIN_WAIT or TIME_WAIT, adjust the FIN timeout or other kernel parameters.
(3) If a large number are stuck in CLOSE_WAIT, the business logic usually has too much to do before it closes the connection; use asynchronous processing to separate the network layer from the business layer and handle them independently.

3.11. What is zero copy?

Zero-copy techniques bypass user space: the user buffer does not participate in the data copy, which avoids redundant copies between kernel space and user space and reduces context switches.
Linux provides two zero-copy interfaces, mmap and sendfile:

  1. mmap uses DMA to copy disk data into the kernel buffer, and the operating system then shares that kernel buffer with the application, so the kernel-to-user copy is eliminated. When the application calls send or write, the kernel copies the buffer contents directly into the socket buffer and sends them out. mmap saves at least one data copy, but there are still user/kernel context switches.
  2. With sendfile, data transfer happens entirely in kernel space: the application calls sendfile, the kernel copies the data into the page cache, then into the socket buffer, and sends it out through the NIC. This reduces both the number of data copies and the number of context switches. If the NIC supports SG-DMA, the copy into the socket buffer can also be removed, leaving only two memory copies in total.

In addition there is direct I/O, which bypasses the kernel buffer and transfers data directly between user space and the disk, reducing copies between the kernel buffer and user data, and lowering the CPU load and memory-bandwidth usage of file I/O.
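
A minimal sendfile sketch in C++ for Linux, serving a file over an already-connected socket descriptor; error handling is trimmed, and client_fd is assumed to come from accept():

#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Send an entire file over a connected socket without copying it through
// user space. Assumes client_fd is a connected TCP socket from accept().
bool send_file_zero_copy(int client_fd, const char* path) {
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) return false;

    struct stat st{};
    if (fstat(file_fd, &st) < 0) { close(file_fd); return false; }

    off_t offset = 0;
    while (offset < st.st_size) {
        // The kernel moves data page cache -> socket buffer (or straight to
        // the NIC with SG-DMA); the user buffer never touches the bytes.
        ssize_t n = sendfile(client_fd, file_fd, &offset, st.st_size - offset);
        if (n <= 0) { close(file_fd); return false; }
    }
    close(file_fd);
    return true;
}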

Standard I/O, DMA, direct I/O, and zero copy in brief:

  1. Standard I/O: data is cached in the page cache. Read: data is first copied into the kernel buffer (page cache), then from the kernel buffer into the user buffer. Write: the user buffer is first copied into the kernel buffer, then written to disk or sent to the network through a socket.
  2. DMA: optimizes data exchange between external devices and memory. It is a high-speed transfer mechanism that lets external devices read and write system memory directly, neither passing through the CPU nor requiring its intervention; the transfer speed depends on the speed of the memory and the peripheral.
  3. Direct I/O: bypasses the kernel buffer, transferring data between user space and the disk. Scenarios: the application has its own user-space caching mechanism; large-file transfers in high-concurrency environments. Drawbacks: reading and writing the disk without the kernel buffer necessarily blocks, so it must be paired with asynchronous I/O; and it cannot enjoy the performance boost of the page cache.
  4. Zero copy: bypasses user space — the user buffer does not participate in the copy — avoiding redundant kernel/user copies and reducing context switches. Saving one copy: mmap address mapping or the sendfile system call. Saving two copies: if the NIC supports SG-DMA, the copy into the socket buffer can be removed as well.
  5. mmap vs. sendfile: mmap suits small reads and writes, sendfile suits large-file transfers; mmap needs 4 context switches and 3 data copies, while sendfile needs 3 context switches and at least 2 data copies; sendfile can use DMA to eliminate CPU copies, while mmap cannot (the data must still be copied into the socket buffer).


3.12. The relationship and differences among coroutines, threads, and processes

Process: a running program with its own independent memory space; the smallest unit of system resource allocation. Processes are heavyweight — each occupies independent memory, so inter-process context switches (stack, registers, virtual memory, file handles, etc.) are expensive, but processes are relatively stable and safe.
Thread: a lightweight process; a process contains at least one thread, and threads within the same process share its memory space; the smallest unit of OS scheduling and execution. Threads communicate mainly through shared memory; context switches are fast and resource overhead is small, but compared with processes they are less stable and can lose data.
Coroutine: a user-mode lightweight thread whose scheduling is fully controlled by the application. A coroutine has its own register context and stack. When a coroutine is switched out, its register context and stack are saved elsewhere; when it is switched back, they are restored. Since the stack is manipulated directly, there is essentially no kernel-switch overhead, and global variables can be accessed without locks, so context switches are extremely fast.
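
To make the user-mode switch concrete, here is a minimal coroutine-style switch with the POSIX ucontext API — purely illustrative; real coroutine libraries (and goroutines) use far more optimized assembly paths:

#include <cstdio>
#include <ucontext.h>

// Two contexts: the "main" flow and one coroutine, plus the coroutine stack.
static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];

static void coroutine_body() {
    std::puts("coroutine: first run, yielding back to main");
    // Save our context and resume main: a pure user-mode switch,
    // with no kernel scheduling involved.
    swapcontext(&co_ctx, &main_ctx);
    std::puts("coroutine: resumed, finishing");
    // Returning ends the context; uc_link sends control back to main.
}

int main() {
    getcontext(&co_ctx);                 // initialize the coroutine context
    co_ctx.uc_stack.ss_sp = co_stack;    // give it its own stack
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;          // where to go when the body returns
    makecontext(&co_ctx, coroutine_body, 0);

    std::puts("main: switching into coroutine");
    swapcontext(&main_ctx, &co_ctx);     // run the coroutine
    std::puts("main: coroutine yielded, resuming it");
    swapcontext(&main_ctx, &co_ctx);     // resume it to completion
    std::puts("main: done");
    return 0;
}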

3.13. What is the difference in resources between parent and child processes?

A parent process creates a child with fork. Initially neither has modified memory, and parent and child share memory resources. During fork the kernel does not copy the whole process address space; it lets parent and child share one address space, and data is copied only when a write occurs, at which point each process gets its own copy. In other words, resources are copied only on write; until then, they are shared read-only.
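
A small demonstration of fork semantics, assuming a POSIX system: after fork, each process sees its own copy of the variable once either side writes; copy-on-write makes the physical copy lazy and invisible to the program:

#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int value = 100;              // data shared after fork (copy-on-write)

    pid_t pid = fork();
    if (pid == 0) {
        // Child: this write triggers copy-on-write; only now does the
        // kernel give the child its own physical copy of the page.
        value = 200;
        std::printf("child : value = %d\n", value);   // prints 200
        return 0;
    }

    waitpid(pid, nullptr, 0);     // wait for the child to finish
    // Parent: unaffected by the child's write.
    std::printf("parent: value = %d\n", value);       // prints 100
    return 0;
}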

3.14. Why doesn't the C++ map use a height-balanced binary tree underneath?

The two most common self-balancing tree algorithms are the red-black tree and the AVL tree. To rebalance the tree after insertions/updates, both algorithms use rotations, in which the tree's nodes are rotated to restore balance.

An AVL tree keeps the height difference between left and right subtrees at most 1, so almost every insertion or deletion needs rotations to stay balanced; in scenarios with frequent insertions or deletions, the constant rotations badly degrade AVL performance.
A red-black tree's rules only require that every path have the same number of black nodes (equal black height); by sacrificing strict balance, it needs only a small number of rotations on insertion and deletion, and its overall performance is better than AVL. An imbalance after insertion is fixed with at most two rotations; after deletion, with at most three.

Although insertion/deletion is O(log n) in both algorithms, the rebalancing rotations are an O(1) operation for a red-black tree but an O(log n) operation for an AVL tree. This makes the red-black tree more efficient in the rebalancing phase, and is probably one reason it is more commonly used.

3.15. The value of message queues

Advantages of message queues:

  1. Decoupling: the processing on either side of the queue can be extended or modified independently.
  2. Recoverability: even if a process handling messages dies, messages already in the queue can still be processed after the system recovers.
  3. Buffering: helps absorb mismatches between the speed of producing messages and the speed of consuming them (sketched below).
  4. Flexibility and peak-handling capacity: the system does not collapse under a sudden overload of requests; the queue lets key components withstand bursts of traffic.
  5. Asynchronous communication: a message can be put on the queue without being processed immediately.
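
The buffering and asynchrony points can be illustrated with a minimal in-process bounded queue; a real message queue (Kafka, RabbitMQ, ...) adds persistence, delivery guarantees, and distribution, but the decoupling idea is the same. A sketch:

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// A tiny bounded in-process queue: the producer never waits on the
// consumer's business logic, only on buffer space (back-pressure).
class MessageQueue {
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
    const size_t cap_ = 1024;
public:
    void push(int msg) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return q_.size() < cap_; });  // wait for space
        q_.push(msg);
        cv_.notify_all();
    }
    int pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });       // wait for a message
        int msg = q_.front();
        q_.pop();
        cv_.notify_all();
        return msg;
    }
};

int main() {
    MessageQueue mq;
    std::thread producer([&] {
        for (int i = 0; i < 5; ++i) mq.push(i);          // fast producer
    });
    std::thread consumer([&] {
        for (int i = 0; i < 5; ++i)
            std::printf("consumed %d\n", mq.pop());      // slower consumer
    });
    producer.join();
    consumer.join();
    return 0;
}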

3.16. What problems does introducing a message queue bring?

  1. It lowers overall system availability (the queue itself can fail).
  2. It raises system complexity.
  3. Consistency issues must now be considered.

4. Programming by Stack Overflow

Flow: Google the question → the first two results are usually Stack Overflow → check whether the solution actually solves the problem → yes: done; no: search again.

5. Algorithm question

Given the array root[] = {6,3,5,3,6,7,8,-1,-1,5,6}, build a binary tree and print its in-order traversal.

Interpreting the array in level order (-1 = null child) gives this tree:

        6
      /   \
     3     5
    / \   / \
   3   6 7   8
      / \
     5   6

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

// Binary tree node
struct TreeNode {
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode() : val(0), left(nullptr), right(nullptr) {}
    TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
    TreeNode(int x, TreeNode* l, TreeNode* r) : val(x), left(l), right(r) {}
};

// Build the tree from a level-order array; -1 marks a null child.
TreeNode* createBinaryTree(const vector<int>& nodes) {
    int len = nodes.size();
    if (len == 0 || nodes[0] == -1) {
        return nullptr;
    }

    // Queue of nodes whose children are still to be assigned;
    // start with the root.
    queue<TreeNode*> nodesQue;
    TreeNode* root = new TreeNode(nodes[0]);
    nodesQue.push(root);

    // Walk the array, taking two child values per dequeued parent.
    for (int loc = 1; loc < len; loc += 2) {
        TreeNode* node = nodesQue.front();
        nodesQue.pop();

        int left = nodes[loc];
        int right = (loc + 1 < len) ? nodes[loc + 1] : -1; // guard array end

        if (left != -1) {
            node->left = new TreeNode(left);
            nodesQue.push(node->left);
        }
        if (right != -1) {
            node->right = new TreeNode(right);
            nodesQue.push(node->right);
        }
    }
    return root;
}

// In-order traversal: left subtree, node, right subtree.
void inorder(TreeNode* root) {
    if (root == nullptr) {
        return;
    }
    inorder(root->left);
    cout << root->val << endl;
    inorder(root->right);
}

int main() {
    vector<int> root = {6, 3, 5, 3, 6, 7, 8, -1, -1, 5, 6};
    TreeNode* cur = createBinaryTree(root);
    inorder(cur);   // prints 3 3 5 6 6 6 7 5 8, one per line
    return 0;
}

Summary

  1. At the architecture level, the G-P-M model explains how the Go scheduler supports a large number of goroutines on a small number of kernel threads, and how NetPoller, sysmon, and related mechanisms reduce thread blocking so that a Go program makes full use of its computing resources and runs as efficiently as possible.
  2. The benefits message queues bring: decoupling, peak shaving, and asynchrony, improving system responsiveness and stability.

