Nginx's I/O multiplexing, blocking vs. non-blocking, synchronous vs. asynchronous, and the difference between Apache and Nginx

Excerpted from the rikewang blog on cnblogs (博客园); the original is easy to find and read for yourself:
http://www.cnblogs.com/wxl-dede/p/5134636.html

Synchronous vs. asynchronous, blocking vs. non-blocking, and nginx's I/O model

Synchronous and Asynchronous

Synchronous and asynchronous are about the message-passing mechanism (synchronous vs. asynchronous communication). Synchronous means that when a call is issued, it does not return until the result is available; once it returns, the return value is in hand. In other words, the caller actively waits for the result of the call. Asynchronous is the opposite: the call returns immediately after being issued, carrying no result. When an asynchronous call is issued, the caller does not get the result right away; instead, after the operation completes, the callee informs the caller through status, a notification, or a callback function.

A typical asynchronous programming model is Node.js.

A colloquial example: you phone a bookstore owner to ask whether they have the book Distributed Systems. With a synchronous communication mechanism, the owner says "hold on, let me check," then searches and searches, and only after the search finishes (maybe in 5 seconds, maybe in a day) tells you the result (returns the result). With an asynchronous communication mechanism, the owner simply says "I'll look it up and call you back," and hangs up right away (no result is returned). Once he has found out, he calls you of his own accord. Here, "calling back" is how the owner performs the callback.
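As a minimal C sketch of the same idea (all names here are hypothetical, invented for illustration): the synchronous call hands back the answer as its return value, while the asynchronous call returns immediately and delivers the answer later through a callback.

```c
#include <stdio.h>

/* Synchronous: the caller waits inside the call and receives the
 * answer as the return value. */
int check_stock_sync(const char *title) {
    (void)title;        /* pretend we search the inventory here */
    return 1;           /* "yes, we have it" */
}

/* Asynchronous: the call returns immediately and the answer arrives
 * later through a callback (the owner "calling you back"). */
typedef void (*stock_cb)(const char *title, int in_stock);

void check_stock_async(const char *title, stock_cb cb) {
    /* A real implementation would hand the lookup to another thread
     * or an event loop; we invoke the callback directly to keep the
     * sketch self-contained. */
    cb(title, 1);
}

static void on_result(const char *title, int in_stock) {
    printf("%s: %s\n", title, in_stock ? "in stock" : "sold out");
}

int main(void) {
    int r = check_stock_sync("Distributed Systems");      /* waits for the answer */
    printf("sync result: %d\n", r);
    check_stock_async("Distributed Systems", on_result);  /* returns at once */
    return 0;
}
```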

Blocking and non-blocking

Blocking and non-blocking are about the state of the program while it waits for the result of a call (a message or return value). A blocking call means the current thread is suspended until the result comes back; the calling thread does not resume until it has the result. A non-blocking call means the call returns immediately even when the result is not yet available, without blocking the current thread.

Taking the same example: you phone the bookstore owner to ask whether they have Distributed Systems. If you make a blocking call, you keep yourself "suspended" until you get the answer about the book. If you make a non-blocking call, you go off and do your own thing whether or not the owner has answered, though every few minutes you check whether he has a result yet. Here, blocking versus non-blocking has nothing to do with synchronous versus asynchronous; it is unrelated to how the owner delivers the answer to you.

I/O model

Since a process cannot access external devices directly, it can only ask the kernel to drive them (a context switch into kernel mode). An external device such as a disk then reads the data stored on it and transfers it into a kernel buffer, and the kernel buffer's contents are copied into the user process's buffer. Delivering a device's response to a user process thus involves two stages, and the different ways of handling these stages give rise to the different I/O models.

There are generally five I/O models:

Blocking I/O Model:

All sockets are blocking by default. The process hangs while the kernel waits for the I/O device to respond; once the device finishes transferring the data into the kernel buffer, the data is copied from that buffer into the user process's space, and only then does the call return.
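A minimal C sketch of this model, assuming `sockfd` is an already-connected TCP socket:

```c
#include <stdio.h>
#include <sys/socket.h>

/* Sketch of the blocking model: recv() suspends the calling thread
 * through both stages (waiting for data to reach the kernel buffer,
 * then copying it into ours) before returning. sockfd is assumed to
 * be a connected TCP socket. */
ssize_t blocking_read(int sockfd) {
    char buf[4096];
    ssize_t n = recv(sockfd, buf, sizeof(buf), 0); /* blocks here */
    if (n > 0)
        printf("received %zd bytes\n", n);
    return n;
}
```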

Non-blocking I/O:

After the kernel asks the I/O device to respond, the data begins to be prepared. During this first stage the user process is not blocked (not suspended); instead it keeps asking or checking (busy-waiting) whether the data has reached the kernel buffer. The second stage, copying the data from the kernel buffer into the user process's space, is still blocking. Because this model burns a great deal of CPU time on checking, it is very inefficient and rarely used.
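A minimal sketch of that busy-checking loop in C, using `O_NONBLOCK` so that `read()` fails with `EAGAIN` instead of suspending the thread:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of the non-blocking model: with O_NONBLOCK set, read()
 * returns -1 with errno == EAGAIN instead of suspending the thread
 * when no data has reached the kernel buffer yet. */
int main(void) {
    int fd = STDIN_FILENO;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

    char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n >= 0) {                     /* second stage (the copy) has finished */
            printf("got %zd bytes\n", n);
            break;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Not ready yet: this busy checking is what wastes CPU time. */
            usleep(100 * 1000);
            continue;
        }
        perror("read");
        return 1;
    }
    return 0;
}
```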

I/O multiplexing (select, poll, epoll...):

After the kernel asks the I/O device to respond, the data begins to be prepared; during this stage the user process is blocked, and copying the data from the kernel buffer to the user process also blocks. The difference from plain blocking I/O is that one call can block on many I/O operations at the same time: it monitors multiple descriptors for read and write readiness, and only when data is readable (or a descriptor is writable) is the actual I/O function called. In other words, a single thread can respond to multiple requests.
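A minimal `select()` sketch in C, assuming `fds[]` already holds open descriptors. Note that the fd sets must be rebuilt and re-scanned on every call, which is the per-call traversal cost discussed in the epoll section below:

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Sketch of multiplexing with select(): one thread blocks watching
 * several descriptors at once, then services whichever became
 * readable. fds[] is assumed to hold already-open descriptors. */
void select_loop(int *fds, int count) {
    for (;;) {
        /* select() overwrites the sets, so rebuild (and re-scan)
         * them on every call -- the traversal cost noted below. */
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = -1;
        for (int i = 0; i < count; i++) {
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd) maxfd = fds[i];
        }

        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
            break;                        /* blocks until some fd is ready */

        for (int i = 0; i < count; i++) {
            if (FD_ISSET(fds[i], &readfds)) {
                char buf[4096];
                ssize_t n = read(fds[i], buf, sizeof(buf)); /* data is ready */
                if (n > 0) printf("fd %d: %zd bytes\n", fds[i], n);
            }
        }
    }
}
```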

Signal-driven I/O (event-driven):

The first stage is non-blocking: the process carries on with other work, and once the data has been transferred into the kernel buffer, the kernel notifies the thread directly with a signal. This model is not used much.
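A minimal sketch of this model on Linux, using `fcntl()` with `F_SETOWN`/`O_ASYNC` so the kernel delivers `SIGIO` when the descriptor becomes readable:

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of signal-driven I/O: the kernel sends SIGIO when the fd
 * becomes readable, so the first stage involves no blocking and no
 * polling; the read() that copies the data out still runs afterwards. */
static volatile sig_atomic_t ready = 0;

static void on_sigio(int sig) { (void)sig; ready = 1; }

int main(void) {
    int fd = STDIN_FILENO;

    signal(SIGIO, on_sigio);
    fcntl(fd, F_SETOWN, getpid());                    /* deliver SIGIO to this process */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC); /* enable signal-driven mode */

    while (!ready)
        pause();                                      /* sleep until the signal arrives */

    char buf[256];
    ssize_t n = read(fd, buf, sizeof(buf));           /* second stage: copy the data */
    printf("read %zd bytes after SIGIO\n", n);
    return 0;
}
```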

Asynchronous I/O:

The kernel does not notify the user process until the entire operation, including the copy of data from the kernel into user space, is complete.
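A minimal sketch using the POSIX AIO interface (`aio_read()` from `<aio.h>`); `data.txt` is a hypothetical input file, and older glibc needs `-lrt` at link time:

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch of POSIX asynchronous I/O: aio_read() returns immediately,
 * the kernel performs both stages (waiting for the data AND copying
 * it into our buffer), and we collect the finished result later. */
int main(void) {
    int fd = open("data.txt", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                         /* returns at once; no blocking */

    /* The process is free to do other work here. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    ssize_t n = aio_return(&cb);           /* data is already in buf */
    printf("async read finished: %zd bytes\n", n);
    close(fd);
    return 0;
}
```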

Summary: only asynchronous I/O leaves the process unblocked through both stages; the blocking, non-blocking, multiplexed, and signal-driven models all block at least while data is copied from the kernel buffer into user space.

epoll, poll, select in Nginx

All three belong to the I/O multiplexing model.

select and poll are active queries: both can check the status of many fds (file descriptors) at the same time. select caps the number of fds it can watch, while poll has no such limit. They also describe events differently: select builds three separate sets (read, write, and exception), while poll carries all three kinds of event descriptions in a single set. Since both must check every descriptor for events on each cycle, and poll carries less event bookkeeping, poll performs somewhat better than select.

epoll is based on callback functions, with no polling. When there are many sockets, each select() call completes its scheduling by traversing all FD_SETSIZE sockets, touching every one no matter which are active, which wastes a great deal of CPU time. If instead a callback can be registered for each socket and fired automatically when that socket becomes active, the polling disappears; this is exactly what epoll (Linux), kqueue (FreeBSD), and /dev/poll (Solaris) do. A classic analogy: you are at university, living in a dormitory building with many rooms, and a friend comes to visit you. The select-style dorm attendant takes your friend from room to room until you are found. The epoll-style attendant first writes down every student's room number, so when your friend arrives she only needs to tell your friend which room you live in, instead of walking the whole building in person. If 10,000 people come looking for classmates in this building, it is self-evident which version is more efficient. In the same way, polling I/O is one of the most time-consuming operations in a high-concurrency server, and it is equally clear which of select, poll, and epoll has the higher performance.
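A minimal epoll sketch in C, assuming `listenfd` is a listening socket created elsewhere; each descriptor is registered once, and `epoll_wait()` returns only the descriptors that actually became active:

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the epoll pattern: each descriptor is registered once
 * with epoll_ctl() (the "write down the room number" step), and
 * epoll_wait() returns only the descriptors that actually became
 * active -- no per-call traversal of the whole set. */
void epoll_loop(int listenfd) {          /* listenfd: a listening socket */
    int epfd = epoll_create1(0);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listenfd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);

    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1); /* only ready fds come back */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listenfd) {
                /* New connection: register it once and forget it. */
                int conn = accept(listenfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
            } else {
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0)
                    close(fd);           /* closing also removes it from epfd */
            }
        }
    }
}
```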

The general flow of a web request

1. The client sends a request to the web server; the request first arrives at the network card.
2. The network card hands the request to the kernel in kernel space for processing (essentially unpacking it), and the kernel finds the request is for port 80.
3. The kernel passes the request to the web server in user space; the web server unpacks it and sees that the client is requesting the page index.html.
4. The web server makes a system call to ask the kernel for the file.
5. The kernel finds that a page file is being requested and calls the disk driver to access the disk.
6. Through the driver, the kernel fetches the page file from disk.
7. The kernel saves the fetched page file in its own cache area, then notifies the web server process or thread to come take it.
8. Via a system call, the web server copies the page file from the kernel cache into its own process buffer.
9. Having obtained the page file, the web server responds to the user by handing the file back to the kernel through another system call.
10. The kernel encapsulates the page file into packets and sends them out through the network card.
11. When the packets reach the network card, the response travels over the network to the client.
"`

On comparing Apache and nginx:

Since a web server must handle many clients at once (a one-to-many relationship), there are generally three ways to achieve parallel processing: multi-process, multi-threaded, and asynchronous.

Multi-process: each process handles one connection and responds to its own request independently, so if one process dies, other requests are unaffected; the design is simple and does not suffer from problems such as memory leaks, making it relatively stable. However, processes are generally created via the fork mechanism, which brings memory-copying costs, and under high concurrency the context switches become very frequent, consuming a great deal of performance and time. The prefork model used by early Apache is multi-process, with the optimization that Apache creates several processes in advance to wait for user requests, and a process is not torn down when its request completes.
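A minimal fork-per-connection sketch in C (Apache's prefork MPM additionally creates the processes in advance); `listenfd` is assumed to be a listening socket set up elsewhere:

```c
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the multi-process model: every accepted connection gets
 * its own forked child, so one crashing child cannot affect other
 * requests. listenfd is assumed to be a listening socket. */
void fork_per_connection(int listenfd) {
    signal(SIGCHLD, SIG_IGN);              /* let the kernel reap children */
    for (;;) {
        int conn = accept(listenfd, NULL, NULL);
        if (conn < 0)
            continue;
        if (fork() == 0) {                 /* child: owns this connection */
            close(listenfd);
            char buf[4096];
            ssize_t n = read(conn, buf, sizeof(buf));
            if (n > 0)
                write(conn, buf, (size_t)n); /* trivial echo "response" */
            close(conn);
            _exit(0);
        }
        close(conn);                       /* parent: loop back to accept() */
    }
}
```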

Multi-threaded: each thread responds to one request. Because threads share the process's data, a thread has lower overhead than a process, and performance improves. But since thread management requires the program to allocate and release memory itself, memory bugs may take a long time to surface, so this approach is somewhat less stable. Apache's worker mode works this way.
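A minimal thread-per-request sketch using POSIX threads, with `connfd` arriving from an `accept()` loop elsewhere; compile with `-pthread`:

```c
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Sketch of the thread-per-request model (cf. Apache's worker MPM):
 * threads share the process's memory, so they are cheaper to create
 * than forked processes, but the program manages that shared memory
 * itself. connfd comes from an accept() loop elsewhere. */
static void *handle(void *arg) {
    int conn = (int)(intptr_t)arg;
    char buf[4096];
    ssize_t n = read(conn, buf, sizeof(buf));
    if (n > 0)
        write(conn, buf, (size_t)n);       /* trivial echo "response" */
    close(conn);
    return NULL;
}

void thread_per_connection(int connfd) {
    pthread_t tid;
    pthread_create(&tid, NULL, handle, (void *)(intptr_t)connfd);
    pthread_detach(tid);                   /* no join needed */
}
```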

Asynchronous: this is nginx's epoll approach; Apache's event mode also supports it. Not much more needs saying here.

Nginx's I/O model is event-driven, which lets the application switch rapidly among many I/O handles, achieving what is called asynchronous I/O. An event-driven server is best suited to I/O-intensive work; a reverse proxy, for example, purely shuttles data between the client and the web server, involving no heavy computation, so it clearly benefits from being event-driven. A single worker process can handle the load with no process or thread management overhead, consuming little CPU and memory.
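As an illustrative sketch, this model maps onto nginx.conf directives like the following (the values here are examples, not recommendations):

```nginx
# Illustrative nginx.conf fragment: one worker per CPU core, each
# multiplexing many connections through epoll.
worker_processes auto;

events {
    use epoll;                 # the event mechanism discussed above
    worker_connections 10240;  # connections a single worker can multiplex
}
```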

Application servers such as Apache typically run concrete business applications, such as scientific computing or graphics and image processing. These are likely CPU-intensive services, for which event-driven programming is unsuitable. If a computation takes 2 seconds, then for those 2 seconds the worker is completely blocked and no event can help. Consider what would happen if MySQL were made event-driven: a single large join or sort would block every client. Here multiple processes or threads show their advantage: each does its own work without blocking or interfering with the others. Of course, modern CPUs keep getting faster and a single computation may block only briefly, but as long as there is any blocking at all, event programming loses its edge. So processes and threads will not disappear; they complement the event mechanism and will coexist with it for a long time.

In general, event-driven suits I/O-intensive services, while multi-process or multi-threaded designs suit CPU-intensive services.

In practical terms, nginx is better suited as a front-end proxy or for serving static files (especially under high concurrency), while Apache is better suited as a back-end application server, with powerful features [php, rewrite...] and high stability.
