Jingdong architects help you master the working principles of LVS, Nginx and HAProxy

Most Internet systems today use server clustering: the same service is deployed on multiple servers, which together form a cluster that provides the service as a whole. These clusters may be web application server clusters, database server clusters, distributed cache server clusters, and so on.


In practice, a load balancer always sits in front of the web server cluster. Its task is to act as the entry point for traffic, select the most suitable web server, and forward the client's request to it for processing, making forwarding from client to real server transparent.

The "cloud computing" and distributed architectures popular in recent years essentially treat back-end servers as computing and storage resources, encapsulated by a management layer and offered as a service. Clients do not need to care which machine actually serves them: from their point of view they face a single server of almost unlimited capacity, while in reality it is the back-end cluster that does the work.

LVS, Nginx, and HAProxy are the three most widely used software load balancers.

Which load balancing technology to use generally depends on the stage and scale of the website, and the concrete requirements must be analyzed case by case. For a small or medium web application, say under 10 million PV per day, Nginx is sufficient; with many machines, DNS round robin can be used; LVS tends to consume more machines. For large websites or important services with many servers, LVS is worth considering.

At present, a reasonable and popular website architecture uses Nginx/HAProxy + Keepalived as the load balancer at the web front end.

LVS

LVS is the abbreviation of Linux Virtual Server. LVS is now part of the standard Linux kernel: since Linux 2.4, its function modules have been fully built in, so the features LVS provides can be used directly without applying any patches to the kernel.

Since LVS started in 1998, it has developed into a relatively mature technology project.

Architecture of LVS


The server cluster system set up by LVS consists of three parts:

(1) The front-end load balancing layer, represented by Load Balancer

(2) The middle server cluster layer, represented by Server Array

(3) The bottom data shared storage layer, represented by Shared Storage

LVS load balancing mechanism

Unlike HAProxy and other layer-7 load balancers, LVS operates on IP packets rather than HTTP messages, so work that a layer-7 load balancer can do, such as URL parsing, is beyond LVS.

LVS is a layer-4 load balancer: it is built on the fourth layer of the OSI model, the transport layer, home to the familiar TCP and UDP, both of which LVS supports. Because LVS works at layer 4, it is far more efficient than higher-layer load balancing solutions such as DNS round-robin resolution, application-layer load scheduling, or client-side scheduling.

So-called layer-4 load balancing makes its decision mainly from the destination address and port in the packet, while layer-7 load balancing, also known as "content switching", decides mainly from the meaningful application-layer content of the message.


LVS forwards traffic mainly by modifying the IP address (NAT mode, divided into source address modification, SNAT, and destination address modification, DNAT) or by modifying the destination MAC address (DR mode).

NAT Mode: Network Address Translation

NAT (Network Address Translation) is a technology that maps addresses between the external network and the internal network.

In NAT mode, both incoming and outgoing network packets must be processed by LVS, so LVS needs to act as the gateway of the RS (Real Server).

When a packet arrives at LVS, LVS performs destination address translation (DNAT), changing the destination IP to the RS's IP. To the RS, the packet appears to have been sent directly by the client. After the RS has processed it, the response carries the RS's IP as source and the client's IP as destination. The response then transits the gateway (LVS), which performs source address translation (SNAT), changing the source address of the packet to the VIP; to the client, the packet therefore appears to have been returned by LVS itself.
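As a sketch, a minimal NAT-mode virtual service can be set up with ipvsadm roughly as follows (the VIP 192.168.0.100 and the RS addresses are placeholders):

```shell
# Create a virtual service on the VIP, port 80, with round-robin scheduling
ipvsadm -A -t 192.168.0.100:80 -s rr

# Add two real servers in NAT (masquerading) mode; LVS must be their gateway
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.11:80 -m
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.12:80 -m

# Enable IP forwarding so the director can route packets for the real servers
echo 1 > /proc/sys/net/ipv4/ip_forward
```

The `-m` flag selects masquerading (NAT) forwarding; the real servers must use the LVS director as their default gateway so that responses pass back through it for SNAT.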


DR Mode: Direct Routing

In DR mode, LVS and the RS cluster must be bound to the same VIP (on the RS, the VIP is bound to the loopback interface). The difference from NAT is that only the request passes through LVS; the real server (RealServer, RS) serves the response and returns it directly to the user without going back through LVS.

In detail, when a request arrives, LVS only rewrites the destination MAC address of the network frame to that of a chosen RS, and the frame is forwarded to that RS for processing. Note that neither the source IP nor the destination IP is changed; LVS merely performs a sleight of hand at the link layer. When the RS receives the frame, the link layer finds the MAC is its own, and the network layer above finds the IP is also its own (the VIP bound on loopback), so the packet is legitimately accepted; the RS cannot perceive the LVS in front of it. When the RS replies, it simply returns the response directly to the source IP (the user's IP) without going through LVS.
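On each RS, DR mode typically requires binding the VIP to loopback and suppressing ARP replies for it, so that only the director answers ARP requests for the VIP. A sketch, with 192.168.0.100 as a placeholder VIP:

```shell
# Bind the VIP to the loopback interface with a /32 netmask
ip addr add 192.168.0.100/32 dev lo

# Suppress ARP for the VIP so the RS does not advertise or answer for it;
# otherwise clients might reach an RS directly, bypassing the director
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
```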


In DR mode, the IP addresses are not modified during distribution; only the MAC address is. Because the destination IP of the request matches an IP actually configured on the server that handles it, no address translation by the load balancer is needed, and response packets can be returned directly to the user's browser. This avoids the load balancer's NIC bandwidth becoming a bottleneck, so DR mode has the best performance and is the most widely used load balancing mode at large websites.

Advantages of LVS

  • Strong load tolerance: it works at the transport layer purely for distribution and generates no traffic of its own. This also makes it the best-performing of the load balancing software, while consuming little memory and CPU.

  • Configurability is low, which is both a drawback and an advantage: because there is little to configure, there is little to touch, which greatly reduces the chance of human error.

  • Stable operation: it tolerates load well, and complete active/standby hot-backup solutions exist, such as LVS + Keepalived.

  • No traffic: LVS only distributes requests, and response traffic does not flow back out through it, so the balancer's I/O performance is not affected by large traffic volumes.

  • Wide applicability: because LVS works at the transport layer, it can load balance almost any application, including HTTP, databases, online chat rooms, and so on.

Disadvantages of LVS

  • The software itself does not support regular-expression processing and cannot separate dynamic from static content, which many websites now strongly require; this is where Nginx and HAProxy + Keepalived have the advantage.

  • If the website application is relatively large, an LVS/DR + Keepalived deployment is more complicated to implement; by comparison, Nginx/HAProxy + Keepalived is much simpler.

Nginx

Nginx is a powerful web server for handling highly concurrent HTTP requests and a reverse proxy server for load balancing. Its advantages are high performance, light weight, low memory consumption, and strong load balancing capability.


Architecture Design of Nginx

In the traditional process- or thread-based model (which Apache adopts), a separate process or thread is created for each connection and blocks during network or input/output operations. This consumes a lot of memory and CPU, because starting a separate process or thread requires preparing a new runtime environment, including heap and stack allocation and a new execution context, and incurs further CPU overhead. Ultimately, excessive context switching leads to poor server performance.

By contrast, Nginx's architecture is modular, event-driven, asynchronous, single-threaded, and non-blocking.

Nginx makes heavy use of multiplexing and event notification. After starting, it runs in the background as a daemon, consisting of one master process and n (n >= 1) worker processes. All processes are single-threaded (each has only one main thread), and inter-process communication mainly uses shared memory.

The master process receives signals from the outside, sends signals to the worker processes, and monitors their working state. The worker processes are the real handlers of external requests: each worker competes independently and equally for client requests. A request is processed in exactly one worker process, and since a worker has only one main thread, it executes the handling of only one request at any instant. (The principle is very similar to Netty's event loop.)
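This process model maps directly onto a few top-level nginx.conf directives; a sketch (the connection count is illustrative):

```nginx
# One worker per CPU core; each worker is a single-threaded event loop
worker_processes auto;

events {
    use epoll;                 # event notification mechanism on Linux
    worker_connections 10240;  # connections each worker may hold open
}
```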


Nginx load balancing

Nginx load balancing works mainly at layer 7, the application layer of the OSI model, and supports HTTP and HTTPS.

Nginx performs load balancing as a reverse proxy. A reverse proxy accepts connection requests from the Internet, forwards them to servers on the internal network, and returns the servers' results to the client that requested the connection; to the outside world, the proxy itself appears to be the server.

There are many distribution strategies for Nginx to achieve load balancing. Nginx's upstream currently supports the following methods:

  • Round robin (default): requests are distributed to the backend servers one by one in order of arrival; if a backend server goes down, it is removed automatically.

  • weight: specifies the round-robin weight; the weight is proportional to the share of requests. Used when backend servers have uneven performance.

  • ip_hash: each request is assigned according to a hash of the client IP, so each visitor consistently reaches the same backend server, which can solve the session problem.

  • fair (third party): requests are assigned according to the backend servers' response times, with shorter response times given priority.

  • url_hash (third party): requests are assigned by a hash of the requested URL, so each URL is directed to the same backend server; this is more effective when the backend servers cache content.
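A minimal upstream configuration sketch showing the default round robin with weights (the addresses and ports are placeholders):

```nginx
http {
    upstream backend {
        # Weighted round robin; max_fails/fail_timeout give passive health checks
        server 10.0.0.11:8080 weight=3 max_fails=2 fail_timeout=10s;
        server 10.0.0.12:8080 weight=1;
        # ip_hash;  # uncomment to pin each client IP to one backend (session stickiness)
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}
```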

Advantages of Nginx

  • Cross-platform: Nginx can be compiled and run on most Unix-like operating systems, and there are also ported versions for Windows.

  • Remarkably simple configuration: very easy to get started, with a configuration style that reads like program code.

  • Non-blocking, highly concurrent: officially tested to support 50,000 concurrent connections; in real production environments it runs at 20,000 to 30,000.

  • Event-driven: uses the epoll model for communication, supporting larger numbers of concurrent connections.

  • Master/worker structure: one master process spawns one or more worker processes.

  • Low memory consumption: handling large numbers of concurrent requests takes very little memory; under 30,000 concurrent connections, 10 Nginx processes consume only about 150 MB of memory (15 MB x 10 = 150 MB).

  • Built-in health checks: if a web server behind the Nginx proxy goes down, front-end access is not affected.

  • Saves bandwidth: supports GZIP compression and can add headers that let the browser cache content locally.

  • High stability: used as a reverse proxy, the probability of downtime is minimal.

Disadvantages of Nginx

  • Nginx supports only the HTTP, HTTPS, and Email protocols, so its range of application is narrower; this is its disadvantage.

  • Health checks of backend servers are supported only by port, not by URL. Nginx does not directly support session stickiness, though ip_hash can work around this.

HAProxy

HAProxy supports two proxy modes, TCP (layer 4) and HTTP (layer 7), and also supports virtual hosts.

HAProxy's advantages complement some of Nginx's shortcomings, such as session persistence and cookie-based routing; it also supports checking back-end server state by fetching a specified URL.

Like LVS, HAProxy itself is only load balancing software. Purely in terms of efficiency, HAProxy load balances faster than Nginx, and it also outperforms Nginx in concurrent processing.

HAProxy supports load balancing of the TCP protocol, so it can balance MySQL reads, health-checking and distributing load across the back-end MySQL nodes; LVS + Keepalived can likewise be used to load balance MySQL masters and slaves.

HAProxy has many load balancing strategies, including roundrobin (round robin), static-rr (weighted round robin), source (hash of the source IP), uri (hash of the request URI), and rdp-cookie (based on a cookie).
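A sketch of an HTTP-mode backend putting these pieces together: roundrobin balancing, cookie-based session persistence, and a URL health check (the server addresses, cookie name, and /health path are placeholders):

```haproxy
frontend www
    bind *:80
    mode http
    default_backend web_servers

backend web_servers
    mode http
    balance roundrobin
    # Insert a SERVERID cookie so each client sticks to the same server
    cookie SERVERID insert indirect nocache
    # Health check by fetching a specific URL, not just probing the port
    option httpchk GET /health
    server web1 10.0.0.11:8080 cookie w1 check
    server web2 10.0.0.12:8080 cookie w2 check
```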

