[Load Balancing]-LVS

Cluster

When traffic to a single-machine system grows rapidly, the machine can no longer handle the volume of requests and performance starts to degrade.

There are two ways to improve performance and expand the system: vertical scaling (Scale Up) and horizontal scaling (Scale Out). Vertical scaling upgrades the specifications and performance of a single machine; horizontal scaling adds more machines to provide the service. Horizontal scaling introduces the problem of request scheduling and allocation: when a request arrives, which machine should it be sent to for processing? A scheduler is needed to carry out this assignment, and LVS is such a scheduler.

Under horizontal scaling, multiple computers are combined into a single system to solve a specific problem; such a system is called a cluster. Because of the scheduler, when a user accesses the cluster, they actually access the scheduler's IP address, and the scheduler then distributes the request to the real back-end servers. The scheduler's IP is called the VIP (virtual IP); the IP of a real back-end server is called the RIP (real IP).

cluster classification

Based on their focus, clusters can be divided into LB (load balancing), HA (high availability), and HPC (high performance computing) clusters. The high availability type is mainly designed to avoid a single point of failure (SPOF), and its measure of availability is

A = MTBF / (MTBF + MTTR)

Here MTBF is the Mean Time Between Failures, and MTTR is the Mean Time To Restoration, the average time it takes to recover from a failure. A therefore represents the proportion of total time during which the system is failure-free. For a cluster with a scheduler, if there is only a single scheduler machine and it goes down, the back-end servers become unreachable, which is equivalent to the whole cluster being down; the scheduler therefore needs high-grade hardware, and a high-availability cluster can be used to prevent the scheduler from becoming a single point of failure.
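As a quick worked example with made-up numbers: if MTBF = 999 hours and MTTR = 1 hour, then A = 999 / (999 + 1) = 0.999, i.e. 99.9% availability ("three nines").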

Implementation of LB Cluster

Load balancing can be implemented in hardware or in software. Hardware load balancing requires dedicated devices, such as the F5 appliances used by many large enterprises; because hardware is expensive, software load balancing can be chosen instead, using software such as LVS, Nginx, HAProxy, or ATS.

Based on the network protocol layer at which they work, load balancers can be divided into transport-layer (L4) and application-layer (L7) load balancing.

Session Persistence

Session persistence is a frequent concern when discussing load balancing: the first request from a client IP hits one server, which creates a session for that client; but the second request from the same IP may hit a different server, which then has to create its own session for that client. The session on the first server is never used again, so the login state established by the first request is effectively lost, which hurts the user experience. A session persistence scheme ensures that even when multiple requests from one client hit different servers, the servers can still find the corresponding session. There are three implementations:

Session Sticky

Ensure that requests from the same client are always scheduled to the same back-end server. The scheduler maintains a mapping table from client IP to back-end server IP; when a request arrives, it looks up the client IP in this table and schedules the request accordingly.

The disadvantage of this solution is that when NAT is used, many client IPs appear to the scheduler as a single IP, the NAT router's IP, which may cause one back-end server to be scheduled far too often. In addition, the scheduler has to maintain the IP mapping table, which carries a certain cost.

There is also an implementation that schedules based on cookies rather than on the client IP.
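LVS itself (covered in detail below) offers a persistence option, -p, that approximates source-IP stickiness at the scheduler. A minimal sketch on the Director, with a hypothetical VIP:

```bash
# Requests from the same client IP keep hitting the same RS for 300 seconds
ipvsadm -A -t 192.0.2.100:80 -s rr -p 300
```

The ipvsadm syntax itself is explained in the ipvsadm section at the end of this article.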

Session Replication

Every server holds all sessions: each server replicates its own sessions to the others, so no matter which server a client request hits, the corresponding session is present.

The problem with this solution is obvious: every server must store every session, the session data is highly redundant, and each server needs a large amount of storage.

Session Server

The third solution is used more often than the previous two. It stores session data on a dedicated session server, such as memcached or Redis.

When a server needs session data, it fetches it from the session server. The session server itself should in turn be built as a high-availability cluster to avoid becoming a single point of failure.

Implementation of HA Cluster

Keepalived is an excellent high-availability solution for eliminating the LVS single point of failure.

LVS

LVS (Linux Virtual Server) is a load-balancing scheduler that has been integrated into the Linux kernel; it was created by Zhang Wensong. The VS (virtual server) is responsible for scheduling, while the RSs (real servers) actually provide the service.

Its working principle: the VS schedules and forwards request packets to a selected RS based on the destination IP, protocol, and destination port, using a scheduling algorithm to choose which RS to forward to.

Compare this with DNAT (destination network address translation): DNAT is used for requests initiated from the Internet toward an enterprise. A public-network request hits an enterprise server connected to the public network, which forwards it from the public IP to a server on the private network. DNAT forwards only one-to-one and has no scheduling capability.

Related terms

  • VS , virtual server, i.e. LVS itself. Also called DS (Director Server), Dispatcher, or Load Balancer
  • RS , the real server, also known as the upstream server (in Nginx) or backend server (in HAProxy)
  • CIP , client IP; VIP , the VS's IP on the external network; DIP (Director IP), the VS's IP on the internal network; RIP , the real server's IP

Implementation tools

  • ipvsadm : a user-space command-line tool, the rule manager for LVS, used to manage cluster services and RSs, similar to iptables. LVS is a kernel-level feature, and ipvsadm is the tool used to configure it; the relationship between ipvsadm and ipvs is like the relationship between iptables and netfilter.
  • ipvs : the kernel-space framework that works on netfilter's INPUT hook and implements the actual LVS processing. (A quick check of both is sketched below.)
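A minimal sketch for confirming that the kernel provides ipvs and that the ipvsadm tool is available; the config file path and package name may differ by distribution:

```bash
# Check whether the running kernel was built with IPVS support
grep -i ip_vs /boot/config-$(uname -r)
# See whether the ip_vs module is currently loaded
lsmod | grep ip_vs
# Install the user-space tool (Debian/Ubuntu shown; use yum/dnf on RHEL-like systems)
apt install ipvsadm
# Listing the rules also loads the ip_vs module if it is not loaded yet
ipvsadm -Ln
```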

LVS cluster types

LVS-NAT

NAT mode modifies the destination IP of the request packet and is essentially multi-target DNAT: when LVS receives a request, it rewrites the packet's destination IP and port to those of the selected RS and then forwards it.

When the RS sends back a response, its destination IP and port are the client's, and its source IP and port are the RS's own. The response must return along the same path as the request, passing through LVS. The reason is that the client sent its request to LVS's IP and port, so it expects the response to come from LVS's IP and port; the response therefore has to go through LVS, which rewrites the source IP and port back to its own.

As a result, all packets between the client and the back-end RSs pass through LVS, which puts great pressure on LVS and can easily make it the performance bottleneck of the system.

LVS and the RSs should be in the same subnet and communicate over private IPs, with LVS acting as the RSs' gateway. Introducing a router between them would still work functionally, but it may add latency and reduce bandwidth, so it is generally unnecessary.
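A minimal sketch of a NAT-mode setup, assuming hypothetical addresses (VIP 192.0.2.10, DIP 10.0.0.1, two RSs at 10.0.0.11 and 10.0.0.12):

```bash
# On the Director: enable forwarding so rewritten packets can be passed on
sysctl -w net.ipv4.ip_forward=1
# Define the cluster service on the VIP and add the RSs in NAT mode (-m),
# with port mapping from 80 to 8080
ipvsadm -A -t 192.0.2.10:80 -s wlc
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11:8080 -m -w 1
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.12:8080 -m -w 2

# On each RS: point the default gateway at the Director's DIP so responses
# flow back through LVS
ip route add default via 10.0.0.1
```

The ipvsadm options used here are explained in the ipvsadm section at the end.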

Diagram of NAT

LVS-DR

DR (Direct Routing) is the default LVS mode and the most widely used. LVS forwards the request by re-encapsulating it with a new MAC header: the source MAC is that of the interface where the DIP resides, and the destination MAC is that of the interface where the selected RS's RIP resides; the source IP, source port, destination IP, and destination port are left unchanged. DR mode therefore does not support port mapping, and only the MAC addresses are modified during forwarding.

LVS must therefore know each RS's MAC address. This can be resolved via ARP, which requires LVS to be configured with the RSs' IP addresses in advance. Because the MAC address is resolved via ARP, the VS and the RSs must be connected through switches rather than routers; the RIPs and the DIP must be in the same subnet.

Since the destination IP remains unchanged as packets travel from LVS to an RS, every RS must also be configured with the VIP, which would normally cause IP conflicts between LVS and the RSs. To ensure that request packets whose destination IP is the VIP are delivered to the Director by the front-end router, one of the following methods is used:

  1. Statically bind the VIP to the Director's MAC address on the front-end gateway router. This requires configuring the router, which ordinary users often cannot do or find difficult; and because the VIP-to-MAC binding is hard-coded, replacing the LVS device also means changing that configuration, so it is not very flexible.
  2. Modify kernel parameters on the RSs to restrict ARP announcement and reply behavior: /proc/sys/net/ipv4/conf/all/arp_ignore and /proc/sys/net/ipv4/conf/all/arp_announce. The RSs then neither answer ARP queries for the VIP nor announce it, avoiding IP conflicts. This method is more commonly used; a configuration sketch follows this list.
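A minimal sketch of the RS-side configuration for the second method, assuming a hypothetical VIP of 192.0.2.100 (set the ARP parameters before binding the VIP):

```bash
# Only answer ARP queries for addresses configured on the receiving interface
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.lo.arp_ignore=1
# When sending ARP, always use an address of the outgoing interface, never the VIP on lo
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.lo.arp_announce=2
# Bind the VIP to the loopback interface with a /32 mask so it is not announced on the LAN
ip addr add 192.0.2.100/32 dev lo
```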

When an RS sends back a response, since the destination IP of the client's request was the VIP, the RS naturally uses the VIP as the source IP of the response, and the destination IP is the CIP.

Therefore, all request packets go through LVS before being forwarded to the RSs, but response packets do not pass through LVS at all: they go directly from the RS to the gateway router and on to the client, which greatly reduces the load on LVS.

The RSs' RIPs can be private or public addresses, because the RIP is not involved in the client-facing traffic; in practice private addresses are generally used.

LVS-TUN (tunnel)

The IP header of the request is not modified (source IP is the CIP, destination IP is the VIP); instead, a new IP header is encapsulated outside the original IP packet, with source IP DIP and destination IP RIP, and the packet is then sent to the selected RS. When the RS responds, it replies directly to the client (source IP VIP, destination IP CIP). Port mapping is not supported.

The DIP, VIP, and RIP are generally all public addresses, though the DIP and RIP can also be private. The RSs and the VS may be in different subnets, server rooms, or even regions; in that case public addresses are needed for communication between them, and the ability to cross regions means the RSs can also serve as off-site disaster recovery. The RSs' gateway generally must not point to the DIP, so that response packets do not pass through the VS.

The RSs must support IP tunneling, because the packets they receive carry two IP headers; they must be able to recognize and strip the outer header added by the VS.
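A rough sketch of the RS-side tunnel setup, assuming an IPIP tunnel and a hypothetical VIP of 192.0.2.100 (details vary by distribution and kernel):

```bash
# Load the IPIP module so the outer IP header added by the VS can be decapsulated;
# this also creates the tunl0 interface
modprobe ipip
ip link set tunl0 up
# Bind the VIP to the tunnel interface so the inner packet (destined to the VIP) is accepted locally
ip addr add 192.0.2.100/32 dev tunl0
# Relax reverse-path filtering, otherwise the kernel may drop the decapsulated packets
sysctl -w net.ipv4.conf.tunl0.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0
```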

LVS-FULLNAT

This mode is not supported by the stock kernel by default. It modifies both the source IP and the destination IP of the request: the source IP is changed from CIP to DIP, and the destination IP from VIP to RIP.

Because the packets the VS sends to the RSs carry the DIP as their source IP, the RSs' responses go back through the VS, which then forwards them to the client. FULLNAT mode supports port mapping.

ipvs scheduler

Scheduling algorithms fall into two categories, static and dynamic; the difference is whether the current load of each RS is taken into account when scheduling.

static scheduling

Static scheduling considers only the algorithm itself:

  • RR : round-robin; requests are distributed to the RSs in turn

  • WRR : weighted round-robin; each RS is given a weight

  • SH : Source Hashing, which implements session stickiness. The source IP address is hashed so that requests from the same IP are always sent to the RS chosen the first time, achieving session binding.

  • DH : Destination Hashing. Requests to the same destination address are always forwarded to the RS chosen the first time. The typical scenario is load balancing across cache servers in a forward-proxy setup, such as at a broadband carrier.

    For example, suppose a user's request goes to a forward proxy server, and the proxy fetches the resource from the content provider's server according to the destination the user requested and then returns it to the user. Cache servers sit between the proxy and the providers' servers; relative to the proxy they are the RSs. If provider A's resources are cached on cache server 1, provider B's on cache server 2, and so on, then to raise the cache hit rate and speed up access, the proxy should always schedule requests for provider A to cache server 1, and so forth: the cache server is chosen based on the destination address of the request, which improves the hit rate.

dynamic scheduling

Dynamic algorithms schedule based on both the scheduling algorithm and the current load of each RS; the RS with the smaller load (Overhead) is chosen.

  • LC : Least Connections, with the load calculated as

    Overhead = activeconns * 256 + inactiveconns

    inactiveconns counts inactive connections: connections that have been established but have no data activity. The formula treats one active connection as consuming as many resources as 256 inactive connections. This method only considers the number of connections and ignores the capability of each server: a server may already have many connections yet, being powerful, still be able to accept more.

  • WLC : Weighted LC, the default scheduling algorithm. Compared with LC it adds a weight, which accounts for the capability of the server itself.

    Overhead = (activeconns * 256 + inactiveconns) / weight

    The problem with this method: when the cluster has just started, every server has 0 connections and an Overhead of 0, so the first server is picked essentially arbitrarily, and it may well be a low-performance one. A better approach would let the most capable server be chosen first (see the worked example after this list).

  • SED : Shortest Expected Delay, with the load calculated as

    Overhead = (activeconns + 1) * 256 / weight

    In this way, high-weight servers are scheduled first when the cluster starts. The problem is that if one server's weight is far larger than the others', the initial stream of requests will all be scheduled to that server while the other servers receive none.

  • NQ : Never Queue. The first round of requests is distributed evenly across the servers, and SED is used afterwards. This prevents all early requests from being scheduled to the highest-weight server, so the other servers also receive requests.

  • LBLC : Locality-Based LC, a dynamic version of the DH algorithm. It performs forward-proxy scheduling with load awareness, choosing among the cache servers according to their load.

  • LBLCR : LBLC with Replication. Cached content is copied from heavily loaded cache servers to lightly loaded ones, so forward-proxy scheduling can be sent to another server that holds the same cached data.
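As referenced above, a small worked example with made-up numbers: suppose RS1 has weight 1 and RS2 has weight 10, and both start with 0 active and 0 inactive connections. Under WLC, both Overheads are (0 * 256 + 0) / weight = 0, so the first request may go to RS1 even though RS2 is far more capable. Under SED, Overhead(RS1) = (0 + 1) * 256 / 1 = 256 and Overhead(RS2) = (0 + 1) * 256 / 10 = 25.6, so RS2 is chosen first. But if RS2's weight were, say, 1000, RS2 would keep the lowest Overhead until it had accumulated roughly a thousand active connections, so all early requests would land on it; that is the situation NQ is designed to avoid.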

ipvsadm commands

  1. Add or modify a cluster service: ipvsadm -A|E -t|u|f service-address [-s scheduler] [-p [timeout]]
  2. Delete a cluster service: ipvsadm -D -t|u|f service-address
    -t means a TCP service, -u a UDP service, -f a firewall mark
    service-address is the service address, in the form VIP:PORT.
    LVS scheduling is therefore based on protocol, IP, and port. If a request arriving at the VS does not match any rule defined in LVS, it is not scheduled to a back-end RS but is handled by the VS itself.
    -s specifies the scheduling algorithm.
  3. Add or modify an RS in a cluster service: ipvsadm -a|e -t|u|f service-address -r server-address [-g|i|m] [-w weight]
  4. Delete an RS: ipvsadm -d -t|u|f service-address -r server-address
    -r specifies the RS address: RIP[:PORT]; the port can be omitted, in which case no port mapping is done
    -w specifies the weight
  5. LVS type: -g means gateway, the DR model (default); -i means ipip, the TUN model; -m means masquerade, the NAT model
  6. ipvsadm -Ln lists the defined cluster services
    Options: -n, print addresses and ports numerically; --exact, show exact values; -c, show current connections; --stats, show statistics; --rate, show rate information
  7. ipvsadm -C clears all defined rules
  8. ipvsadm -Z [-t|u|f service-address] clears the counters, such as the per-server connection statistics
  9. ipvs rules are kept in /proc/net/ip_vs; ipvs connections are kept in /proc/net/ip_vs_conn
  10. Save the rules: ipvsadm -Sn > file
    Reload them: ipvsadm -R < file
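Putting these commands together, a minimal sketch of defining a DR-mode cluster on the Director; the VIP, RS addresses, and weights are all hypothetical:

```bash
# Define an HTTP cluster service on the VIP, scheduled with weighted round-robin
ipvsadm -A -t 192.0.2.100:80 -s wrr
# Add two real servers in DR mode (-g); no ports on the RSs, since DR does not do port mapping
ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.11 -g -w 1
ipvsadm -a -t 192.0.2.100:80 -r 10.0.0.12 -g -w 2
# Inspect the rules and their statistics
ipvsadm -Ln
ipvsadm -Ln --stats
# Save the rules to a file and restore them later
ipvsadm -Sn > /tmp/ipvs.rules
ipvsadm -R < /tmp/ipvs.rules
```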

Origin: blog.csdn.net/Pacifica_/article/details/127127680