Load balancing, clustering, and high availability (HA) solutions for web applications

1. Getting familiar with the components

1.1, apache
     - An open-source, cross-platform web server from the Apache Software Foundation and one of the oldest web servers still in wide use. It supports IP-based and name-based virtual hosts, proxying, Secure Sockets Layer (SSL), and more. Today it is mainly used on the Internet as a static resource server; it can also act as a proxy server that forwards requests (for example, for image hot-link protection), working with servlet containers such as tomcat to process JSPs.
1.2, nginx
     - A high-performance HTTP and reverse proxy server originally developed in Russia. Because of nginx's performance and stability surpassing apache's, more and more sites in China use nginx as their web server, including channels of portal sites such as Sina Blog, Sina Podcast, NetEase News, Tencent.com, and Sohu Blog. In high-concurrency environments of 30,000+ connections, nginx's processing capacity is reported to be roughly 10 times that of apache.
     Reference: Performance analysis and comparison of apache and nginx (http://blog.s135.com/nginx_php_v6/)
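As a minimal sketch of nginx in this reverse-proxy role (the upstream name, addresses, and ports below are illustrative assumptions, not from the original), an nginx.conf that balances requests across two backends might look like:

```nginx
# nginx.conf sketch: nginx as a reverse proxy / load balancer
# in front of two backend servers (addresses are illustrative).
events {}

http {
    upstream backend {
        server 192.168.0.11:8080;
        server 192.168.0.12:8080;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;               # forward to the upstream group
            proxy_set_header Host $host;             # preserve the original Host header
            proxy_set_header X-Real-IP $remote_addr; # pass the client address along
        }
    }
}
```

Requests are distributed round-robin across the upstream servers by default; per-server `weight=` values can skew the distribution.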
1.3, lvs
     - Short for Linux Virtual Server, a virtual server cluster system. Started in May 1998 by Dr. Zhang Wensong of the National University of Defense Technology, it implements load balancing on the Linux platform. To learn more, visit the official website: http://zh.linuxvirtualserver.org/.

1.4, HAProxy

     - HAProxy is a free, fast, and reliable solution that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications, with support for virtual hosts. It is especially suitable for heavily loaded web sites that require session persistence or layer-7 processing. HAProxy runs comfortably on current hardware and can fully support tens of thousands of concurrent connections, and its operating mode makes it easy and safe to integrate into an existing architecture while keeping your web servers off the public network.
1.5, keepalived
     - Despite the name, keepalived here is not a configuration attribute on a component like apache or tomcat; it is a component in its own right that provides high availability (HA) for web servers. It monitors the health of the servers: if a server fails, keepalived removes it from the server group, and once the server recovers, keepalived detects this and adds it back automatically, switching the IP between active and standby servers near-instantaneously and seamlessly on failure. Keepalived is a user-space daemon for LVS cluster node health checking and failover of the LVS director. The keepalived daemon checks the state of the LVS server pool; if a server in the pool goes down, keepalived notifies the kernel to remove the node from the LVS topology via a setsockopt call.

Detailed explanation of Keepalived: https://my.oschina.net/piorcn/blog/404644
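As a hedged sketch of the active/standby switchover described above (the interface name, router id, password, and virtual IP are illustrative assumptions), keepalived configures it in keepalived.conf with a vrrp_instance block:

```conf
# keepalived.conf sketch: two directors share a virtual IP (VIP).
# The BACKUP node takes over the VIP when the MASTER stops
# sending VRRP advertisements. All values here are illustrative.
vrrp_instance VI_1 {
    state MASTER            # set to BACKUP on the standby node
    interface eth0
    virtual_router_id 51
    priority 100            # lower (e.g. 90) on the standby node
    advert_int 1            # advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        192.168.0.100       # the VIP that clients actually connect to
    }
}
```

The same configuration file is deployed on both directors, differing only in `state` and `priority`.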
1.6, memcached
     - A high-performance distributed memory object caching system, originally developed by Danga Interactive to speed up LiveJournal; it caches business query results to reduce the load on the database. Its daemon is written in C, while clients exist for almost every language (for Java there are basically three clients: the memcached client for java, spymemcached, and xmemcached). Server and client communicate over a simple protocol, and data cached in memcached must be serialized.
1.7, terracotta
     - A well-known open-source Java clustering platform developed by Terracotta, Inc. in the United States. It implements an abstraction layer between the JVM and Java applications that handles clustering, allowing users to cluster Java applications without changing system code. It supports data persistence, session replication, and high availability (HA). Detailed reference: http://topmanopensource.iteye.com/blog/1911679

2. Key terms
2.1, load balancing
 
In the era of the Internet's rapid development, large data volumes and high concurrency are the problems most often discussed on Internet sites. To cope with the system performance pressure of high concurrency, nearly everyone ends up using a load balancing mechanism: requests are distributed to the servers of a cluster according to some load strategy, so that the whole server cluster handles the site's requests together.
A well-funded company can buy hardware dedicated to load balancing (such as F5), and the results will certainly be very good. Most companies, however, choose an inexpensive and efficient way to scale out the overall system architecture, increasing server throughput, processing power, and load capacity.
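To make "distributes requests according to a certain load strategy" concrete, here is a toy sketch (not from the original; server names and weights are invented for illustration) of a weighted round-robin policy, one of the simplest strategies a load balancer can use:

```python
# Sketch: a toy weighted round-robin load strategy. Servers with a
# higher weight receive proportionally more of the request stream.
from itertools import cycle

def weighted_round_robin(servers):
    """servers: list of (name, weight) pairs; returns an endless
    iterator that yields server names in proportion to their weight."""
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)

# Server "A" has twice the capacity of "B", so it gets 2 of every 3 requests.
rr = weighted_round_robin([("A", 2), ("B", 1)])
first_six = [next(rr) for _ in range(6)]
print(first_six)  # → ['A', 'A', 'B', 'A', 'A', 'B']
```

Real balancers (apache's bytraffic/bybusyness, nginx's weighted upstreams, LVS's schedulers) use more dynamic signals, but the idea of a pluggable distribution policy is the same.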

2.2. Cluster

 N servers form a loosely coupled multiprocessor system (to the outside world, they appear as a single server) and communicate over the network, cooperating to jointly carry the request load of a web site.

2.3. High Availability (HA)

 In a clustered server architecture, when the primary server fails, the backup server automatically and promptly takes over its work, so that service to users is uninterrupted. ps: to me this sounds the same as failover; readers who know the distinction are welcome to explain, thanks.

2.4. Session replication/sharing

 During a session with the system, once a user has logged in, they should not need to log in again no matter which resource of the system they access; the servlet container conveniently keeps the user's session for this. But if two tomcats (A, B) provide a clustered service and the user logs in on A-tomcat, the web server may, according to its policy, distribute the next request to B-tomcat. Since B-tomcat holds no session information for that user, it does not know they are logged in and redirects them to the login page.
At this point we need B-tomcat to hold A-tomcat's session as well. We can use tomcat's session replication, or share the session by other means.

3. Commonly used web clusters

3.1, tomcat cluster solutions
 apache+tomcat; nginx+tomcat; lvs+nginx+tomcat. The first two are familiar to most people. (lvs handles cluster scheduling, nginx serves static files, and tomcat handles dynamic content [the best choice of the three].) Taking the apache+tomcat cluster as an example, briefly:
  1. There are three ways for apache and tomcat to communicate: ajp_proxy, the mod_jk connector, and http_proxy. For details see: http://www.ibm.com/developerworks/cn/opensource/os-lo-apache-tomcat/
  2. apache offers weighted distribution strategies: by request count (byrequests, the default), by traffic (bytraffic), and by busyness (bybusyness, based on the number of active requests).
  3. apache supports stickysession (sticky sessions): if a user first lands on A-tomcat, all of that user's subsequent requests are forwarded to A-tomcat rather than B-tomcat. [The load balancing effect of this is not great, so it suits small sites; the discussion below assumes non-sticky sessions.]
  4. The architecture between them is shown in Figure 1:
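The connector and distribution settings in points 1-3 could be sketched in httpd.conf roughly as follows (a hedged sketch assuming the http_proxy connector and apache 2.4 module layout; worker addresses and route names are illustrative):

```apache
# httpd.conf sketch: balancing two tomcats with mod_proxy_balancer.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so

<Proxy "balancer://mycluster">
    # route= must match each tomcat's jvmRoute for sticky sessions to work
    BalancerMember "http://192.168.0.11:8080" route=tomcatA
    BalancerMember "http://192.168.0.12:8080" route=tomcatB
    # byrequests distribution; remove stickysession for non-sticky balancing
    ProxySet lbmethod=byrequests stickysession=JSESSIONID|jsessionid
</Proxy>

ProxyPass        "/" "balancer://mycluster/"
ProxyPassReverse "/" "balancer://mycluster/"
```

Swapping `lbmethod=byrequests` for `bytraffic` or `bybusyness` selects the other distribution strategies mentioned above.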



 Problem 1: there is only one web server, an obvious single point of failure. If that apache has a problem, the whole site goes down.

3.2, session replication


  If stickysession (sticky sessions) is not used, we can use tomcat's session replication to keep the sessions of all tomcat nodes identical. Tomcat uses multicast: whenever the session on one tomcat node in the cluster changes, the change is broadcast to all the other tomcat nodes.
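As a sketch, this multicast replication is enabled in each tomcat's server.xml; per tomcat's clustering documentation, the simplest form is a single element (the web application must also be marked `<distributable/>` in its web.xml):

```xml
<!-- server.xml sketch (inside <Engine> or <Host>): enables all-to-all
     session replication with tomcat's default DeltaManager and default
     multicast membership (228.0.0.4:45564). -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
```

All defaults can be expanded into explicit Manager, Channel, and Membership elements when the multicast address, port, or replication mode needs tuning.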

Problem 2: according to tests reported by users, once the number of tomcat nodes exceeds 4, cluster performance declines sharply. In addition, when traffic reaches a certain level, session content grows, and the communication between tomcat nodes consumes a great deal of network bandwidth, congesting the network so that the throughput of the whole cluster cannot increase.


4. High availability (HA) and session sharing (solve the two problems mentioned above)


4.1. Use lvs+keepalived to achieve high availability of the cluster and a more robust LB.
 We can put lvs in front for load balancing, distributing requests to the corresponding web server cluster according to one of lvs's 8 scheduling algorithms (configurable). The LVS directors run as an active/standby hot pair: the keepalived module automatically fails over to the backup director on failure, so service is uninterrupted. The structure is shown in Figure 2:
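A hedged keepalived.conf sketch of the LVS side (the VIP, real-server addresses, DR mode, and round-robin choice are illustrative assumptions; `lb_algo` selects one of lvs's scheduling algorithms):

```conf
# keepalived.conf sketch: an LVS virtual server on the VIP distributes
# requests to two real web servers; health checks remove dead nodes
# automatically. All addresses and values are illustrative.
virtual_server 192.168.0.100 80 {
    delay_loop 6            # health-check interval in seconds
    lb_algo rr              # round-robin, one of lvs's scheduling algorithms
    lb_kind DR              # direct routing mode
    protocol TCP

    real_server 192.168.0.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 192.168.0.12 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
```

Combined with the vrrp_instance block for the VIP, this gives both load balancing (LVS) and director failover (VRRP) from one daemon.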



 Note: from what I have found, HAProxy+keepalived+nginx is generally used for load balancing on the web tier, while MySQL database clusters use lvs+keepalived+mysql. HAProxy and nginx both work at layer 7 of the network; HAProxy makes up for some of nginx's shortcomings (session persistence, cookie-based routing, etc.) and is a dedicated load balancer in its own right, so it generally handles load balancing better than nginx. lvs is more cumbersome and more complex to deploy for relatively large network applications. Although it runs at layer 4 and only forwards traffic without generating any of its own, it cannot do regex-based processing or static/dynamic separation, so lvs+keepalived (or heartbeat) is usually reserved for load balancing at the database layer.

Reference: comparison of LVS, HAProxy, and Nginx for load balancing
4.2, use terracotta or memcached to share sessions
 

 4.2.1, terracotta provides jvm-level session sharing

 Its basic principle: for data shared across the cluster, when a node makes a change, Terracotta sends only the changed part to the Terracotta server, which then forwards it to the nodes that actually need the data. Shared data objects do not need to be serialized.

 

4.2.2. Memory-level session sharing through memcached

With the memcached-session-manager (msm) plug-in and some configuration on tomcat, sessions can be stored on memcached servers. Notes: msm supports tomcat 6 and above; memcached supports distributed memory; msm supports both sticky session and non-sticky session modes; and objects shared in memcached must be serializable. The structure is shown in Figure 3:



 With certain configuration, failover can also be achieved. For example:

Xml code:

<Context>
    ...
    <Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
        memcachedNodes="n1:host1.yourdomain.com:11211,n2:host2.yourdomain.com:11211"
        failoverNodes="n1"
        requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$"
        transcoderFactoryClass="de.javakaffee.web.msm.serializer.kryo.KryoTranscoderFactory"
        />
</Context>

 Description: failoverNodes specifies failover nodes and is not applicable to non-sticky sessions. The attribute failoverNodes="n1" tells msm to prefer saving sessions on the memcached node "n2", falling back to node "n1" only when "n2" is unavailable. That way, even if the tomcat on host2 goes down, the session stored on the memcached node "n1" can still be accessed through the tomcat on host1.
 
4.2.3. Other schemes
One is to save user information (usually login information) in cookies; each time a request reaches the web application, it extracts the data from the cookie and processes it (the cookie content should be encrypted).
The other is to save the key attributes of the user's information in the database, so that no session is needed at all: each request queries the key attributes from the database and processes accordingly. Disadvantage: this increases the load on the database and can make the database the bottleneck of the cluster.

 
