Article Directory
Preface
Common web cluster scheduler
At present, common web cluster schedulers fall into software and hardware. On the software side, the open-source LVS, Haproxy, and Nginx are the usual choices; on the hardware side, F5 is typical, and many sites also use products such as Barracuda and NSFOCUS.
Although LVS has strong load-handling capacity in enterprise deployments, it has shortcomings:
LVS does not support regular-expression processing and cannot separate dynamic from static content.
For large websites, LVS is relatively complex to implement and configure, and its maintenance cost is high.
Haproxy is software that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications.
It is especially suitable for heavily loaded web sites: on current hardware it can support tens of thousands of concurrent connection requests.
One: Haproxy scheduling algorithm
Haproxy supports a variety of scheduling algorithms; the three most commonly used are RR (Round Robin), LC (Least Connections), and SH (Source Hashing).
1.1: RR (Round Robin)
The RR algorithm is the simplest and most commonly used algorithm, that is, round-robin scheduling.
For example, suppose there are three nodes A, B, and C. The first user request is assigned to node A, the second to node B, and the third to node C.
The fourth request goes back to node A, and so on: requests are distributed in rotation to balance the load.
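The rotation above can be sketched in a few lines of shell (illustrative only; the node names A, B, and C are the hypothetical ones from the example, not real servers):

```shell
# Round-robin: request i goes to node (i-1) mod 3.
nodes=(A B C)
assigned=""
for i in 0 1 2 3; do
    # append the node chosen for request i+1
    assigned+="${nodes[i % 3]} "
done
echo "requests 1-4 -> $assigned"
```

The fourth request wraps back to node A, exactly as in the walkthrough.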
1.2: LC (Least Connections)
The LC algorithm is the minimum number of connections algorithm, which dynamically allocates front-end requests according to the number of connections of the back-end nodes.
For example, suppose there are three nodes A, B, and C whose current connection counts are A: 4, B: 5, C: 6. The first new user request is assigned to A, making the counts A: 5, B: 5, C: 6.
The second request is also assigned to A, making the counts A: 6, B: 5, C: 6; the next new request then goes to B. Each new request is assigned to whichever node has the fewest connections at that moment.
In practice, connections are also being released dynamically, so the counts are rarely identical for long. This makes the algorithm a large improvement over RR, and it is one of the more widely used algorithms today.
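The selection rule can be sketched in shell as well (illustrative only; the starting counts mirror the hypothetical A: 4, B: 5, C: 6 example above):

```shell
# Least-connections: each request goes to the node with the fewest
# active connections; ties fall to the first node in order.
declare -A conns=([A]=4 [B]=5 [C]=6)

pick_least() {
    local best best_n=999999 n
    for n in A B C; do
        if (( conns[$n] < best_n )); then best_n=${conns[$n]}; best=$n; fi
    done
    echo "$best"
}

for req in 1 2 3; do
    s=$(pick_least)
    conns[$s]=$(( conns[$s] + 1 ))   # the new request adds one connection
    echo "request $req -> node $s (connections now ${conns[$s]})"
done
```

Requests 1 and 2 both land on A (bringing it to 6), and request 3 goes to B, matching the walkthrough.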
1.3: SH (Source Hashing)
SH is a source-based scheduling algorithm. It is used in scenarios where the session is recorded on the server side; cluster scheduling can key on the source IP, a cookie, and so on.
For example, suppose there are three nodes A, B, and C. The first user's first visit is assigned to A, and the second user's first visit is assigned to B.
On their second visits, the first user is again assigned to A and the second user to B; as long as the load-balancing scheduler is not restarted, the first user's requests will always go to A and the second user's to B.
The advantage of this algorithm is that it achieves session persistence. Its drawback is that when some source IPs generate very heavy traffic, the load becomes unbalanced and individual nodes can be overloaded, affecting the service.
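In the Haproxy configuration itself, source-IP hashing is selected with `balance source` (cookie-based persistence is configured separately, via the `cookie` directive). A minimal sketch reusing this article's addresses:

```
listen webcluster 0.0.0.0:80
    balance source          # hash the client source IP to choose a server
    server inst1 192.168.100.3:80 check inter 2000 fall 3
    server inst2 192.168.100.4:80 check inter 2000 fall 3
```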
Two: Haproxy cluster construction
2.1: Environmental preparation
VMware software
Two CentOS 7 virtual machines as the Nginx servers:
IP address 192.168.100.3 (PC-3)
IP address 192.168.100.4 (PC-4)
One CentOS 7 virtual machine as the Haproxy server: IP address 192.168.100.20 (PC-2)
2.2: Installation and startup of Nginx
Install Nginx on the two web servers and start the service
Install the build environment software
yum install -y gcc gcc-c++ make pcre-devel expat-devel perl bzip2 zlib-devel pcre
useradd -M -s /sbin/nologin nginx    # create the service account for nginx
[root@pc-3 opt]# tar zxvf nginx-1.12.2.tar.gz
[root@pc-3 opt]# cd nginx-1.12.2/
[root@pc-4 nginx-1.12.2]# ./configure \
> --prefix=/usr/local/nginx \
> --user=nginx \
> --group=nginx
[root@pc-4 nginx-1.12.2]# make && make install
Create the test page
[root@pc-4 nginx-1.12.2]# cd /usr/local/nginx/html/
[root@pc-4 html]# echo "this is monkey" > test.html
[root@pc-4 html]# ln -s /usr/local/nginx/sbin/nginx /usr/local/sbin/
The operations on PC-3 are the same as on PC-4; only this step differs:
[root@pc-3 html]# echo "this is yellowdog" > test.html
2.3 Configuration of PC-2 (the Haproxy server)
2.3.1 Install haproxy
Install the build dependencies (bzip2-devel enables the compression feature):
[root@pc-2 opt]# yum install -y \
> pcre-devel \
> bzip2-devel \
> gcc \
> gcc-c++ \
> make
[root@pc-2 opt]# tar zxvf haproxy-1.5.19.tar.gz
[root@pc-2 opt]# cd haproxy-1.5.19/
[root@pc-2 haproxy-1.5.19]# make TARGET=linux26
[root@pc-2 haproxy-1.5.19]# make install
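`make install` places the binary in /usr/local/sbin but does not register a service, while a later step runs `service haproxy restart`. A sketch of one common way to wire that up on CentOS 7, using the init script shipped in the haproxy-1.5.19 source tree (paths assume you are still in that directory):

```
cp examples/haproxy.init /etc/init.d/haproxy     # init script bundled with the source
chmod +x /etc/init.d/haproxy
ln -s /usr/local/sbin/haproxy /usr/sbin/haproxy  # the init script expects /usr/sbin/haproxy
chkconfig --add haproxy                          # register with SysV service management
```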
2.3.2 Configure haproxy
[root@pc-2 examples]# mkdir /etc/haproxy
[root@pc-2 examples]# cp haproxy.cfg /etc/haproxy/
[root@pc-2 examples]# cd /etc/haproxy
[root@pc-2 haproxy]# vim haproxy.cfg
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
#log loghost local0 info
maxconn 4096
#chroot /usr/share/haproxy    # chroot jail directory (disabled here)
uid 99
gid 99
daemon
#debug
#quiet
defaults
log global
mode http
option httplog
option dontlognull
retries 3
#redispatch    # stop dispatching requests to servers that are down (disabled here)
maxconn 2000
contimeout 5000
clitimeout 50000
srvtimeout 50000
listen webcluster 0.0.0.0:80
option httpchk GET /test.html
balance roundrobin
server inst1 192.168.100.3:80 check inter 2000 fall 3
server inst2 192.168.100.4:80 check inter 2000 fall 3
2.4: Detailed explanation of Haproxy configuration file
The Haproxy configuration file is usually divided into three parts:
global: global settings
defaults: default settings
listen: application component settings
global parameters
log 127.0.0.1 local0: configures logging; local0 is the log device (facility), and entries go to the system log by default
log 127.0.0.1 local1 notice: notice is the log level (syslog defines eight severity levels)
maxconn 4096: maximum number of connections
uid 99: user id the process runs as
gid 99: group id the process runs as
The defaults section sets default parameters, which are generally inherited by the application components; if an option is not explicitly declared in a component, the default value applies.
log global: use the log definition from the global section
mode http: the mode is HTTP
option httplog: log in the HTTP log format
retries 3: if health checks against a node fail three consecutive times, the node is considered unavailable
maxconn 2000: maximum number of connections
contimeout 5000: connection timeout
clitimeout 50000: client timeout
srvtimeout 50000: server timeout
The listen section generally configures application-module parameters
listen appli4-backup 0.0.0.0:10004: defines an application named appli4-backup
option httpchk /index.html: health-check the servers' index.html file
option persist: force requests to be sent to servers that are already marked down
balance roundrobin: use the round-robin load-balancing algorithm
server inst1 192.168.100.3:80 check inter 2000 fall 3: defines node 1
server inst2 192.168.100.4:80 check inter 2000 fall 3 backup: defines node 2 (as a backup)
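For reference, the `server` keyword accepts more options than the ones shown above; a sketch with commonly used ones (the values here are illustrative, not from this article's setup):

```
server inst1 192.168.100.3:80 check inter 2000 rise 2 fall 3 weight 2
# check       enable periodic health checks
# inter 2000  interval between checks, in milliseconds
# rise 2      consecutive successful checks before marking the server up
# fall 3      consecutive failed checks before marking the server down
# weight 2    relative share of requests (default 1)
```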
2.5 Start the nginx service on the two nodes
[root@pc-3 nginx-1.12.2]# nginx
[root@pc-3 nginx-1.12.2]# netstat -ntap | grep 80
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 22646/nginx: master
2.6 On a client, access http://192.168.100.20/test.html to verify that requests alternate between the two back-end pages
Three: Haproxy log management
Haproxy's log is output to the system's syslog by default; in production it is usually defined separately. The steps are as follows.
3.1 Modify the log configuration options in the Haproxy configuration file and add the configuration:
log /dev/log local0 info
log /dev/log local0 notice
Modify the rsyslog configuration so that the Haproxy-related settings live in their own file,
haproxy.conf, placed under /etc/rsyslog.d/.
Save the configuration file and restart the rsyslog service to complete the rsyslog setup.
vim /etc/haproxy/haproxy.cfg
global
log /dev/log local0 info
log /dev/log local0 notice
#log loghost local0 info
maxconn 4096
#chroot /usr/share/haproxy
3.2 Restart the service and add configuration files
[root@pc-2 /]# service haproxy restart
Restarting haproxy (via systemctl): [ OK ]
[root@pc-2 /]# touch /etc/rsyslog.d/haproxy.conf
[root@pc-2 /]# vim /etc/rsyslog.d/haproxy.conf
if ($programname == 'haproxy' and $syslogseverity-text == 'info' )
then -/var/log/haproxy/haproxy-info.log
&~
if ($programname == 'haproxy' and $syslogseverity-text == 'notice' )
then -/var/log/haproxy/haproxy-notice.log
&~
systemctl restart rsyslog.service
3.3 Visit the site from a browser, then view the generated log files
[root@pc-2 log]# systemctl restart rsyslog.service
[root@pc-2 log]# cd /var/log
Four: Detailed explanation of the parameters that can be optimized by Haproxy
As the load on a corporate website grows, optimizing the Haproxy parameters becomes very important:
maxconn: maximum number of connections; adjust to the application's actual situation, 10240 is recommended
daemon: daemon mode; Haproxy can be started in non-daemon mode, but daemon mode is recommended
nbproc: number of load-balancing processes; equal to, or twice, the number of CPU cores on the server is recommended
retries: number of health-check retries; with many nodes and high concurrency, set it to 2 or 3
option http-server-close: actively close HTTP connections; recommended in production
timeout http-keep-alive: keep-alive timeout; can be set to 10s
timeout http-request: HTTP request timeout; 5 to 10s is recommended, to speed up the release of HTTP connections
timeout client: client timeout; if traffic is heavy and nodes respond slowly, this can be set shorter, about 1min is recommended
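Pulling the recommendations above together, an optimized configuration fragment might look like this (a sketch; tune the values to your own hardware and traffic):

```
global
    maxconn 10240
    daemon
    nbproc  2                       # assuming a 2-core server

defaults
    retries 3
    option  http-server-close
    timeout http-keep-alive 10s
    timeout http-request    5s
    timeout client          1m
```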