Build related servers
Plan the IP addresses and the cluster architecture diagram, and turn off SELinux and the firewall on every machine
# disable SELinux
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config
# stop and disable the firewall
service firewalld stop
systemctl disable firewalld
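To sanity-check the sed expression before touching the real file, you can run it against a throwaway copy (the temp path is only for this demonstration):

```shell
# Make a two-line stand-in for /etc/selinux/config and apply the same edit.
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > /tmp/selinux_config_demo
sed -i '/^SELINUX=/ s/enforcing/disabled/' /tmp/selinux_config_demo
grep '^SELINUX=' /tmp/selinux_config_demo   # should print SELINUX=disabled
```

Only the line starting with `SELINUX=` is rewritten; `SELINUXTYPE=` is untouched.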
Host | IP address | Role
---|---|---
web1 | 192.168.40.21 | backend web
web2 | 192.168.40.22 | backend web
web3 | 192.168.40.23 | backend web
lb1 | 192.168.40.31 | load balancer 1
lb2 | 192.168.40.32 | load balancer 2
dns/prometheus | 192.168.40.137 | DNS server, monitoring server
nfs | 192.168.40.138 | NFS server
DNS server configuration
Install bind package
yum install bind* -y
Start the named process
[root@elk-node2 selinux]# service named start
Redirecting to /bin/systemctl start named.service
[root@elk-node2 selinux]# ps aux | grep named
named 44018 0.8 3.2 391060 60084 ? Ssl 15:15 0:00 /usr/sbin/named -u named -c /etc/named.conf
root 44038 0.0 0.0 112824 980 pts/0 S+ 15:15 0:00 grep --color=auto named
Modify /etc/resolv.conf and add a line pointing the name server at this machine
nameserver 127.0.0.1
Test that resolution succeeds
[root@elk-node2 selinux]# nslookup
> www.qq.com
Server: 127.0.0.1
Address: 127.0.0.1#53
Non-authoritative answer:
www.qq.com canonical name = ins-r23tsuuf.ias.tencent-cloud.net.
Name: ins-r23tsuuf.ias.tencent-cloud.net
Address: 121.14.77.221
Name: ins-r23tsuuf.ias.tencent-cloud.net
Address: 121.14.77.201
Name: ins-r23tsuuf.ias.tencent-cloud.net
Address: 240e:97c:2f:3003::77
Name: ins-r23tsuuf.ias.tencent-cloud.net
Address: 240e:97c:2f:3003::6a
To let other machines use this host as their domain name server, modify /etc/named.conf so named listens on all addresses and answers queries from anyone:
listen-on port 53 { any; };
listen-on-v6 port 53 { any; };
allow-query { any; };
Restart the service
service named restart
In this way, other machines can use the machine 192.168.40.137 for domain name resolution.
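Concretely, each client that should use this DNS server needs a single line in its /etc/resolv.conf:

```
nameserver 192.168.40.137
```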
WEB server configuration
Configure static IP
Enter the /etc/sysconfig/network-scripts/ directory and modify the ifcfg-ens33 file so the machines can reach each other.
web1 IP configuration
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.40.21
PREFIX=24
GATEWAY=192.168.40.2
DNS1=114.114.114.114
web2 IP configuration
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.40.22
PREFIX=24
GATEWAY=192.168.40.2
DNS1=114.114.114.114
web3 IP configuration
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.40.23
PREFIX=24
GATEWAY=192.168.40.2
DNS1=114.114.114.114
Compile and install nginx
For compiling and installing nginx, see my blog post on installing, starting, and stopping Nginx.
After installation, the site can be reached from a browser.
Load balancer configuration
Use nginx for load balancing
lb1
Configure static IP
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.40.31
PREFIX=24
GATEWAY=192.168.40.2
DNS1=114.114.114.114
Modify nginx.conf in the installation directory and add the following.
Layer 7 load balancing: the upstream block lives inside the http block and forwards based on the HTTP protocol.
http {
……
    upstream lb1 {
        # real backend IP addresses, inside the http block
        ip_hash; # use the ip_hash algorithm, or least_conn; for least connections
        # weight example: server 192.168.40.21 weight=5;
        server 192.168.40.21;
        server 192.168.40.22;
        server 192.168.40.23;
    }
    server {
        listen 80;
……
        location / {
            #root html;   # commented out: this server only proxies, it does not serve files directly
            #index index.html index.htm;
            proxy_pass http://lb1; # proxy forwarding
        }
    }
}
Layer 4 load balancing: the stream block sits at the same level as the http block and forwards based on IP + port.
stream {
    upstream lb1 {
        # the same three backends, now matched by IP + port
        server 192.168.40.21:80;
        server 192.168.40.22:80;
        server 192.168.40.23:80;
    }
    server {
        listen 80; # forward traffic arriving on port 80
        proxy_pass lb1;
    }
    upstream dns_servers {
        least_conn;
        server 192.168.40.21:53;
        server 192.168.40.22:53;
        server 192.168.40.23:53;
    }
    server {
        listen 53 udp; # forward traffic arriving on port 53
        proxy_pass dns_servers;
    }
}
Reload nginx
nginx -s reload
The round-robin algorithm is used by default; you can check the effect.
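As a toy illustration (not nginx code), the default round-robin rotation over the three backends behaves like a counter taken modulo the pool size:

```shell
# Round-robin hands requests to the backends in turn; a counter modulo the
# pool size reproduces the sequence nginx uses by default.
backends=(192.168.40.21 192.168.40.22 192.168.40.23)
for i in 0 1 2 3 4 5; do
    echo "request $i -> ${backends[$((i % 3))]}"
done
```

Requests 0 and 3 land on the first backend, 1 and 4 on the second, and so on.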
lb2
Configure static IP
BOOTPROTO="none"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.40.32
PREFIX=24
GATEWAY=192.168.40.2
DNS1=114.114.114.114
Modify nginx.conf in the installation directory and add the following.
Layer 7 load balancing: the upstream block lives inside the http block and forwards based on the HTTP protocol.
http {
……
    upstream lb2 {
        # real backend IP addresses, inside the http block
        ip_hash; # use the ip_hash algorithm, or least_conn; for least connections
        # weight example: server 192.168.40.21 weight=5;
        server 192.168.40.21;
        server 192.168.40.22;
        server 192.168.40.23;
    }
    server {
        listen 80;
……
        location / {
            #root html;   # commented out: this server only proxies, it does not serve files directly
            #index index.html index.htm;
            proxy_pass http://lb2; # proxy forwarding
        }
    }
}
Reload nginx
nginx -s reload
Problem: the backend servers never see the real client IP address, only the load balancer's. How can this be solved?
Use the $remote_addr variable to capture the client's IP, assign it to the X-Real-IP header, and reload:
nginx -s reload
Then change the log format on all backend servers to record this header, and check whether the client's real IP now appears in the logs.
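A minimal sketch of both sides, assuming the backend log format is named main (the format line is illustrative; only the fields shown are required):

```nginx
# On the load balancer, inside the location / block:
proxy_set_header X-Real-IP $remote_addr;

# On each backend, in the http block, log the forwarded header:
log_format main '$remote_addr - $http_x_real_ip [$time_local] "$request"';
access_log logs/access.log main;
```

$http_x_real_ip exposes the incoming X-Real-IP header value, so the second field of each log line shows the original client address.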
Question: What is the difference between Layer 4 and Layer 7 load balancing?
- Layer 4 load balancing (四层负载均衡) works at the transport layer. The load balancer forwards requests based on the source IP address, destination IP address, source port, and destination port. It looks only at the basic attributes of the network connection and knows nothing about the request's content or protocol. Its advantages are speed and efficiency, making it well suited to handling large numbers of network connections over TCP and UDP; the trade-off is that it understands little about the request and cannot tailor forwarding strategies to a specific application's needs.
- Layer 7 load balancing (七层负载均衡) works at the application layer. The load balancer can inspect application-layer protocols such as HTTP and HTTPS, understand the content and characteristics of a request, and forward it intelligently based on the request URL, request headers, session information, and similar factors. This enables flexible, customized forwarding strategies: requests can be distributed to different backend servers by domain name, URL path, or specific header values, which is useful for web applications and API services with particular routing rules and requirements.
Layer 4 load balancing forwards on transport-layer connection attributes and suits high-concurrency, large-scale connection scenarios, while Layer 7 load balancing understands requests at the application layer and suits intelligent forwarding based on request content and characteristics. In practice, choose the appropriate method for your needs and application type, or combine the two for better performance and scalability.
High availability configuration
Use keepalived to achieve high availability
Both load balancers have keepalived installed, and they communicate with each other via the VRRP protocol. See the reference article for an introduction to VRRP.
yum install keepalived
Single VIP configuration
Enter the configuration directory /etc/keepalived/, edit the configuration file keepalived.conf, and start one VRRP instance.
lb1 configuration
vrrp_instance VI_1 {
    # start one instance
    state MASTER           # role: master
    interface ens33        # network interface
    virtual_router_id 150  # router id
    priority 100           # priority
    advert_int 1           # advertisement interval: 1s
    authentication {
        # authentication info
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        # virtual IP that serves external traffic
        192.168.40.51
    }
}
lb2 configuration
vrrp_instance VI_1 {
    state BACKUP  # role: backup
    interface ens33
    virtual_router_id 150
    priority 50   # priority lower than the master's
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.51
    }
}
Start keepalived; the VIP appears on the load balancer with the higher priority.
service keepalived start
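The failover behind this can be sketched as a toy election (hostnames and priorities mirror the two configurations above; this is an illustration, not how keepalived is implemented):

```shell
# VRRP-style master election: the node with the highest priority claims the VIP.
declare -A prio=([lb1]=100 [lb2]=50)   # priorities from the two configs
master=""
best=0
for node in lb1 lb2; do
    if (( prio[$node] > best )); then
        best=${prio[$node]}
        master=$node
    fi
done
echo "VIP 192.168.40.51 is held by $master (priority $best)"
```

With lb1 at priority 100 and lb2 at 50, lb1 holds the VIP; if lb1 drops out of the comparison, lb2 takes over.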
Double VIP configuration
Enter the configuration directory /etc/keepalived/, edit the configuration file keepalived.conf, and start two VRRP instances that both serve external traffic, improving utilization.
lb1 configuration
vrrp_instance VI_1 {
    # start the first instance
    state MASTER           # role: master
    interface ens33        # network interface
    virtual_router_id 150  # router id
    priority 100           # priority
    advert_int 1           # advertisement interval
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.51 # externally served IP
    }
}
vrrp_instance VI_2 {
    # start the second instance
    state BACKUP           # role: backup
    interface ens33        # network interface
    virtual_router_id 160  # router id
    priority 50            # priority
    advert_int 1           # advertisement interval
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.52 # externally served IP
    }
}
lb2 configuration
vrrp_instance VI_1 {
    state BACKUP  # role: backup
    interface ens33
    virtual_router_id 150
    priority 50
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.51
    }
}
vrrp_instance VI_2 {
    state MASTER  # role: master
    interface ens33
    virtual_router_id 160
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.52
    }
}
Restart keepalived and a VIP can be seen on each of the two load balancers.
service keepalived restart
Write a script check_nginx.sh
to monitor whether nginx is running. If nginx has died, there is no point keeping keepalived up: it wastes resources, so the master/backup status must be adjusted promptly.
#!/bin/bash
# check whether nginx is listening; if not, stop keepalived so the VIP can fail over
if [[ $(netstat -anplut | grep nginx | wc -l) -ge 1 ]]; then
    exit 0
else
    # stop keepalived before exiting so the VIP drifts to the backup
    service keepalived stop
    exit 1
fi
Grant execute permission
chmod +x check_nginx.sh
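Since the check is just a line count over netstat output, the branch logic can be exercised offline (fake_netstat below is a stand-in for the real `netstat -anplut`, not part of the script):

```shell
# fake_netstat emits one line resembling a live nginx listener, so the
# grep | wc -l test can be run without a real nginx process.
fake_netstat() {
    printf 'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 123/nginx: master\n'
}
if [[ $(fake_netstat | grep nginx | wc -l) -ge 1 ]]; then
    status="running"   # the real script exits 0 here
else
    status="down"      # the real script stops keepalived and exits 1
fi
echo "nginx is $status"
```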
At first the script did not execute successfully. Checking the /var/log/messages log revealed the problem: there was no space between the script name and the brace...
lb1 configuration after adding script
! Configuration File for keepalived
global_defs {
    notification_email {
        [email protected]
        [email protected]
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server 192.168.200.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    #vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh" # location of the external script; use an absolute path
    interval 1
    weight -60 # after the deduction, the priority must fall below the backup's
}
vrrp_instance VI_1 {
    # start one instance
    state MASTER
    interface ens33        # network interface
    virtual_router_id 150  # router id
    priority 100           # priority
    advert_int 1           # advertisement interval
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.51
    }
    track_script {
        # there must be a space after the script name
        chk_nginx # invoke the script
    }
}
vrrp_instance VI_2 {
    # start a second instance
    state BACKUP
    interface ens33        # network interface
    virtual_router_id 170  # router id
    priority 50            # priority
    advert_int 1           # advertisement interval
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.40.52
    }
}
The configuration of lb2 is the same as lb1's; just place the vrrp_script and track_script sections in the instance where lb2 is the master. Testing nginx shows that with dual VIPs, stopping nginx also stops keepalived and the VIPs drift: with weight -60, the master's priority falls from 100 to 40, below the backup's 50, so the backup takes over.
See the reference article on the usage of notify (it can also achieve the effect of stopping keepalived).
Usage of notify:
notify_master: run a script when this node becomes master (typically used to start a service such as nginx or haproxy)
notify_backup: run a script when this node becomes backup (typically used to stop such a service)
notify_fault: run a script when this node enters the fault state
Example: start haproxy on becoming master and stop it on becoming backup:
notify_master "/etc/keepalived/start_haproxy.sh start"
notify_backup "/etc/keepalived/start_haproxy.sh stop"
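The start_haproxy.sh helper referenced above is not shown in the original; a hypothetical sketch might look like the following (the real script would call systemctl, so the command is only echoed here):

```shell
# Hypothetical sketch of /etc/keepalived/start_haproxy.sh: dispatch on the
# first argument, echoing the systemctl command the real script would run.
manage_haproxy() {
    case "$1" in
        start) echo "systemctl start haproxy" ;;
        stop)  echo "systemctl stop haproxy" ;;
        *)     echo "usage: start|stop" ;;
    esac
}
manage_haproxy start
manage_haproxy stop
```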
Question: What is the split-brain phenomenon and what are its possible causes?
Split brain occurs when a communication failure between the master and backup servers causes both nodes to believe they are the master at the same time and compete for the VIP. Possible causes:
- Network partition: in a cluster using keepalived, if the network partitions and communication between the master and backup nodes is interrupted, split brain can occur.
- Inconsistent virtual router IDs: the virtual router ID uniquely identifies a master/backup pair. If it is set inconsistently, the nodes conflict and may each announce themselves as master at the same time, causing split brain.
- Mismatched authentication passwords: when the passwords differ, communication between nodes is blocked, state synchronization and failover can break down, and split brain can follow.
- Unsynchronized node state: errors or delays while synchronizing state between the master and backup leave the nodes inconsistent, which can trigger split brain.
- Lost heartbeats: keepalived detects node state with a heartbeat mechanism; if network latency or other issues cause heartbeat loss, a node's state may be misjudged, triggering split brain.
Question: What are keepalived's three processes?
- The keepalived main process loads and parses the keepalived configuration file, creates and manages the VRRP instances, and monitors their state. It also handles communication with the other keepalived processes.
- The keepalived VRRP process implements the virtual router redundancy protocol. Each started VRRP instance has a corresponding VRRP process, which periodically sends VRRP advertisements, listens for advertisements from other nodes, and fails over according to the configured priorities.
- The keepalived check-script process executes user-defined health-check scripts. Through it, custom scripts can probe the server's health and, based on their return values, change a VRRP instance's state or trigger failover.
NFS server configuration
Use NFS so the backend servers fetch their data from the NFS server: mount the NFS export onto the web servers to keep the data consistent.
Configure a static IP
BOOTPROTO="none"
IPADDR=192.168.40.138
GATEWAY=192.168.40.2
DNS2=114.114.114.114
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
Install the packages
yum -y install rpcbind nfs-utils
Start the services: rpcbind first, then the nfs service
# start the rpc service
[root@nfs ~]# service rpcbind start
Redirecting to /bin/systemctl start rpcbind.service
[root@nfs ~]# systemctl enable rpcbind
# start the nfs service
[root@nfs ~]# service nfs-server start
Redirecting to /bin/systemctl start nfs-server.service
[root@nfs ~]# systemctl enable nfs-server
Create the shared directory
Create /data/share/ and write your own index.html to see the effect.
mkdir -p /data/share/
Edit the configuration file: vim /etc/exports
/data/share/ 192.168.40.0/24(rw,no_root_squash,all_squash,sync)
Where:
- /data/share/ : the shared directory.
- 192.168.40.0/24 : accept requests from IP addresses in the 192.168.40.0/24 range.
- rw : allow read and write access to the directory.
- no_root_squash : do not restrict the root user; a client accessing as root is also treated as root on the server.
- all_squash : map all users to the anonymous user; a client accessing as any user is treated as the anonymous user on the server.
- sync : synchronize writes to disk before they complete, guaranteeing data consistency and reliability at some cost in performance.
Reload nfs so the configuration file takes effect
systemctl reload nfs
exportfs -rv
Mounting on the web servers
The three web servers only need the rpcbind service installed; there is no need to install nfs or start the nfs service.
yum install rpcbind -y
Check the NFS server's shared directory from the web servers
[root@web1 ~]# showmount -e 192.168.40.138
Export list for 192.168.40.138:
/data/share 192.168.40.0/24
[root@web2 ~]# showmount -e 192.168.40.138
Export list for 192.168.40.138:
/data/share 192.168.40.0/24
[root@web3 ~]# showmount -e 192.168.40.138
Export list for 192.168.40.138:
/data/share 192.168.40.0/24
Mount the export onto the nginx web page directory
[root@web1 ~]# mount 192.168.40.138:/data/share /usr/local/shengxia/html
[root@web2 ~]# mount 192.168.40.138:/data/share /usr/local/shengxia/html
[root@web3 ~]# mount 192.168.40.138:/data/share /usr/local/shengxia/html
Mount the NFS filesystem automatically at boot
vim /etc/rc.local
# append this line to the end of the /etc/rc.local file
mount -t nfs 192.168.40.138:/data/share /usr/local/shengxia/html
Also give /etc/rc.d/rc.local execute permission
chmod +x /etc/rc.d/rc.local
Seeing this effect means it succeeded.
Monitoring server configuration
Download prometheus and the exporters for monitoring; for installation, see my blog post
Introduction, Installation and Use of Prometheus, Grafana, and cAdvisor
Install node-exporter
After prometheus is installed, install node-exporter on every server to monitor server state. Download it on all servers except this machine (192.168.40.137); one case is demonstrated below, and the operation is the same on the other servers.
Unpack the archive
[root@web1 exporter]# ls
node_exporter-1.5.0.linux-amd64.tar.gz
[root@web1 exporter]# tar xf node_exporter-1.5.0.linux-amd64.tar.gz
[root@web1 exporter]# ls
node_exporter-1.5.0.linux-amd64 node_exporter-1.5.0.linux-amd64.tar.gz
Create a directory
[root@web1 exporter]# mkdir -p /node_exporter
Copy the files under node_exporter to the target directory
[root@web1 exporter]# cp node_exporter-1.5.0.linux-amd64/* /node_exporter
Modify the PATH environment variable in /root/.bashrc: append this line to the end of the file, then re-source it
PATH=/node_exporter/:$PATH
source /root/.bashrc
Start it in the background
[root@web1 exporter]# nohup node_exporter --web.listen-address 192.168.40.21:8899 &
Seeing this page means success.
Write prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["192.168.40.137:9090"]
  - job_name: "nfs"
    static_configs:
      - targets: ["192.168.40.138:8899"]
  - job_name: "lb1"
    static_configs:
      - targets: ["192.168.40.31:8899"]
  - job_name: "lb2"
    static_configs:
      - targets: ["192.168.40.32:8899"]
  - job_name: "web1"
    static_configs:
      - targets: ["192.168.40.21:8899"]
  - job_name: "web2"
    static_configs:
      - targets: ["192.168.40.22:8899"]
  - job_name: "web3"
    static_configs:
      - targets: ["192.168.40.23:8899"]
Restart prometheus
[root@dns-prom prometheus]# service prometheus restart
Seeing this page means monitoring is working.
Install alertmanager and the DingTalk plugin
Download
[root@dns-prom prometheus]# wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
[root@dns-prom prometheus]# wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
Unpack
[root@dns-prom prometheus]# tar xf alertmanager-0.25.0.linux-amd64.tar.gz
[root@dns-prom prometheus]# mv alertmanager-0.25.0.linux-amd64 alertmanager
[root@dns-prom prometheus]# tar xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
[root@dns-prom prometheus]# mv prometheus-webhook-dingtalk-1.4.0.linux-amd64 prometheus-webhook-dingtalk
Obtain the robot's webhook
To get the public IP that must be allowlisted, use curl ifconfig.me:
[root@dns-prom alertmanager]# curl ifconfig.me
222.244.215.17
Modify DingTalk alert template
# Location: /lianxi/prometheus/prometheus-webhook-dingtalk/contrib/templates/legacy/template.tmpl
[root@dns-prom legacy]# cat template.tmpl
{{ define "ding.link.title" }}{{ template "legacy.title" . }}{{ end }}

{{ define "ding.link.content" }}
{{ if gt (len .Alerts.Firing) 0 -}}
Alert list:
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
Recovery list:
{{ template "__text_resolve_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
Modify config.yml: add the robot's webhook token and point it at the template file
[root@dns-prom prometheus-webhook-dingtalk]# cat config.yml
templates:
  - /lianxi/prometheus/prometheus-webhook-dingtalk/contrib/templates/legacy/template.tmpl # template path
targets:
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=your-own-token
Register prometheus-webhook-dingtalk as a service
[root@dns-prom system]# pwd
/usr/lib/systemd/system
[root@dns-prom system]# cat webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
Documentation=https://github.com/timonwong/prometheus-webhook-dingtalk
After=network.target
[Service]
ExecStart=/lianxi/prometheus/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/lianxi/prometheus/prometheus-webhook-dingtalk/config.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload the systemd units
[root@dns-prom system]# systemctl daemon-reload
Start the service
[root@dns-prom system]# service webhook-dingtalk start
Redirecting to /bin/systemctl start webhook-dingtalk.service
Write the alertmanager configuration
Modify the alertmanager.yml file
global:
  resolve_timeout: 5m
route: # alert routing: defines how alerts are grouped and dispatched
  receiver: webhook
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
    - receiver: webhook
      group_wait: 10s
receivers: # alert receivers: define how alerts are handled and sent
  - name: webhook
    webhook_configs:
      ### note: the dingtalk configuration file uses the target name webhook2; the URL path must match
      - url: http://192.168.40.137:8060/dingtalk/webhook2/send # alert webhook URL
        send_resolved: true # whether to send resolved alerts; if true, a notification is sent when an alert resolves
Register alertmanager as a service
[Unit]
Description=alertmanager
Documentation=https://prometheus.io/
After=network.target
[Service]
ExecStart=/lianxi/prometheus/alertmanager/alertmanager --config.file=/lianxi/prometheus/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload the systemd units
[root@dns-prom system]# systemctl daemon-reload
Check
Set up the alert rules
Create the alert rules in the rules.yml file in the prometheus directory.
[root@dns-prom prometheus]# pwd
/lianxi/prometheus/prometheus
[root@dns-prom prometheus]# cat rules.yml
groups:
- name: host_monitoring
  rules:
  - alert: memory alert
    expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: memory alert
      Server: '{{ $labels.instance }}'
      explain: "Memory usage exceeds 90%; current free amount: {{ $value }}M"
  - alert: CPU alert
    expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: CPU alert
      Server: '{{ $labels.instance }}'
      explain: "CPU usage exceeds 80%; current idle amount: {{ $value }}"
  - alert: disk alert
    expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: disk alert
      Server: '{{ $labels.instance }}'
      explain: "Disk usage exceeds 90%; current available amount: {{ $value }}G"
  - alert: service alert
    expr: up == 0
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: service alert
      Server: '{{ $labels.instance }}'
      explain: "the netdata service is down"
Modify the prometheus.yml file to associate prometheus with alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["192.168.40.137:9093"]
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/lianxi/prometheus/prometheus/rules.yml" # path to the alert rules file
  # - "first_rules.yml"
  # - "second_rules.yml"
Restart the prometheus service
[root@dns-prom prometheus]# service prometheus restart
You can see the monitoring data
Simulate server downtime: shut down web1 and an alert is raised.
DingTalk receives the alert.
Install grafana
Download the Grafana package from the Grafana official website and install it according to the official documentation
[root@dns-prom grafana]# yum install -y https://dl.grafana.com/enterprise/release/grafana-enterprise-9.5.1-1.x86_64.rpm
Start grafana
[root@dns-prom grafana]# service grafana-server restart
Restarting grafana-server (via systemctl): [ OK ]
For the detailed steps, see the post Introduction, Installation and Use of Prometheus, Grafana, and cAdvisor; just pick a suitable dashboard template and it will display.
Conduct stress testing
Install ab (provided by the httpd-tools package) to simulate requests
yum install httpd-tools -y
Run sustained simulated requests against the VIP, for example ab -n 1000 -c 100 http://192.168.40.51/, to gauge the cluster's concurrency capacity.