zabbix监控不报警的问题检查,添加zabbix分区表

      这几天DB主从的slave状态,一直莫名中断,查看报错发现是1062,解决这个问题之后,发现另一个问题,zabbix监控没有报警。反复手动关闭mysql的主从状态,就是不报警。

      这一套公司的zabbix监控安装运行有两年之久,初始化安装配置也是之前的同事配的,中间没人维护过,也没有留下什么文档。如今出了问题,只能自己着手解决。

       先查zabbix_agentd日志,报了一堆:

rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied
rm: cannot remove `/root/.bash_history': Permission denied

再查zabbix_server,通过zabbix_get进行测试,测试命令如下:

/usr/local/zabbix/bin/zabbix_get -s IP地址 -p10050 -k "MySQL.running-slave"

报下面这个错:

初步怀疑是zabbix_agentd的问题,于是重装,重装过程就不再说明了,我是用自己写的脚本装的。

-rw-r--r-- 1 root root 2450 2018/01/31 12:01:47 zabbix_lyb_agentd.sh

重装完成,配置好相关参数,查看zabbix_agentd的日志:

tailf /var/log/zabbix/zabbix_agentd.log

还是有报错

rm: cannot remove `/root/.bash_history': Permission denied

再测一波mysql主从状态,关闭了slave,然而zabbix监控还是一片平静,什么报警都没有。

再去server端去通过zabbix_get进行测试,发现能够返回值1,说明能获取到监控参数了。

[root@localhost ~]# /usr/local/zabbix/bin/zabbix_get -s 119.29.xxx.xxx -p10050 -k "MySQL.running-slave"  
1

这就跟客户端没什么关系了。于是再查看zabbix_server的日志tailf /tmp/zabbix_server.log,报错信息如下:

很明显,无法写入分区表了,看来需要重新建分区表。

查一下数据库的大小,这条SQL语句是在网上找的,果然数据库太大了,

select table_name as "Tables",round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB" from information_schema.TABLES where table_schema = 'zabbix' order by (data_length + index_length) desc;

写个创建分区表的存储过程,借鉴了网上的一些代码,自己稍作修改。编写存储过程代码如下,IP地址我做了处理,请自行修改:

DELIMITER $$

DROP PROCEDURE IF EXISTS `zabbix`.`partition_create`$$

CREATE DEFINER=`root`@`192.168.xxx.xxx` PROCEDURE `partition_create`(SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int)
BEGIN
 
        DECLARE RETROWS INT;
        SELECT COUNT(1) INTO RETROWS FROM information_schema.partitions WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND partition_description >= CLOCK;
        IF RETROWS = 0 THEN
        SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`',' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' );
        PREPARE STMT FROM @sql;
        EXECUTE STMT;
        DEALLOCATE PREPARE STMT;
        END IF;
        END$$

DELIMITER ;

先停zabbix服务

/etc/init.d/zabbix_server stop


把需要添加分区表的那几张表统计出来,创建和之前表结构一致但带有分区表的数据表。所以记得先把之前的表结构保存出来。然后再执行下面的操作,我这里只以history为例:

use zabbix;

drop table history;   # 这一条是删表,有重要数据的请慎重执行

CREATE TABLE `history` (
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `value` double(16,4) NOT NULL DEFAULT '0.0000',
  `ns` int(11) NOT NULL DEFAULT '0',
  KEY `history_1` (`itemid`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
partition by range(clock) (
PARTITION p20180310 VALUES LESS THAN (UNIX_TIMESTAMP("2018-03-10 00:00:00"))
);

然后调用存储过程:

call zabbix.partition_create("zabbix","trends_uint","p20180410",UNIX_TIMESTAMP("2018-04-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20180510",UNIX_TIMESTAMP("2018-05-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20180610",UNIX_TIMESTAMP("2018-06-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20180710",UNIX_TIMESTAMP("2018-07-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20180810",UNIX_TIMESTAMP("2018-08-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20180910",UNIX_TIMESTAMP("2018-09-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20181010",UNIX_TIMESTAMP("2018-10-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20181110",UNIX_TIMESTAMP("2018-11-10 00:00:00"));
call zabbix.partition_create("zabbix","trends_uint","p20181210",UNIX_TIMESTAMP("2018-12-10 00:00:00"));

#UNIX_TIMESTAMP 一般是用于unix的时间戳

建议你多添加几条免得以后麻烦,我是添加了一年的,直接在命令里改日期就可以了。

这样重新清理的表包括history、history_uint、history_str、trends、trends_uint...

这些表都是zabbix_server的日志报出来的,

最后,开启zabbix服务

/etc/init.d/zabbix_server start

再查看zabbix_server的日志,没报错了:

[root@localhost ~]# tailf /tmp/zabbix_server.log   
  5209:20180201:153427.380 server #23 started [history syncer #4]
  5211:20180201:153427.380 server #25 started [proxy poller #1]
  5190:20180201:153427.396 server #4 started [poller #2]
  5191:20180201:153427.396 server #5 started [poller #3]
  5193:20180201:153427.396 server #7 started [poller #5]
  5189:20180201:153427.399 server #3 started [poller #1]
  5192:20180201:153427.399 server #6 started [poller #4]
  5194:20180201:153427.400 server #8 started [unreachable poller #1]
  5205:20180201:153427.415 server #19 started [discoverer #1]
  5202:20180201:153427.605 housekeeper [deleted 0 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 0.234616 sec, idle 1 hour(s)]


查看zabbix报警的情况,收到报警信息。成了。

猜你喜欢

转载自blog.csdn.net/fangfu123/article/details/79227435