Monitoring MongoDB replica set status: an example of writing and deploying a Telegraf Exec input plugin

Telegraf's existing MongoDB input plugin makes it hard to monitor the state of replica set nodes. A replica set member can be in states such as PRIMARY, SECONDARY, RECOVERING, ARBITER, and so on. Here we try the Exec input plugin to monitor the state of MongoDB.

Part I: Review of monitoring under the Zabbix system

1. The script executed by Zabbix is as follows:

#!/bin/bash
# Zabbix check of MongoDB replica set member states.
# Usage: <script> <mongodb-port>
command_linebin="<path-to-mongodb>/bin/mongo"   # mongo shell executable
replstatus="PRIMARY SECONDARY ARBITER"          # states regarded as healthy
username="user"
password="pwd"

# mongo shell connected to the admin database of the instance on port $1;
# $command_line is left unquoted below on purpose so it splits into arguments
command_line="${command_linebin} localhost:$1/admin -u $username -p $password"
mcount=$(echo "rs.status().members.length" | $command_line --quiet)

if [ "$mcount" -ge 3 ]; then
    # three or more members: check the state of the first three
    ms1=$(echo "rs.status().members[0].stateStr" | $command_line --quiet)
    ms2=$(echo "rs.status().members[1].stateStr" | $command_line --quiet)
    ms3=$(echo "rs.status().members[2].stateStr" | $command_line --quiet)

    if [[ $replstatus =~ $ms1 ]] && [[ $replstatus =~ $ms2 ]] && [[ $replstatus =~ $ms3 ]]; then
        echo "The Status OK "
    else
        echo "The status of mongo replica is unnormal.port is " "$1"
    fi
else
    # fewer than three members: check the state of the first two
    ms1=$(echo "rs.status().members[0].stateStr" | $command_line --quiet)
    ms2=$(echo "rs.status().members[1].stateStr" | $command_line --quiet)

    if [[ $replstatus =~ $ms1 ]] && [[ $replstatus =~ $ms2 ]]; then
        echo "The Status OK "
    else
        echo "The status of mongo replica is unnormal.port is " "$1"
    fi
fi

(The script takes the MongoDB port number as its input parameter.)

2. Alert interface for abnormal status:

Part II: Implementing collection with Telegraf's exec input plugin

1. The executable is named test_mongodb.sh; the script is a simplified adaptation of the one above:

#!/bin/bash
# Telegraf exec check: prints one InfluxDB line-protocol record,
# Status=1i when all checked members are healthy, Status=2i otherwise.
command_linebin="<path-to-mongodb>/bin/mongo"   # mongo shell executable
username="User"
replstatus="PRIMARY SECONDARY ARBITER"          # states regarded as healthy
password="PWD"
port=27017

# $command_line is left unquoted below on purpose so it splits into arguments
command_line="${command_linebin} localhost:$port/admin -u $username -p $password"
mcount=$(echo "rs.status().members.length" | $command_line --quiet)

if [ "$mcount" -ge 3 ]; then
    # three or more members: check the state of the first three
    ms1=$(echo "rs.status().members[0].stateStr" | $command_line --quiet)
    ms2=$(echo "rs.status().members[1].stateStr" | $command_line --quiet)
    ms3=$(echo "rs.status().members[2].stateStr" | $command_line --quiet)

    if [[ $replstatus =~ $ms1 ]] && [[ $replstatus =~ $ms2 ]] && [[ $replstatus =~ $ms3 ]]; then
        echo "ReplStatus,tag=mongodb Status=1i"
    else
        echo "ReplStatus,tag=mongodb Status=2i"
    fi
else
    # fewer than three members: check the state of the first two
    ms1=$(echo "rs.status().members[0].stateStr" | $command_line --quiet)
    ms2=$(echo "rs.status().members[1].stateStr" | $command_line --quiet)

    if [[ $replstatus =~ $ms1 ]] && [[ $replstatus =~ $ms2 ]]; then
        echo "ReplStatus,tag=mongodb Status=1i"
    else
        echo "ReplStatus,tag=mongodb Status=2i"
    fi
fi

The file is located at /etc/telegraf/test_mongodb.sh and must be executable (chmod +x).

2. The exec input in telegraf.conf is configured as follows:
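The original configuration was shown only as a screenshot. As a hedged sketch, a minimal [[inputs.exec]] section for this script could look like the one below; the timeout value is our own assumption, and write_exec_input is a hypothetical helper name. It appends the section to whatever file you pass, such as telegraf.conf or a drop-in file under /etc/telegraf/telegraf.d/ (the directory package installs typically load).

```bash
#!/bin/bash
# Hypothetical helper: append a Telegraf [[inputs.exec]] section that runs the
# replica set check script to the given config file.
write_exec_input() {
  local target="$1"
  cat >> "$target" <<'EOF'
[[inputs.exec]]
  ## The script is run directly, so it needs a shebang and the executable bit
  commands = ["/etc/telegraf/test_mongodb.sh"]
  ## Assumed value: stop the script if the mongo shell hangs
  timeout = "5s"
  ## The script already prints InfluxDB line protocol
  data_format = "influx"
EOF
}
```

For example, as root: write_exec_input /etc/telegraf/telegraf.d/mongodb_repl.conf (the file name is hypothetical). Because it appends, run it only once per file.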

3. Test with the --test option; it runs normally:

telegraf --config telegraf.conf --test

4. Start the service:

service telegraf start

5. Log in to InfluxDB and verify the collected data:

(Everything above looks deployed, but in reality.....)

Part III: Analysis of the data error and its root cause, a permissions problem

A close look at the data in the screenshot above reveals the problem: during debugging the status is clearly 1, so why is the status saved to InfluxDB 2???

The cluster status is actually fine, yet the result is judged as an error!!! After searching online and reviewing the script, the main suspicion was a permissions issue.

The specific testing process was as follows.

Step 1: Switch from the root account to the telegraf account.

The switch fails.

Step 2: View the telegraf account information:

cat /etc/passwd

(/bin/false does nothing except return an error status and exit immediately. If a user's shell is set to /bin/false, the user cannot log in, and no message is shown.)

Step 3: Change the account's shell to /bin/bash, as follows:
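Step 3 amounts to editing the telegraf line in /etc/passwd, which usermod -s /bin/bash telegraf also does (and usermod -s /bin/false telegraf undoes). As a hedged alternative that leaves the account untouched, su -s can override the shell for a single command. The helper names below are hypothetical, and the passwd line used in the tests is illustrative (made-up UID/GID).

```bash
#!/bin/bash
# Print the login shell (7th colon-separated field) of a passwd-format line.
login_shell_of() {
  printf '%s\n' "$1" | cut -d: -f7
}

# Look up a user in the passwd database and print its login shell.
user_login_shell() {
  local entry
  entry=$(getent passwd "$1") || return 1
  login_shell_of "$entry"
}

# Run a command string as another user through bash without editing
# /etc/passwd: su -s overrides the account's /bin/false for this call only.
# Must be run as root.
run_as_with_bash() {
  local user="$1"
  shift
  su -s /bin/bash -c "$*" "$user"
}
```

For example, as root, user_login_shell telegraf should print /bin/false before the change, and run_as_with_bash telegraf 'whoami; sh -x /etc/telegraf/test_mongodb.sh' runs the checks of steps 4 to 6 without modifying the account.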

Step 4: su to the telegraf account again and check which account is currently logged in.

This time the switch succeeds.

Step 5: Test under the telegraf account.

This time Status = 2, unlike our earlier test under the root account (Status = 1). This also explains why the value in InfluxDB is 2: it is indeed related to the account.

Step 6: Debug the test_mongodb.sh file that Telegraf calls:

sh -x test_mongodb.sh

Part of the error output is as follows:

+ ms2='SECONDARY
2019-07-02T20:24:59.596+0800 E -        [main] Error saving history file: FileOpenFailed: Unable to open() file /etc/telegraf/.dbshell: Unknown error'
+ [[ PRIMARY SECONDARY ARBITER =~ PRIMARY
2019-07-02T20:24:59.468+0800 E -        [main] Error saving history file: FileOpenFailed: Unable to open() file /etc/telegraf/.dbshell: Unknown error ]]
+ echo 'ReplStatus,tag=mongodb Status=2i'
ReplStatus,tag=mongodb Status=2i

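From the error output above, the direct cause is that the mongo shell cannot save its history to the file /etc/telegraf/.dbshell. That log line ends up in the captured values of ms1 and ms2, so the state comparison fails and the script prints Status=2i.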
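If you also want the state comparison itself to be exact, one option (our suggestion, not part of the original script) is a case-based check to replace each [[ $replstatus =~ $msN ]] test: a value passes only if it is exactly one whole state name, not any fragment of the $replstatus string. A value polluted by the log line still counts as unhealthy, which is the desired outcome; the pollution itself is fixed by the permissions change in the following steps. is_healthy_state is a hypothetical name.

```bash
#!/bin/bash
# Succeed only if the argument is exactly one whole healthy state name.
# A value carrying an extra mongo log line, an empty value, or any other
# state (RECOVERING, STARTUP2, ...) is treated as unhealthy.
is_healthy_state() {
  case "$1" in
    PRIMARY | SECONDARY | ARBITER) return 0 ;;
    *) return 1 ;;
  esac
}
```

In test_mongodb.sh the three-member test would then read: if is_healthy_state "$ms1" && is_healthy_state "$ms2" && is_healthy_state "$ms3"; then ...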

In fact, there is no .dbshell file in /etc/telegraf. The command to list all files (including hidden files) is:

ll -a

So what happens if we create .dbshell manually?

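The following was done under the root account: create the .dbshell file and change its owner to the telegraf account. The exact commands were shown in a screenshot: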
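Since the commands survive only as a screenshot, here is a minimal sketch of the same two actions (create the empty file, then hand it to telegraf); create_dbshell is a hypothetical helper name.

```bash
#!/bin/bash
# Create an empty mongo shell history file (.dbshell) in a directory and make
# the given user its owner, so the mongo shell run by that user can write it.
# Changing the owner to another account requires root.
create_dbshell() {
  local dir="$1" owner="$2"
  touch "$dir/.dbshell" && chown "$owner" "$dir/.dbshell"
}
```

As root, create_dbshell /etc/telegraf telegraf is equivalent to touch /etc/telegraf/.dbshell followed by chown telegraf /etc/telegraf/.dbshell.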

Step 7: Run the test_mongodb.sh executable again under the telegraf account:

sh -x test_mongodb.sh

This time no errors appear, and the data is correct.

Step 8: Debug with the Telegraf test command under the telegraf account. This time it behaves the same as under root: no errors, and the data is correct.

Step 9: Log in to InfluxDB and check: the value has recovered from the erroneous 2 to 1.

 

Part IV: Further thoughts and optimization

(1) Why doesn't the root account need a manually created .dbshell? Does it not need the file at all? And if it does, in which directory is it?

First, a look at the .dbshell file. Its basic description is: "When you run the mongo client, it stores a history of commands in $HOME/.dbshell."

From this description, the root account should also produce a .dbshell file.

We located the root account's .dbshell file with the following steps.
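Those steps were screenshots. One way to do the lookup, sketched with hypothetical helper names, is to read the home directory from the passwd database with getent and append .dbshell; this assumes HOME has not been overridden in the environment the mongo shell runs in. The same lookup also answers step 1 of the test in (2) for the telegraf account.

```bash
#!/bin/bash
# Print the home directory (6th colon-separated field) of a passwd-format line.
home_of_line() {
  printf '%s\n' "$1" | cut -d: -f6
}

# Print where the mongo shell keeps command history for a user, i.e.
# $HOME/.dbshell, taking the home directory from the passwd database.
dbshell_path_for() {
  local entry
  entry=$(getent passwd "$1") || return 1
  printf '%s/.dbshell\n' "$(home_of_line "$entry")"
}
```

For example, ls -la "$(dbshell_path_for root)" lists root's history file, typically /root/.dbshell, while dbshell_path_for telegraf points into /etc/telegraf.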

(2) How can the telegraf account be set up so that the file no longer needs to be created manually?

Hypothesis: it may be enough to give the telegraf account sufficient permissions on its own $HOME.

The test is as follows:

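Step 1: Find the account's $HOME.

Step 2: Delete the .dbshell file created in the earlier test.

Step 3: Run the Telegraf test command; the data should now be abnormal again (remember to run it under the telegraf account).

Step 4: Under the root account, change the owner of the $HOME directory (/etc/telegraf) to the telegraf account.

Step 5: Run the Telegraf test command again.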

 

The test above confirms that granting the account the appropriate permissions on its $HOME directory also solves the problem.

Part V: Alert display

After configuring it in Grafana, the dashboard looks like this:

 

Part VI: Supplementary notes

1. What path does $HOME represent?

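Put simply, $HOME is the user's home directory: the directory a user is placed in by default after logging in. It holds the user's own startup files, which define the environment variables that user works with.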

It can be shown with the command:

echo $HOME

 

2. Data written to InfluxDB must follow this format (line protocol):

<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]

If the format is wrong, a common error is:

2019-05-02T08:04:13Z E! Error in plugin [inputs.exec]: metric parse error: expected field at offset 37: "XXXXXXXXXXXXXXXXXXXX\n"
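To keep the script's output in this format, one option is to build each record in one place and refuse values that are not integers. The sketch below is our own; emit_metric is a hypothetical helper name, and it assumes measurement, tag, and field names contain no spaces, commas, or equals signs (those would need escaping in line protocol).

```bash
#!/bin/bash
# Print one InfluxDB line-protocol record shaped like test_mongodb.sh's output:
#   <measurement>,<tag-key>=<tag-value> <field-key>=<value>i
# The trailing "i" makes the field an integer. No timestamp is written, so
# Telegraf assigns the collection time.
emit_metric() {
  local measurement="$1" tagset="$2" field="$3" value="$4"
  # Refuse anything but a non-negative integer, so stray text (such as a mongo
  # log line) is reported on stderr instead of becoming a malformed record.
  case "$value" in
    '' | *[!0-9]*)
      echo "emit_metric: not an integer: $value" >&2
      return 1
      ;;
  esac
  printf '%s,%s %s=%si\n' "$measurement" "$tagset" "$field" "$value"
}
```

For example, emit_metric ReplStatus tag=mongodb Status 1 prints ReplStatus,tag=mongodb Status=1i, the same line the script echoes.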

 

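The copyright of this article belongs to the author. Please do not reproduce it without the author's consent. Thank you for your cooperation!!!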


 


Source: www.cnblogs.com/xuliuzai/p/11121086.html