Requirement: Monitor the memory usage of all Java processes in the cluster.
List the Java processes running on the Linux host with the jps command:
[root@localhost zabbix]# jps
26490 YarnTaskExecutorRunner
12012 NodeManager
14047 YarnTaskExecutorRunner
25007 Jps
View the memory usage of a Java process with the jstat command. The -gc option reports sizes in KB and the -gcutil option reports utilization as percentages:
[root@node035 zabbix]# jstat -gc 12012
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
2560.0 2560.0 0.0 2208.0 335872.0 180374.6 338432.0 57522.0 51624.0 50525.8 5808.0 5542.1 104079 881.980 3 0.384 882.364
[root@node035 zabbix]# jstat -gcutil 12012
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 86.25 89.43 17.00 97.87 95.42 104079 881.980 3 0.384 882.364
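Because the -gc columns are sizes in KB, the used heap is the sum of S0U, S1U, EU and OU, and the heap capacity is the sum of S0C, S1C, EC and OC. A minimal sketch of that arithmetic, assuming the Java 8 column order shown above; the helper name heap_used_kb is made up for this example and is not part of the scripts below:

```shell
#!/bin/bash
# heap_used_kb: hypothetical helper for illustration only.
# Reads `jstat -gc <pid>` output on stdin and prints "<used KB> <capacity KB>".
# Assumes the column order S0C S1C S0U S1U EC EU OC OU ... (Java 8 layout).
heap_used_kb() {
    awk 'NR==2 { printf "%.1f %.1f\n", $3 + $4 + $6 + $8, $1 + $2 + $5 + $7 }'
}

# Sample copied from the jstat -gc output above; in practice pipe in
# the output of `jstat -gc <pid>` instead.
sample='S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
2560.0 2560.0 0.0 2208.0 335872.0 180374.6 338432.0 57522.0 51624.0 50525.8 5808.0 5542.1 104079 881.980 3 0.384 882.364'

printf '%s\n' "$sample" | heap_used_kb
```

For the NodeManager sample this gives roughly 240 MB used out of about 680 MB of heap capacity.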
#######################################################################
Data collection script:
Use grep with an explicit allow-list of the Java processes you want to monitor, rather than excluding the unwanted ones with grep -v.
Short-lived processes such as jps, jstat and jmap can start and exit faster than jps can inspect them, so jps sometimes prints only a pid for them, without a process name or any other information. With grep -v, those incomplete lines pass through the filter.
The script then stalls occasionally, which causes a series of problems in data collection.
Several processes can also share the same process name, for example parent and child processes.
The script tells them apart by numbering repeated names: kafka, kafka1, kafka2, ...
Zabbix obtains the process names through automatic discovery. I tried using process name + pid as the identifier, but the pid changes (for example when a process restarts).
So there is currently no good way to keep two processes with the same name apart permanently.
However, the point of monitoring is to watch how the monitored items trend over time. If a process shows abnormal memory behaviour, look up its pid by process name in the data file written by the script.
[root@node031 monitor]# cat getJavaMemoryStatus.sh
#!/bin/bash
# Collect GC statistics for the monitored Java processes and write one line
# per process to /tmp/java_memory_status.txt in the form:
#   name pid S0C S1C S0U S1U EC EU OC OU S0 S1 E O YGC YGCT FGC FGCT
# Final output
output=""
# Variables
flag=1
last_name=""
current_name=""
# jps output, filtered by an allow-list of process names and sorted by name, then pid
result=$(/usr/local/jdk/bin/jps | grep -E "QuorumPeerMain|Kafka|CanalAdminApplication|CanalLauncher|JournalNode|DFSZKFailoverController|NameNode|DataNode|ResourceManager|NodeManager|YarnJobClusterEntrypoint|YarnTaskExecutorRunner|HMaster|HRegion" | sort -k2 -k1)
# Main loop
# The loop is fed from a here-string instead of a pipe so that it runs in the
# current shell and $output is still set after the loop.
while read -r pid name; do
    # Skip blank or incomplete lines (for example when no process matched)
    if [ -z "$pid" ] || [ -z "$name" ]; then
        continue
    fi
    # Number repeated process names, for example: Process, Process1, Process2 ...
    if [ "$name" = "$last_name" ]; then
        current_name="$name$flag"
        flag=$((flag + 1))
    else
        current_name="$name"
        flag=1
    fi
    last_name="$name"
    # Get GC status (line 2 of the jstat output is the data row)
    res_gc=$(/usr/local/jdk/bin/jstat -gc "$pid" 2>/dev/null | awk 'NR==2{print $1, $2, $3, $4, $5, $6, $7, $8}')
    res_gcutil=$(/usr/local/jdk/bin/jstat -gcutil "$pid" 2>/dev/null | awk 'NR==2{print $1, $2, $3, $4, $7, $8, $9, $10}')
    # Combine output
    if [ -z "$output" ]; then
        output="${current_name} $pid ${res_gc} ${res_gcutil}"
    else
        output+=$'\n'"${current_name} $pid ${res_gc} ${res_gcutil}"
    fi
done <<< "$result"
# Output
echo "$output" > /tmp/java_memory_status.txt
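When a labelled process such as YarnTaskExecutorRunner1 looks abnormal in the graphs, its current pid can be read back from the data file, where column 1 is the label and column 2 is the pid. A minimal sketch; the helper name pid_for_name is made up for this example, and the demo file contains only the two columns that matter here:

```shell
#!/bin/bash
# pid_for_name: hypothetical helper for illustration only.
# Prints the pid recorded for a labelled process name in the data file
# written by getJavaMemoryStatus.sh (column 1 = label, column 2 = pid).
pid_for_name() {
    awk -v n="$1" '$1 == n { print $2; exit }' "${2:-/tmp/java_memory_status.txt}"
}

# Demo against a temporary file so the real data file is not touched.
# The names and pids are taken from the jps output at the top of this post.
demo_file=$(mktemp)
printf '%s\n' 'NodeManager 12012' \
              'YarnTaskExecutorRunner 14047' \
              'YarnTaskExecutorRunner1 26490' > "$demo_file"
pid_for_name YarnTaskExecutorRunner1 "$demo_file"
rm -f "$demo_file"
```

The exact match on column 1 means that YarnTaskExecutorRunner and YarnTaskExecutorRunner1 are never confused with each other.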
Script optimization:
Run the data collection commands as few times as possible to reduce the load on the server
Avoid reading files when fetching data wherever possible, to reduce I/O
Process auto-discovery script:
[root@localhost parameter_script]# cat java_discovery.sh
#!/bin/bash
# Emit Zabbix low-level discovery JSON with one {#JAVAPSNAME}/{#JAVAPSPID}
# pair per process found in the data file.
# Each list entry has the form pid#name.
javaProcessList=$(awk '{print $2"#"$1}' /tmp/java_memory_status.txt)
echo "{\"data\":["
first=1
for javaProcess in $javaProcessList; do
    IFS='#' read -r -a items <<< "$javaProcess"
    if [ "$first" -eq 1 ]; then
        echo "{\"{#JAVAPSNAME}\":\"${items[1]}\",\"{#JAVAPSPID}\":\"${items[0]}\"}"
        first=0
    else
        echo ",{\"{#JAVAPSNAME}\":\"${items[1]}\",\"{#JAVAPSPID}\":\"${items[0]}\"}"
    fi
done
echo "]}"
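The same discovery JSON can also be produced in a single awk pass over the data file, which avoids the shell loop. This is only a sketch of an alternative, not a replacement the rest of this post depends on; the function name discover_java is made up for this example:

```shell
#!/bin/bash
# discover_java: hypothetical alternative to java_discovery.sh.
# Prints the discovery JSON on one line and skips lines that do not
# contain at least a name and a pid.
discover_java() {
    awk 'BEGIN { printf "{\"data\":[" }
         NF >= 2 {
             printf "%s{\"{#JAVAPSNAME}\":\"%s\",\"{#JAVAPSPID}\":\"%s\"}", sep, $1, $2
             sep = ","
         }
         END { print "]}" }' "${1:-/tmp/java_memory_status.txt}"
}

# Demo against a temporary file with names and pids from the jps output above.
demo_file=$(mktemp)
printf '%s\n' 'NodeManager 12012' 'YarnTaskExecutorRunner 14047' > "$demo_file"
discover_java "$demo_file"
rm -f "$demo_file"
```

Piping the output through a JSON validator is a quick way to check it before pointing Zabbix at the script.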
#######################################################################
Script that returns the memory data of a Java process:
[root@node031 parameter_script]# cat getjavastatus.sh
#!/bin/bash
# Usage: getjavastatus.sh <process name> <metric>
# Prints one metric of the named process from /tmp/java_memory_status.txt.
# Sizes are in KB, utilization values in percent, GC times in seconds.
case $2 in
    # S0 capacity
    S0C)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $3}' | bc
        ;;
    # S1 capacity
    S1C)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $4}' | bc
        ;;
    # S0 used
    S0U)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $5}' | bc
        ;;
    # S1 used
    S1U)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $6}' | bc
        ;;
    # Eden capacity
    EC)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $7}' | bc
        ;;
    # Eden used
    EU)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $8}' | bc
        ;;
    # Old generation capacity
    OC)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $9}' | bc
        ;;
    # Old generation used
    OU)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $10}' | bc
        ;;
    # S0 utilization
    S0Util)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $11}' | bc
        ;;
    # S1 utilization
    S1Util)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $12}' | bc
        ;;
    # Eden utilization
    EUtil)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $13}' | bc
        ;;
    # Old generation utilization
    OUtil)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $14}' | bc
        ;;
    # Number of young generation GC events
    YGC)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $15}' | bc
        ;;
    # Time spent in young generation GC
    YGCT)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $16}' | bc
        ;;
    # Number of full GC events
    FGC)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $17}' | bc
        ;;
    # Time spent in full GC
    FGCT)
        grep -w "$1" /tmp/java_memory_status.txt | awk '{print $18}' | bc
        ;;
esac
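Since every branch of the case statement differs only in the column number, the lookup can also be written as a small table: map the metric name to its column and print that field in one awk pass. A sketch of that idea, assuming the column order written by getJavaMemoryStatus.sh; the function name java_metric is made up for this example:

```shell
#!/bin/bash
# java_metric: hypothetical table-driven variant of getjavastatus.sh.
# Usage: java_metric <process name> <metric> [data file]
java_metric() {
    awk -v n="$1" -v m="$2" '
        BEGIN {
            # Metric names in the order the collector writes them,
            # starting at column 3 (columns 1 and 2 are name and pid).
            k = split("S0C S1C S0U S1U EC EU OC OU S0Util S1Util EUtil OUtil YGC YGCT FGC FGCT", names, " ")
            for (i = 1; i <= k; i++) col[names[i]] = i + 2
        }
        # Exact match on the process label; unknown metrics print nothing.
        $1 == n && (m in col) { print $(col[m]); exit }
    ' "${3:-/tmp/java_memory_status.txt}"
}

# Demo line assembled from the NodeManager jstat output shown earlier.
demo_file=$(mktemp)
echo 'NodeManager 12012 2560.0 2560.0 0.0 2208.0 335872.0 180374.6 338432.0 57522.0 0.00 86.25 89.43 17.00 104079 881.980 3 0.384' > "$demo_file"
java_metric NodeManager OU "$demo_file"
java_metric NodeManager FGCT "$demo_file"
rm -f "$demo_file"
```

This spawns one process per request instead of three (grep, awk, bc), which fits the goal of keeping the load of each check low.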
Add a configuration file to the Zabbix agent that defines the custom monitoring items:
UserParameter=javaps,/etc/zabbix/parameter_script/java_discovery.sh
UserParameter=javastat[*],/etc/zabbix/parameter_script/getjavastatus.sh $1 $2
Restart the zabbix-agent2 process
service zabbix-agent2 restart
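Configure a scheduled task that runs the collection script every minute (the script uses bash-specific syntax, so run it with bash rather than sh)
*/1 * * * * bash /data/script/monitor/getJavaMemoryStatus.sh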
Configure automatic discovery of Java processes
Create a template group: JavaProcess
Create a template: JavaProcess
Create an auto-discovery rule in the JavaProcess template
Add an item prototype for each metric to be monitored
Link the JavaProcess template to the hosts to be monitored
Zabbix then adds each discovered process to the corresponding host and creates its monitoring items from the item prototypes. Once data has been collected, it can be graphed in Grafana
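Given the UserParameter definitions above, the discovery rule uses the key javaps, and each item prototype key passes the discovered name and one metric to javastat. The loop below prints the full list of item prototype keys as a convenience when creating them; it is a sketch based on the metric names handled by getjavastatus.sh, and the function name item_prototype_keys is made up for this example:

```shell
#!/bin/bash
# Metric names accepted by getjavastatus.sh as its second argument.
metrics="S0C S1C S0U S1U EC EU OC OU S0Util S1Util EUtil OUtil YGC YGCT FGC FGCT"

# item_prototype_keys: hypothetical helper that prints one item prototype
# key per metric, using the {#JAVAPSNAME} macro from the discovery rule.
item_prototype_keys() {
    for m in $metrics; do
        printf 'javastat[{#JAVAPSNAME},%s]\n' "$m"
    done
}

item_prototype_keys
```

Each printed line, for example javastat[{#JAVAPSNAME},OU], is the key of one item prototype in the JavaProcess template.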