Counting the daily number and size of Elasticsearch indices with a shell script

  • Recap:

  • The Elasticsearch cluster has been having recurring problems: each time one was fixed, a new one appeared. The PM therefore asked me to pull how many indices Elasticsearch creates per day and how large they are, so the machines could be sized properly.

  • There is no direct access to the Elasticsearch cluster on site (it is a production environment and the customer does not allow it). The only way in is a pod opened through the dashboard, whose resources are too limited to work in smoothly, so the index information had to be exported via the dashboard and the file transferred out for local processing.

  • What follows is how I did it.

# curl -XGET "localhostIp:9200/_cat/indices?v" | grep xxx > /tmp/xxx.indices
This step redirects the Elasticsearch indices listing into a file. Part of the logs Elasticsearch collects belongs to our own product and part belongs to the customer; because of the NDA, some company information is replaced with xxx here.
The script:
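A side note: the cat APIs also accept a `bytes` query parameter (e.g. `?v&bytes=mb`), which returns every store.size as a plain number in a single unit and removes the need for the kb/mb/gb branching later in the script. A minimal sketch of summing such output; since there is no cluster here, a heredoc with made-up rows stands in for the curl response:

```shell
# Real export would be something like:
#   curl -XGET "localhostIp:9200/_cat/indices?v&bytes=mb" | grep xxx > /tmp/xxx.indices
# With bytes=mb, store.size (second-to-last column) is a plain number in MB.
# A heredoc with made-up sample rows stands in for the cluster response here.
sample=$(cat <<'EOF'
green open xxx-2020-12-01 uuid1 5 1 525 0 841 420
green open xxx-2020-12-01 uuid2 5 1 100 0 2048 1024
EOF
)
# Sum store.size in MB and convert once to GB
total_gb=$(printf '%s\n' "$sample" | awk '{sum += $(NF-1)} END {printf "%.4f\n", sum/1024}')
echo "$total_gb"
```

The trade-off is that a single unit makes the sums trivial but the raw listing less human-readable; for a one-off report either works.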
#!/usr/bin/env bash
set -e

pwd=$(cd "$(dirname "$0")"; pwd)
year=2020
month=12
day=$(seq -w 1 31)    # -w pads to two digits, matching the date format in the index names
file=test
dir="${pwd}/total"

mkdir -p "${dir}"

for i in ${day}
do
  # Sum the indices whose store.size is in kb and convert to GB
  kb=$(grep "${year}-${month}-${i}" "${file}.indices.txt" | \
    awk '{print $NF}' | grep kb | awk -F 'kb' '{print $1}' | \
    awk '{sum += $1} END {print sum/1024/1024}')
  echo "${year}-${month}-${i} subtotal: ${kb}gb" > "${dir}/${file}.indices.${year}-${month}-${i}.txt"

  # Sum the indices whose store.size is in mb and convert to GB
  mb=$(grep "${year}-${month}-${i}" "${file}.indices.txt" | \
    awk '{print $NF}' | grep mb | awk -F 'mb' '{print $1}' | \
    awk '{sum += $1} END {print sum/1024}')
  echo "${year}-${month}-${i} subtotal: ${mb}gb" >> "${dir}/${file}.indices.${year}-${month}-${i}.txt"

  # Sum the indices whose store.size is already in gb
  gb=$(grep "${year}-${month}-${i}" "${file}.indices.txt" | \
    awk '{print $NF}' | grep gb | awk -F 'gb' '{print $1}' | \
    awk '{sum += $1} END {print sum}')
  echo "${year}-${month}-${i} subtotal: ${gb}gb" >> "${dir}/${file}.indices.${year}-${month}-${i}.txt"

  # Add the three subtotals to get the day's total size in GB
  total=$(awk -F ':' '{print $NF}' "${dir}/${file}.indices.${year}-${month}-${i}.txt" | \
    awk -F 'gb' '{sum += $1} END {print sum}')
  echo "${year}-${month}-${i} total: ${total}gb" >> "${dir}/${file}.indices.${year}-${month}-${i}.txt"

  # Count the number of indices created that day
  wc=$(grep "${year}-${month}-${i}" "${file}.indices.txt" | wc -l)
  echo "${year}-${month}-${i} total: ${wc} indices" >> "${dir}/${file}.indices.${year}-${month}-${i}.txt"
done

# Collect the daily totals, print them to the terminal, then clean up the generated files
grep ' total:' "${dir}/${file}.indices.${year}-${month}-"*.txt > "${dir}/${file}.indices.total.txt"
cat "${dir}/${file}.indices.total.txt"
sleep 10
rm -rf "${dir}"
README:
1. Since a month has up to 31 days, the script generates 31 files. To avoid interfering with day-to-day work, the script prints the collected information to the terminal after it finishes and then deletes all of the generated files; please be aware of this.
2. The template of the exported Elasticsearch information is shown below. If your template differs, the awk positional variables must be adjusted (a rewrite is probably easier...).
3. The default unit for the totals is GB. I am not very good with awk, so the final sums may come out with scientific notation in them; if anyone who knows awk well sees this and knows how to improve it, please share.
4. Variables:
   4.1. year is the year and month is the month; day uses the seq command (the -w flag pads every number to two digits, because the dates in the index names are two digits).
   4.2. file is the field filtered out of the initially exported indices file. My file is named test.indices.txt; adjust this when you use the script, or it will fail.
   4.3. dir is a directory the script creates under its own location; all of the statistics files are stored there so the whole directory can be deleted afterwards without accidentally removing other files.
   4.4. kb is the sum (via awk) of the indices whose store.size column is in kb (mb and gb work the same way).
   4.5. total converts the kb and mb sums to GB, adds the GB sum, and gives the total index size for the day.
   4.6. wc counts the indices, giving the number of indices created that day.
5. Script logic:
   5.1. A for loop with awk filters out the store.size column, splits it into kb, mb and gb, extracts the numbers, converts and sums them; the final unit defaults to GB.
   5.2. The same loop uses wc to count the indices per day.
   5.3. grep on the "total" lines redirects each day's indices information into xxx.indices.total.txt, cat prints it to the terminal, and after a 10-second sleep the files generated by the script are deleted.
6. The script has only been tested in my own environment, but it did finish the task the PM assigned. Everything above is for learning and reference only; do not use it commercially (long live open source).
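On README item 3: awk's plain `print` uses the OFMT format (`%.6g` by default), which switches to scientific notation for very small or very large sums. A sketch of the usual fix, replacing `print` with `printf` and an explicit format:

```shell
# Plain print may emit scientific notation for tiny sums (OFMT is %.6g):
echo "0.00000048" | awk '{sum += $1} END {print sum}'
# An explicit printf format keeps plain decimals:
result=$(echo "0.00000048" | awk '{sum += $1} END {printf "%.8f\n", sum}')
echo "$result"
```

Applying the same `printf` change to the script's summing awk commands would keep all the gb figures in plain decimal form.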
Elasticsearch template (company-related information redacted; it is not important):
health status index                                         uuid      pri rep docs.count docs.deleted store.size pri.store.size
green  open   xxx-xxx-xxx-ip:port-2020-11-27 8psXiCG0Acubr46OcKo9TA   5   1        525            0    841.1kb        420.5kb
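On README item 2: instead of hard-coding awk field positions against this template, the store.size column can be located by its header name, so a different column order still works. A sketch over the sample rows above (the index name and uuid are the redacted values from the template):

```shell
# Locate the store.size column by scanning the header row (row 1),
# then print that column for every data row; column order no longer matters.
sample=$(cat <<'EOF'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open xxx-2020-11-27 8psXiCG0Acubr46OcKo9TA 5 1 525 0 841.1kb 420.5kb
EOF
)
size=$(printf '%s\n' "$sample" | \
  awk 'NR==1 {for (i = 1; i <= NF; i++) if ($i == "store.size") col = i; next} {print $col}')
echo "$size"
```

This only assumes the export keeps its header row (the `v` flag in the curl above), which the script currently relies on anyway.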
# Output on the terminal (company information likewise redacted):
/tmp/total/xxxxxx.indices.2020-12-01.txt:2020-12-01 total: 27.5024gb
/tmp/total/xxxxxx.indices.2020-12-01.txt:2020-12-01 total: 3 indices
/tmp/total/xxxxxx.indices.2020-12-02.txt:2020-12-02 total: 57.0024gb
/tmp/total/xxxxxx.indices.2020-12-02.txt:2020-12-02 total: 4 indices
/tmp/total/xxxxxx.indices.2020-12-03.txt:2020-12-03 total: 59.6024gb
/tmp/total/xxxxxx.indices.2020-12-03.txt:2020-12-03 total: 4 indices
/tmp/total/xxxxxx.indices.2020-12-04.txt:2020-12-04 total: 61.5026gb
/tmp/total/xxxxxx.indices.2020-12-04.txt:2020-12-04 total: 4 indices
/tmp/total/xxxxxx.indices.2020-12-05.txt:2020-12-05 total: 0.48008gb
/tmp/total/xxxxxx.indices.2020-12-05.txt:2020-12-05 total: 2 indices
This rookie has one lofty ambition: to run the most expensive servers with the crudest scripts.

Origin blog.csdn.net/u010383467/article/details/111599185