Technology sharing | How to gracefully delete Zabbix's history related history tables

Author: Xu Wenliang

A Kesheng DBA member, a database engineer who is obsessed with technology, is mainly responsible for the daily operation and maintenance of the database. Good at MySQL, redis, and other common databases. I like fishing, reading books, watching scenery, and making new friends.

Source of this article: original contribution

*Produced by the Aikesheng open source community, the original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.


Problem background:

Some time ago, the customer reported that the history_str table of the Zabbix instance had a large amount of data, resulting in high disk space usage. He wanted to clean up the table and asked if he had any good suggestions. I thought that I just learned the relevant knowledge points recently, and I can just test the learning results. After the test of practice, I finally passed the exam and the customer was quite satisfied, so I came up with this article.

Problem communication:

Through actually viewing the environment and communicating with customers, the following information is obtained:

1. The site is a two-way master-slave replication architecture, and the slave library read_only is not set.

2. The ibd data file of the history_str table exceeds 460G.

3. The stock data in the history_str table can be cleaned up directly.

4. The server where the on-site instance is located is a virtual machine with low configuration.

Therefore, after comprehensive consideration, it is recommended that customers create a new table with the same table structure and then perform a drop operation on the original table. However, the data volume of the table is relatively large, and the following risks need to be considered:

1. Dropping a large table may cause the instance to hang and affect the normal use of the database.

2. The operation of dropping a large table causes a master-slave delay.

3. Deleting large files causes high disk io pressure.

The final proposal:

On the basis of the above considerations, the following scheme is finally given:

1. Execute the following command in the main library to create the same table structure table and perform the rename operation:

create table history_str_new like history_str;
rename table history_str to history_str_old, history_str_new to
history_str;

2. Perform the following operations on the master library and slave library to create a hard link file:

ln history_str_old.ibd history_str_old.ibd.hdlk

3. After completing the second step, it is recommended to perform the operation at intervals of one or two days, let the history_str_old table data cool down from the innodb buffer pool, and then perform the following operations on the master and slave databases respectively during the off-peak period of business. It is recommended to operate the slave database first, and the slave database Verify that there is no problem before operating in the main library:

set sql log bin=0;       //临时关闭写操作记录binlog
drop table history_str_old;//执行drop操作
set sql log bin=l;       //恢复写操作记录binlog

4. Delete the history_old.ibd.hdlk file to free up space, which can be realized through the truncate command of Linux. The reference script is as follows:

#!/bin/bash
##############################################################################
##            第一个参数为需要执行操作的文件的文件名称           ##
##           第二个参数为每次执行操作的缩减值,单位为MB          ##
##           第三个参数为每次执行后的睡眠时间,单位为S           ##
##############################################################################
  
fileSize=`du $1|awk -F" " '{print $1}'`
fileName=$1
chunk=$2
sleepTime=$3
chunkSize=$(( chunk * 1024 ))
rotateTime=$(( fileSize / chunkSize ))
declare -a currentSize
echo $rotateTime
  
function truncate_action()
{
for (( i=0; i<=${rotateTime}; i++ ))
do
if [ $i -eq 0 ];then
echo "开始进行truncate操作,操作文件名为:"$fileName
fi
  
if [ $i -eq ${rotateTime} ];then
echo "执行truncate操作结束!!!"
fi
  
truncate -s -${chunk}M $fileName
currentSize=`du -sh $fileName|awk -F" " '{print $1}'`
echo "当前文件大小为: "$currentSize
sleep $sleepTime
done
}
  
truncate_action

Example: sh truncateFile.sh history_str_old.ibd.hdlk 256 1, means to delete the history_str_old.ibd.hdlk file, each time the truncation size is 256M, and then the sleep interval is 1s.

5. At this point, just wait quietly. If you are bored, you can also think about life.

tips:

The problem of how to operate has been solved before, but as a competent DBA, you must not only know how to do it, but also know why you do it. Otherwise, it is easy to press the Enter key, but it is difficult to regret it. Here comes the dry stuff, let’s find out together . Don't panic next time you encounter a similar problem.

tips1:

MySQL删除表的流程:
1.持有buffer pool mutex。
2.持有buffer pool中的flush list mutex。
3.扫描flush list列表,如果脏页属于drop掉的table,则直接将其从flush list列表中移除。如果开启了AHI,还会遍历LRU,删除innodb表的自适应散列索引项,如果mysql版本在5.5.23之前,则直接删除,对于5.5.23及以后版本,如果占用cpu和mutex时间过长,则释放cpu资源,flush list mutex和buffer pool mutex一段时间,并进行context switch。一段时间后重新持有buffer pool mutex,flush list mutex。
4.释放flush list mutex。
5.释放buffer pool mutex。

tips2:

对于linux系统,一个磁盘上的文件可以由多个文件系统的文件引用,且这多个文件完全相同,并指向同一个磁盘上的文件,当删除其中任一一个文件时,并不会删除真实的文件,而是将其被引用的数目减1,只有当被引用数目为0时,才会真正删除文件。

tips3:

大表drop或者truncate相关的一些bug:
 
这两个指出drop table 会做两次 LRU 扫描:一次是从 LRU list 中删除表的数据页,一次是删除表的 AHI 条目。
https://bugs.mysql.com/bug.php?id=51325
https://bugs.mysql.com/bug.php?id=64284
  
对于分区表,删除多个分区时,删除每个分区都会扫描LRU两次。
https://bugs.mysql.com/bug.php?id=61188
  
truncate table 会扫描 LRU 来删除 AHI,导致性能下降;8.0 已修复,方法是将 truncate 映射成 drop table + create table
https://bugs.mysql.com/bug.php?id=68184
  
drop table 扫描 LRU 删除 AHI 导致信号量等待,造成长时间的阻塞
https://bugs.mysql.com/bug.php?id=91977
  
8.0依旧修复了 truncate table 的问题,但是对于一些查询产生的磁盘临时表(innodb 表),在临时表被删除时,还是会有同样的问题。这个bug在8.0.23中得到修复。
https://bugs.mysql.com/bug.php?id=98869

Guess you like

Origin blog.csdn.net/ActionTech/article/details/130222982