This article describes a scenario that required shrinking the data files of an OceanBase cluster and presents a shrink procedure for reference.
Author: Guan Bingwen, a member of the Aikesheng DBA team, responsible for database technical support; one step at a time, two steps at a time, combining diligence with laziness.
Produced by the Aikesheng open source community. Original content may not be used without authorization; to reprint, please contact the editor and cite the source.
This article is about 1,200 words and is expected to take 4 minutes to read.
Shrinking scenario
Previously, on one node of a bank's OceanBase cluster with a 1-1-1 architecture, the OBServer process crashed and, by default, wrote a core file to the data disk /data/1. A core file is typically about as large as the memory the process was using at the time, roughly 400 GB here. However, 90% of the data disk had already been pre-allocated to the data file (block_file), and the remaining space could not hold a file that large, so the /data/1 directory filled up. This caused two problems:
- The core file was not completely written, and the incomplete file made it difficult to analyze the cause of the failure.
- The data disk was full, which left the node unable to provide service.
After the OBServer service was restored, we agreed with the project team to shrink the cluster's data files to 80% of the data disk, so that if the same failure recurs, the core file will not fill the disk again.
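To make the sizing trade-off concrete, here is a minimal shell sketch of the arithmetic. The article does not give the data disk's capacity, so the 2000 GB used below is a hypothetical value, and the helper names reserve_gb and core_fits are illustrative only. The calculation considers only block_file and the core file and ignores any other files on the disk.

```shell
# Space (GB) left on the data disk after block_file pre-allocates pct percent.
# Integer arithmetic; disk_gb is a hypothetical input, not a value from the article.
reserve_gb() {
  local disk_gb=$1 pct=$2
  echo $(( disk_gb * (100 - pct) / 100 ))
}

# Exit status 0 if a core file of core_gb would fit in the remaining space.
core_fits() {
  local disk_gb=$1 pct=$2 core_gb=$3
  [ "$(reserve_gb "$disk_gb" "$pct")" -ge "$core_gb" ]
}

# Example: a hypothetical 2000 GB data disk and a ~400 GB core file.
for p in 90 80; do
  if core_fits 2000 "$p" 400; then verdict="fits"; else verdict="does not fit"; fi
  echo "datafile_disk_percentage=$p: $(reserve_gb 2000 "$p") GB free, 400 GB core $verdict"
done
```

On such a disk, 80% leaves exactly enough room for a ~400 GB core file, while 90% leaves only half of that. On a smaller disk even 80% would not be enough, which is one reason the summary also suggests redirecting core files.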
The screenshots in this article and the server information (IP addresses, cluster name, tenant names) shown in the code come from a personal simulation environment and are only meant to illustrate the steps.
Shrink operation
Version information
- OBServer version: 3.2.3
- OCP version: 3.3.3
Related parameters
datafile_size
Sets the size of the data file. To shrink the data file, remove the node from the cluster and rebuild it. The current value on this cluster is 0.
datafile_disk_percentage
The percentage of the total space of the disk holding data_dir that the data file may occupy. The current value on this cluster is 90.
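The interaction between the two parameters can be sketched as follows. This assumes that a non-zero datafile_size takes precedence and that datafile_disk_percentage applies only when datafile_size is 0, as on this cluster; check the documentation for your version before relying on it. The disk size and the helper name effective_datafile_gb are hypothetical.

```shell
# Effective block_file size in GB, ASSUMING a non-zero datafile_size wins and
# datafile_disk_percentage is used only when datafile_size is 0.
effective_datafile_gb() {
  local disk_gb=$1          # total size of the data disk (hypothetical input)
  local datafile_size_gb=$2 # datafile_size expressed in GB; 0 means unset
  local pct=$3              # datafile_disk_percentage
  if [ "$datafile_size_gb" -gt 0 ]; then
    echo "$datafile_size_gb"
  else
    echo $(( disk_gb * pct / 100 ))
  fi
}

effective_datafile_gb 2000 0 90   # current settings on a hypothetical 2000 GB disk
effective_datafile_gb 2000 0 80   # target settings after the shrink
```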
1 Adjust parameters
In OCP, go to Cluster -> Parameter Management and change datafile_disk_percentage to 80, so that block_file may occupy 80% of the disk.
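Without OCP, the same change can presumably be made from the sys tenant with ALTER SYSTEM SET. The sketch below only prints the statements for review rather than executing them; the helper name is hypothetical, the accepted range of 1 to 99 is an assumption, and the obclient connection options in the comment are placeholders.

```shell
# Print the SQL equivalent of the OCP parameter change, for review only.
# ASSUMPTION: the value must be an integer between 1 and 99.
datafile_pct_sql() {
  local pct=$1
  case "$pct" in
    ''|*[!0-9]*) echo "not an integer: $pct" >&2; return 1 ;;
  esac
  if [ "$pct" -lt 1 ] || [ "$pct" -gt 99 ]; then
    echo "out of range (1-99): $pct" >&2
    return 1
  fi
  printf "ALTER SYSTEM SET datafile_disk_percentage = %s;\n" "$pct"
  printf "SHOW PARAMETERS LIKE 'datafile_disk_percentage';\n"
}

# Review the output, then run it in the sys tenant, for example:
#   datafile_pct_sql 80 | obclient -h<host> -P2881 -uroot@sys -p
datafile_pct_sql 80
```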
2 Reduce tenant replicas
In Cluster -> Tenant Management, select each tenant (including the sys tenant), choose the zone in the replica details, delete that zone's replica (zone3 in this example), and wait for the task to finish.
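If you script this step instead of using OCP, removing zone3's replica roughly corresponds to changing the tenant's locality. The helper below is a hypothetical sketch that assumes every replica is a full (F) replica; OCP may also perform other steps, such as resource pool changes, that are not shown. The tenant name t1 is illustrative.

```shell
# Build an ALTER TENANT ... LOCALITY statement that drops one zone's replica.
# ASSUMPTIONS: all replicas are full (F) replicas; names are hypothetical.
drop_zone_locality_sql() {
  local tenant=$1 drop=$2
  shift 2
  local locality="" zone
  for zone in "$@"; do
    [ "$zone" = "$drop" ] && continue
    locality="${locality:+$locality,}F@$zone"   # comma-join the kept zones
  done
  printf "ALTER TENANT %s LOCALITY = '%s';\n" "$tenant" "$locality"
}

drop_zone_locality_sql t1 zone3 zone1 zone2 zone3
```

Passing the full zone list together with a zone that is not in the list yields the original three-replica locality, which is the form needed when the replicas are added back later.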
3 Take OBServer offline
In Cluster -> Overview, delete zone3's OBServer from the OBServer list. This is equivalent to uninstalling the OBServer service on that node. Wait for the task to finish.
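Steps 2 and 3 both end with waiting for a task to finish. When these operations are scripted, a generic polling helper such as the hypothetical wait_until below can replace manual waiting. The check command is whatever reliably signals completion in your environment; the script named in the comment is a placeholder.

```shell
# Re-run a check command until it succeeds or the timeout (seconds) expires.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Example with a placeholder check: wait up to 30 minutes, polling every 30 s,
# for a hypothetical script that exits 0 once zone3's server has been removed.
#   wait_until 1800 30 ./check_server_deleted.sh 10.186.65.56 zone3
```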
4 Bring OBServer back online
At this point, the OceanBase package on the node has been uninstalled and the related directories have been cleared. To bring this OBServer back online, you need to reinstall the OceanBase RPM package, re-create the related directories, and so on.
Because OCP (as of version 3.3.3) cannot pass additional parameters when starting the OBServer process, this step is performed on the command line.
4.1 Install RPM package
Use the root user.
rpm -ivh oceanbase-3.2.3.3-107050022023040817.el7.x86_64.rpm
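Because the node rejoins a running cluster, the package should match the version on the other nodes (3.2.3 here). On an installed system, rpm -q oceanbase reports the installed version; the sketch below instead checks the package file name before installation, assuming the oceanbase-<version>-<release>.<arch>.rpm naming shown above. Both helper names are hypothetical.

```shell
# Extract the version field from an OceanBase RPM file name
# (ASSUMED pattern: oceanbase-<version>-<release>.<dist>.<arch>.rpm).
rpm_file_version() {
  local name=${1##*/}          # drop any directory part
  name=${name#oceanbase-}      # drop the package-name prefix
  echo "${name%%-*}"           # keep everything before the next '-'
}

# Exit status 0 if the package version equals, or extends, the cluster version.
version_matches() {
  case "$(rpm_file_version "$1")" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

pkg=oceanbase-3.2.3.3-107050022023040817.el7.x86_64.rpm
rpm_file_version "$pkg"
version_matches "$pkg" 3.2.3 && echo "package matches cluster version 3.2.3"
```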
4.2 Initialize directories
Use the admin user.
export cluster_name=sit
mkdir -p /data/1/$cluster_name/{etc3,sort_dir,sstable}
mkdir -p /data/log1/$cluster_name/{clog,etc2,ilog,slog,oob_clog}
mkdir -p /home/admin/oceanbase/store/$cluster_name
chown -R admin:admin /data/1/$cluster_name && chown -R admin:admin /home/admin/oceanbase && chown -R admin:admin /data/log1/$cluster_name
for t in {etc3,sort_dir,sstable};do ln -sf /data/1/$cluster_name/$t /home/admin/oceanbase/store/$cluster_name/$t; done
for t in {clog,etc2,ilog,slog,oob_clog}; do ln -sf /data/log1/$cluster_name/$t /home/admin/oceanbase/store/$cluster_name/$t; done
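The same directory setup can be wrapped in a function whose root paths are parameters, which makes it easy to rehearse against a scratch directory before touching /data. This is a sketch, not the article's script: the function name is hypothetical, it uses ln -sfn so that re-running it replaces existing links instead of creating a nested link inside the target directory, and, unlike the commands above, it changes ownership only of the store tree rather than all of /home/admin/oceanbase.

```shell
# Create data/log directories and the store symlinks for one cluster,
# following the layout of the commands above. Root paths are parameters.
init_store_dirs() {
  local cluster=$1 data_root=$2 log_root=$3 store_root=$4 owner=${5:-admin:admin}
  local d
  for d in etc3 sort_dir sstable; do
    mkdir -p "$data_root/$cluster/$d"
  done
  for d in clog etc2 ilog slog oob_clog; do
    mkdir -p "$log_root/$cluster/$d"
  done
  mkdir -p "$store_root/$cluster"
  chown -R "$owner" "$data_root/$cluster" "$log_root/$cluster" "$store_root"
  # -n: replace an existing link instead of descending into its target
  for d in etc3 sort_dir sstable; do
    ln -sfn "$data_root/$cluster/$d" "$store_root/$cluster/$d"
  done
  for d in clog etc2 ilog slog oob_clog; do
    ln -sfn "$log_root/$cluster/$d" "$store_root/$cluster/$d"
  done
}

# Production call matching the commands above (run as admin):
#   init_store_dirs sit /data/1 /data/log1 /home/admin/oceanbase/store
```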
4.3 Specify parameters to start the OBServer process
Use the admin user.
cd /home/admin/oceanbase
ulimit -s 10240 ## maximum stack size, in KB
ulimit -c unlimited ## no size limit on core files; when a program fails, the system may write its in-memory state to a file for debugging, called a core file
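Start the OBServer process from the same shell, since ulimit settings apply only to the current session and the processes it starts.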
cd /home/admin/oceanbase
./bin/observer -i eth0 -p 2881 -P 2882 -n sit -z zone3 -d /home/admin/oceanbase/store/sit -r '10.186.65.8:2882:2881;10.186.65.123:2882:2881;10.186.65.56:2882:2881' -l info -o 'obconfig_url=http://10.186.65.11:8080/services?Action=ObRootServiceInfo&User_ID=alibaba&UID=ocpmaster&ObRegion=sit,config_additional_dir=/data/1/sit/etc3;/data/log1/sit/etc2,cluster_id=16777777,datafile_disk_percentage=80,cpu_count=16,system_memory=5G'
Parameter reference:
- -i: the network interface name, which can be checked with the ifconfig command.
- -p: the SQL service port, usually 2881.
- -P: the RPC port, usually 2882.
- -n: the cluster name; keep it the same as before.
- -z: the zone the OBServer process belongs to; keep it the same as before.
- -d: the cluster's storage directory; keep the path unchanged except for the cluster name.
- -r: the RootService list; check the current cluster's rootservice_list parameter.
- -l: the log level. The default is INFO, meaning only logs at INFO level and above are written to observer.log, election.log and rootservice.log.
- -o: cluster startup parameters, set according to the actual environment:
  - obconfig_url: the URL of the OBConfig service; keep it the same as before.
  - config_additional_dir: additional local directories for configuration files, so that redundant copies are kept.
  - cluster_id: the cluster ID; keep it the same as before.
  - datafile_disk_percentage: the target disk percentage for the data file after shrinking (80 here).
  - cpu_count: the number of CPUs; keep it the same as before.
  - system_memory: the memory reserved for OceanBase's internal use; keep it the same as before.
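Since step 4.5 repeats this startup for every zone, the command can be generated from a few variables. The sketch below prints the command instead of running it so it can be reviewed first. The function name observer_cmd is hypothetical, the values are the example ones from this article, the printed command is meant to be run from /home/admin/oceanbase, and values containing single quotes are not handled.

```shell
# Print the observer start command for one node; review it, then run it
# from /home/admin/oceanbase. Ports and the store path follow this article.
observer_cmd() {
  local nic=$1 zone=$2 cluster=$3 rs_list=$4 opts=$5
  printf "./bin/observer -i %s -p 2881 -P 2882 -n %s -z %s -d /home/admin/oceanbase/store/%s -r '%s' -l info -o '%s'\n" \
    "$nic" "$cluster" "$zone" "$cluster" "$rs_list" "$opts"
}

rs='10.186.65.8:2882:2881;10.186.65.123:2882:2881;10.186.65.56:2882:2881'
opts='obconfig_url=http://10.186.65.11:8080/services?Action=ObRootServiceInfo&User_ID=alibaba&UID=ocpmaster&ObRegion=sit,config_additional_dir=/data/1/sit/etc3;/data/log1/sit/etc2,cluster_id=16777777,datafile_disk_percentage=80,cpu_count=16,system_memory=5G'
observer_cmd eth0 zone3 sit "$rs" "$opts"
```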
4.4 Log in to the sys tenant and add the OBServer back
alter system add server '10.186.65.56:2882' zone 'zone3';
Then refresh the OBServer list on the OCP cluster overview page.
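To confirm from the command line that the re-added server has become active, the sys tenant's server list can be filtered. This sketch assumes a query against oceanbase.__all_server returning svr_ip, svr_port, zone and status, and obclient's -N -s options producing tab-separated rows without headers; verify both against your version. The function name is hypothetical.

```shell
# Read tab-separated rows (svr_ip, svr_port, zone, status) on stdin and exit 0
# only if the given server is listed with status "active".
# ASSUMED source of the rows, run in the sys tenant:
#   obclient -h<host> -P2881 -uroot@sys -p -N -s \
#     -e "SELECT svr_ip, svr_port, zone, status FROM oceanbase.__all_server;"
server_is_active() {
  local ip=$1 port=$2
  awk -F '\t' -v ip="$ip" -v port="$port" '
    $1 == ip && $2 == port && $4 == "active" { ok = 1 }
    END { exit !ok }
  '
}
```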
4.5 Repeat for the remaining replicas and nodes
Add the tenant replicas back to zone3, then repeat the steps above for each remaining zone in turn: remove its tenant replicas, take its OBServer offline, bring it back online, and add the replicas back. Once every node has been rebuilt, block_file on the /data/1 data disk has been shrunk across the cluster.
4.6 Restart the cluster
Finally, restart the cluster and verify that the cluster is running properly.
Summary
This shrink procedure amounts to reinstalling the OBServer service on every node of the cluster, which carries risk in a production environment, so take a backup first. For this reason, in a failure scenario like this one, first check whether another local disk has enough space to hold the core file and, if so, change the core file's generation path to it. (NFS-mounted disks are not considered here because of network transfer limitations.)
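For the alternative recommended above, a sketch like the following checks whether a directory on another local disk has room for a core file and prints the kernel.core_pattern setting that would redirect core dumps there. It prints rather than applies the setting (applying it needs root, and sysctl -w does not persist across reboots). The function name and the directory in the example are hypothetical; the pattern uses the standard %e (executable name), %p (PID) and %t (timestamp) specifiers.

```shell
# Check that dir exists and has at least need_gb GB free, then print (not apply)
# the sysctl command that would send core dumps there.
core_dir_check() {
  local dir=$1 need_gb=$2 avail_kb
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt $(( need_gb * 1024 * 1024 )) ]; then
    echo "only $(( avail_kb / 1024 / 1024 )) GB free in $dir" >&2
    return 1
  fi
  echo "sysctl -w kernel.core_pattern=$dir/core-%e-%p-%t"
}

# Example (hypothetical directory on another local disk, ~400 GB core):
#   core_dir_check /data/coredump 400
```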
In addition, when datafile_disk_percentage or datafile_size needs to be increased, it can be adjusted dynamically without restarting the cluster. Decreasing either parameter has no effect.
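The rule above can be written down as a tiny helper for change reviews. It simply encodes the behaviour described in this article (increases take effect online, decreases are ignored); the helper name is hypothetical.

```shell
# Classify a datafile_disk_percentage change on an existing node, per the rule
# described in this article.
datafile_pct_change() {
  local old=$1 new=$2
  if [ "$new" -gt "$old" ]; then
    echo "grow online: $old% -> $new%"
  elif [ "$new" -lt "$old" ]; then
    echo "no effect on existing block_file: rebuild the node to shrink"
  else
    echo "unchanged"
  fi
}

datafile_pct_change 80 85
datafile_pct_change 90 80
```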
For more technical articles, please visit: https://opensource.actionsky.com/
About SQLE
SQLE, from the Aikesheng open source community, is a SQL audit tool for database users and administrators. It supports auditing in multiple scenarios, standardized release workflows, native MySQL auditing, and extensible database types.
Get SQLE
| Type | Address |
|---|---|
| Repository | https://github.com/actiontech/sqle |
| Documentation | https://actiontech.github.io/sqle-docs/ |
| Release notes | https://github.com/actiontech/sqle/releases |
| Audit plugin development documentation | https://actiontech.github.io/sqle-docs/docs/dev-manual/plugins/howtouse |