OceanBase data file reduction practice

This article introduces the data file reduction scenario of the OceanBase cluster and provides a reduction solution for reference.

Author: Guan Bingwen, a member of the Aikesheng DBA team, responsible for database-related technical support. Takes things one step at a time, balancing diligence and laziness.

Produced by the Aikesheng open source community. Original content may not be used without authorization; to reprint, please contact the editor and cite the source.

This article is about 1,200 words and is expected to take 4 minutes to read.

Shrinking scenario

Some time ago, the OBServer process crashed on one node of a bank's OceanBase cluster with a 1-1-1 architecture, and by default a core file was generated on the data disk /data/1. A core file is generally about as large as the memory the program occupied while running, which here was about 400GB. However, 90% of the data disk had already been pre-allocated to the data file (block_file), and the remaining space was not enough to hold such a large file. The /data/1 directory filled up, which caused two problems:

  1. The core file is not completely written, and the incomplete core file makes it difficult to analyze the cause of the failure.
  2. The data disk is full, which directly results in the node being unable to provide external services.

After the OBServer service was restored and the issue was discussed with the project team, it was decided to shrink the cluster's data files to 80% of the total size of the data disk, so that the disk would not fill up again if the fault recurred.
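The arithmetic behind that decision can be sketched as follows. The 3000 GB disk size is a hypothetical figure chosen for illustration (the article does not state the real disk size), and the calculation ignores filesystem overhead and other files on the disk.

```shell
# Space (GB) left on a data disk after block_file pre-allocation.
# $1 = total disk size in GB (hypothetical input), $2 = datafile_disk_percentage
free_after_prealloc() {
  echo $(( $1 * (100 - $2) / 100 ))
}

# Succeeds if a core file of $3 GB fits into the space left over.
core_fits() {
  [ "$(free_after_prealloc "$1" "$2")" -ge "$3" ]
}

free_after_prealloc 3000 90   # prints 300: too small for a 400 GB core file
free_after_prealloc 3000 80   # prints 600: a 400 GB core file fits
```

On a disk of this assumed size, moving from 90% to 80% is what turns an incomplete core file into a complete one.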

The screenshots and the server information (IP address, cluster name, tenant name) shown in the code in this article come from a personally built simulation environment and are only used to help explain the specific steps.

Shrink operation

Version Information

  • OBServer version: 3.2.3
  • OCP version: 3.3.3

Related parameters

datafile_size

Sets the size of the data file. To shrink datafile_size, you can delete the node from the cluster and rebuild it. The current value in this cluster is 0, so the size is determined by datafile_disk_percentage.

datafile_disk_percentage

Indicates the percentage of the total space of the disk where data_dir resides that the data file occupies. The current value in this cluster is 90.
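Before changing anything, the current values of both parameters can be confirmed from the sys tenant. The sketch below only builds the SHOW PARAMETERS statements; the client invocation in the comment is an assumption, with host, port, and user as placeholders for your own environment.

```shell
# Print the statement that lists one cluster parameter on every OBServer.
show_param_sql() {
  printf "SHOW PARAMETERS LIKE '%s';\n" "$1"
}

for p in datafile_size datafile_disk_percentage; do
  show_param_sql "$p"
done

# Assumed usage through a MySQL-compatible client connected as the sys tenant:
# show_param_sql datafile_disk_percentage | mysql -h127.0.0.1 -P2881 -uroot@sys -p
```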

1 Adjust parameters

Cluster -> Parameter Management: adjust the value of datafile_disk_percentage to 80, that is, the block_file occupies 80% of the disk.

2 Reduce tenant replicas

Cluster -> Tenant Management: select the tenant (including the sys tenant), select the zone in the replica details, delete the replica (for example, zone3), and wait for the task to end.

3 Take the OBServer offline

Cluster -> Overview: delete the OBServer of zone3 from the OBServer list, which is equivalent to uninstalling the OBServer service on this node, and wait for the task to end.

4 Bring the OBServer online

At this point, the OceanBase installation package on the node has been uninstalled and the related directories have been cleared. To bring this OBServer online again, you need to install the OceanBase RPM package, initialize the related directories, and so on.

Since OCP currently (version 3.3.3) cannot specify additional parameters when starting the OBServer process, this step is performed on the command line.

4.1 Install RPM package

Use the root user.

rpm -ivh oceanbase-3.2.3.3-107050022023040817.el7.x86_64.rpm

4.2 Initialize directories

Use the admin user.

export cluster_name=sit 

mkdir -p /data/1/$cluster_name/{etc3,sort_dir,sstable} 
mkdir -p /data/log1/$cluster_name/{clog,etc2,ilog,slog,oob_clog} 
mkdir -p /home/admin/oceanbase/store/$cluster_name 

chown -R admin:admin /data/1/$cluster_name && chown -R admin:admin /home/admin/oceanbase && chown -R admin:admin /data/log1/$cluster_name 

for t in {etc3,sort_dir,sstable};do ln -sf /data/1/$cluster_name/$t /home/admin/oceanbase/store/$cluster_name/$t; done 
for t in {clog,etc2,ilog,slog,oob_clog}; do ln -sf /data/log1/$cluster_name/$t /home/admin/oceanbase/store/$cluster_name/$t; done
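Because ln -sf succeeds even when the target is missing, it can be worth checking the result before starting the process. The helper below is a minimal sketch, not part of the original procedure; check_store_links is a hypothetical name.

```shell
# Check that each named entry under the store directory is a symlink whose
# target directory exists. Prints every problem and returns non-zero if any.
# $1 = store directory, remaining arguments = expected link names
check_store_links() {
  store_dir=$1
  shift
  rc=0
  for name in "$@"; do
    link="$store_dir/$name"
    if [ ! -L "$link" ]; then
      echo "not a symlink: $link"
      rc=1
    elif [ ! -d "$link" ]; then
      echo "dangling symlink: $link"
      rc=1
    fi
  done
  return $rc
}

# Usage with the layout created above:
# check_store_links /home/admin/oceanbase/store/sit \
#   etc3 sort_dir sstable clog etc2 ilog slog oob_clog
```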

4.3 Specify parameters to start the OBServer process

Use the admin user.

cd /home/admin/oceanbase 
ulimit -s 10240  ## maximum stack size
ulimit -c unlimited   ## when a program fails, the system may write its in-memory state to a file for debugging; this file is called a core file

Start the OBServer process.

cd /home/admin/oceanbase

./bin/observer -i eth0 -p 2881 -P 2882 -n sit -z zone3 -d /home/admin/oceanbase/store/sit -r '10.186.65.8:2882:2881;10.186.65.123:2882:2881;10.186.65.56:2882:2881' -l info -o 'obconfig_url=http://10.186.65.11:8080/services?Action=ObRootServiceInfo&User_ID=alibaba&UID=ocpmaster&ObRegion=sit,config_additional_dir=/data/1/sit/etc3;/data/log1/sit/etc2,cluster_id=16777777,datafile_disk_percentage=80,cpu_count=16,system_memory=5G'

Parameter reference value:

  • -i: Specify the network card name, which can be viewed with the ifconfig command.
  • -p: Specify the service port number, usually 2881.
  • -P: Specify the RPC port number, usually 2882.
  • -n: Specify the cluster name; keep it consistent with the original one.
  • -z: Specify the Zone to which the started OBServer process belongs; keep it consistent with the original one.
  • -d: Specify the cluster home directory; leave it unchanged except for the cluster name.
  • -r: Specify the RS list, which can be taken from the rootservice_list parameter of the current cluster.
  • -l: Specify the log level. The default is INFO, that is, only log data of INFO level and above is printed to the observer.log, election.log and rootservice.log log files.
  • -o: Specify cluster startup parameters, which need to be set according to the actual situation.
    • obconfig_url: Sets the URL address of the OBConfig service; keep it consistent with the original one.
    • config_additional_dir: Sets additional local directories in which copies of the configuration file are stored for redundancy.
    • cluster_id: Specifies the cluster ID; keep it consistent with the original one.
    • datafile_disk_percentage: Set to the percentage of the disk that the data file should occupy after shrinking.
    • cpu_count: Specifies the number of CPUs; keep it consistent with the original one.
    • system_memory: Specifies the memory reserved internally by OceanBase; keep it consistent with the original one.
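Once the process is running with datafile_disk_percentage=80, the size of the newly created block_file can be compared with the size of its filesystem. This is a minimal sketch: it assumes GNU stat and GNU df (coreutils, as shipped on EL7), and the block_file path in the usage comment is inferred from the directory layout in this article rather than stated in it.

```shell
# Integer percentage that $1 represents of $2 (both in bytes), rounded down.
pct_of() {
  echo $(( $1 * 100 / $2 ))
}

# Print what share of its filesystem a file occupies.
# Relies on GNU stat (-c %s) and GNU df (-B1 --output=size).
file_share_of_fs() {
  size=$(stat -c %s "$1") || return 1
  total=$(df -B1 --output=size "$1" | tail -n 1 | tr -d ' ') || return 1
  pct_of "$size" "$total"
}

# Assumed usage:
# file_share_of_fs /data/1/sit/sstable/block_file
```

The result may differ slightly from the configured value because of rounding and filesystem overhead.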

4.4 Log in to the cluster sys tenant and add OBServer

alter system add server '10.186.65.56:2882' zone 'zone3';

Then refresh the OBServer list on the OCP cluster overview page.
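The state of the re-added node can also be checked from the sys tenant. The sketch below builds a query against the internal table oceanbase.__all_server; the table and column names are an assumption based on OceanBase 3.x and should be verified against your version.

```shell
# Print a query that lists the servers of one zone with their status.
# Table and column names are assumed from OceanBase 3.x internal tables.
server_status_sql() {
  printf "SELECT svr_ip, svr_port, zone, status, start_service_time FROM oceanbase.__all_server WHERE zone = '%s';\n" "$1"
}

server_status_sql zone3

# Assumed usage through a MySQL-compatible client (placeholder connection values):
# server_status_sql zone3 | mysql -h127.0.0.1 -P2881 -uroot@sys -p
```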

4.5 Other replica operations

Repeat the above steps for the other OBServers: remove the tenant replicas one by one, take the OBServer offline and bring it back online, and then restore the tenant replicas. At this point, the block_file on the OceanBase data disk /data/1 has been shrunk.

4.6 Restart the cluster

Finally, restart the cluster and verify that the cluster is running properly.

Summary

This data file reduction is equivalent to reinstalling the OBServer service on each node of the cluster. It carries certain risks in a production environment, so making backups beforehand is recommended. In a fault scenario like the one in this article, therefore, first consider whether other local disk space is available to store the core file, and change the core file's generation path accordingly (NFS-mounted disks are limited by network transfer and are not considered for now).
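On Linux, the core file path is controlled by the kernel.core_pattern setting, so redirecting core files to another disk can be sketched as below. The directory /data/core and the file name 99-core.conf are hypothetical, and the directory must exist on a disk with enough free space.

```shell
# Print a sysctl line that sends core files to the directory given as $1.
# %e = executable name, %p = PID, %t = time of dump (seconds since the epoch)
core_pattern_line() {
  printf 'kernel.core_pattern = %s/core-%%e-%%p-%%t\n' "$1"
}

# Print the pattern the running kernel currently uses (Linux only).
current_core_pattern() {
  cat /proc/sys/kernel/core_pattern
}

# Assumed usage as root:
# core_pattern_line /data/core > /etc/sysctl.d/99-core.conf && sysctl --system
```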

In addition, when the datafile_disk_percentage and datafile_size parameters need to be increased, they can be adjusted dynamically without restarting the cluster. Adjusting them to a smaller value has no effect.

For more technical articles, please visit: https://opensource.actionsky.com/

About SQLE

SQLE, from the Aikesheng open source community, is a SQL audit tool for database users and managers. It supports multi-scenario audits and standardized release processes, natively supports MySQL audits, and can be extended to other database types.

Getting SQLE

  • Repository: https://github.com/actiontech/sqle
  • Documentation: https://actiontech.github.io/sqle-docs/
  • Release notes: https://github.com/actiontech/sqle/releases
  • Data audit plug-in development documentation: https://actiontech.github.io/sqle-docs/docs/dev-manual/plugins/howtouse

Origin my.oschina.net/actiontechoss/blog/10112046