Summary of GaussDB daily operation and maintenance commands [01]

1. Confirm the database connection (user traffic)

select * from pg_stat_activity;

select usename,client_addr,query from pg_stat_activity;

select distinct * from (select usename as username, client_addr, count(1) over(partition by usename, client_addr) as connections, count(1) over() as total_connections from pg_stat_activity) a;

select query from pg_stat_activity where current_timestamp -query_start > interval '1 days';

select usename,client_addr,query,query_start,state_change from pg_stat_activity;

select usename,client_addr,query,query_start,state_change from pg_stat_activity where client_addr = '<client IP>';
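
The windowed counts in the queries above can be sanity-checked offline with a short sketch; the session rows below are made up for illustration:

```python
from collections import Counter

# Hypothetical sample of (usename, client_addr) pairs from pg_stat_activity
sessions = [
    ("omm", "10.0.0.1"), ("omm", "10.0.0.1"),
    ("omm", "10.0.0.2"), ("jack", "10.0.0.3"),
]

per_pair = Counter(sessions)   # connections per (user, address), like count(1) over(partition by ...)
total = len(sessions)          # like count(1) over()

for (user, addr), n in sorted(per_pair.items()):
    print(user, addr, n, total)
```

The partitioned window count is just a per-key tally, while the unpartitioned one is the overall row count.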

2. Confirm that the cluster and database are running normally and can provide data services externally

01. Instance status

  • The query returns the cluster status; in a healthy cluster the status is Normal and Balanced is Yes
gs_om -t status
  • View the instance process status on each node of the cluster in detail
gs_om -t status --detail

02. Session Information

  • Query the current session of the database
select * from pg_stat_activity;
  • The maximum number of active statements (concurrent sessions) allowed on the CN
show max_active_statements;
  • The current CN database connections, ordered by user and thread ID
select usename,pid,application_name,client_addr from pg_stat_activity order by usename,pid;
  • Forcibly stop the session
select pg_terminate_backend(pid);

03. Parameter check

show max_active_statements;

04. Modify parameters

gs_guc reload -Z coordinator -N all -I all -c "max_active_statements=10"
gs_guc reload -Z datanode -N all -I all -c "max_active_statements=10"

05. Instance abnormality
When connections to the database become slow, hang, etc., diagnosis and analysis are required, and the database instance may even need to be restarted

06. Information collection
By collecting system hang information, system status information, etc., the cause of a system hang can be analyzed. Interval sampling can be used to compare changes over time and assist the analysis.

gs_collector --begin-time="20180131 23:00" --end-time="20180201 20:00" -h host1
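
Interval sampling can be approximated offline: take two snapshots of wait-state counts (for example from pg_thread_wait_status) a few minutes apart and look at which states grow. The numbers below are invented:

```python
# Two invented samples of wait-state counts taken a few minutes apart;
# a state whose count keeps growing is a candidate cause of the hang.
sample_t0 = {"acquire lock": 2, "wait cmd": 1, "none": 40}
sample_t1 = {"acquire lock": 9, "wait cmd": 1, "none": 33}

keys = set(sample_t0) | set(sample_t1)
growth = {k: sample_t1.get(k, 0) - sample_t0.get(k, 0) for k in keys}
suspects = [k for k, d in sorted(growth.items(), key=lambda kv: -kv[1]) if d > 0]
print(suspects)  # → ['acquire lock']
```

Here the growing "acquire lock" count points at lock contention rather than, say, CPU saturation.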

07. Clean up the running log
Note: Clean up carefully; make sure none of the removed logs will be needed for future problem locating.

cd $GAUSSLOG
rm <log_file_name>

08. Shutdown: safe shutdown
If the database is abnormal and needs to be restarted, the following command waits for user business to finish before exiting, ensuring that the instance data of the primary and standby stay consistent

gs_om -t stop -m smart

09. Shutdown: forced shutdown
If the database cannot be shut down smoothly in safe mode, you can exit the cluster without waiting for business to end. The instance data of the primary and standby may become inconsistent.

gs_om -t stop -m immediate

3. Check the log information

01. View system log and operation log location

echo $GAUSSLOG

02. Run log analysis: database instance
Enter the command "\set VERBOSITY verbose" on the gsql client to enter the verbose mode. The verbose mode will display detailed error messages. In the error code, you can query the corresponding handling method of the related error message.

03. Operation log analysis: cluster
Analyze cluster operations by viewing the CM (Cluster Manager) logs

cd $GAUSSLOG/cm

4. Black box log

01. Set the maximum number of core files generated
When the number of core files exceeds the set count, newly generated core files overwrite the oldest ones. This prevents frequent exceptions and repeated restarts of GaussDB from quickly filling disk space with core files.

gs_guc set -Z datanode -N all -I all -c "bbox_dump_count=4"
gs_guc set -Z coordinator -N all -I all -c "bbox_dump_count=4"

02. Set the generation path of the core file
If not set, GaussDB reads the path from /proc/sys/kernel/core_pattern.
If the path is illegal (it does not exist, is not a directory, or the user lacks write permission), the core file is generated in the database's data directory

mkdir /corefiles
chmod 750 /corefiles
gs_guc set -Z datanode -N all -I all -c "bbox_dump_path='/corefiles'"
gs_guc set -Z coordinator -N all -I all -c "bbox_dump_path='/corefiles'"
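
The fallback rule described above (a path is "illegal" if it is missing, not a directory, or not writable) can be sketched as a small helper; the function name and paths are assumptions for illustration, not GaussDB internals:

```python
import os

def effective_core_dir(configured: str, data_dir: str) -> str:
    """Mirror the fallback rule: if the configured path is missing,
    not a directory, or not writable, fall back to the data directory."""
    if os.path.isdir(configured) and os.access(configured, os.W_OK):
        return configured
    return data_dir

# "/nonexistent" is illegal, so the (hypothetical) data directory wins.
print(effective_core_dir("/nonexistent", "/gaussdb/data"))  # → /gaussdb/data
```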

03. Enable the black box log (core file) recording function
Enabling core file collection affects the operating system: there is some performance impact, especially when processes fail frequently, and the core files take up disk space.

5. Check space information

01. View all table spaces

\db
select oid,spcname from pg_tablespace;
select pg_tablespace_location(tablespace_oid);

02. View the space occupied by the table

select t_name,pg_size_pretty(pg_relation_size(t_name)) from (values('store_sales'),('date_dim'),('store')) as names1(t_name);

6. Lock information check

Locking is the core method of database concurrency control. Checking relevant information can monitor the transaction and operating status of the database.

01. Lock information
Query lock information in the database

select * from pg_locks;

Query status information of threads waiting for locks

select * from pg_thread_wait_status where wait_status = 'acquire lock';

02. Lock fault troubleshooting
When the database has lock contention and blocking, the locks need to be checked and handled; if necessary, eliminate the lock by killing the blocking session

03. Query blocking sessions
The query returns the session ID, user information, query status, and the table and schema that caused the blocking

select w.query as waiting_query,
       w.pid as w_pid,
       w.usename as w_user,
       l.query as locking_query,
       l.pid as l_pid,
       l.usename as l_user,
       t.schemaname || '.' || t.relname as tablename
from pg_stat_activity w
join pg_locks l1 on w.pid = l1.pid and not l1.granted
join pg_locks l2 on l1.relation = l2.relation and l2.granted
join pg_stat_activity l on l2.pid = l.pid
join pg_stat_user_tables t on l1.relation = t.relid
where w.waiting limit 1;
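
The join above pairs an ungranted lock with a granted lock on the same relation. Reduced to its core, the matching logic looks like this (the lock rows are made up):

```python
# Made-up pg_locks-style rows: (pid, relation, granted)
locks = [
    (101, "t1", True),    # holder of the lock on t1
    (202, "t1", False),   # waiter blocked on t1
    (303, "t2", True),
]

waiters = [(pid, rel) for pid, rel, granted in locks if not granted]
holders = {rel: pid for pid, rel, granted in locks if granted}

# Same relation, one lock granted and one not => (waiter, blocker) pair
pairs = [(w_pid, holders[rel]) for w_pid, rel in waiters if rel in holders]
print(pairs)  # → [(202, 101)]
```

Session 202 waits on t1, which session 101 holds, so 101 is the blocker to investigate (or terminate).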

04. Kill blocking session
End the session according to the session ID

select pg_terminate_backend(139834762084352);

05. Kill all idle processes

SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE STATE='idle'; 

7. Running statistics

View the current database sessions, transactions and statement activity

01. Query transaction time
Query thread start time, transaction start time, SQL start time and state change time

select backend_start,xact_start,query_start,state_change from pg_stat_activity;

02. Query long-running query statements in the system
The statement list is returned in order of execution time, longest first, so the first entry is the longest-running query in the current system.
The result contains both SQL called by the system and SQL executed by users; identify long-running user statements according to the actual situation.
If the current system is busy, filter by checking that current_timestamp - query_start exceeds a given threshold.

select query from pg_stat_activity where current_timestamp -query_start > interval '1 days';
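
The threshold comparison in this query can be sketched offline with datetime arithmetic; the rows and timestamps below are invented:

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 2, 12, 0, 0)
# Invented (query, query_start) rows
rows = [
    ("select 1", datetime(2024, 1, 2, 11, 59, 0)),
    ("vacuum big_table", datetime(2024, 1, 1, 9, 0, 0)),
]

threshold = timedelta(days=1)  # corresponds to interval '1 days'
long_running = [q for q, start in rows if now - start > threshold]
print(long_running)  # → ['vacuum big_table']
```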

03. Session Statistics
View the session count information of the current server

select count(*) from pg_stat_activity;

Query the session information that currently uses the most memory

select * from pv_session_memory_detail() order by usedsize desc limit 10;

8. Object inspection

Tables, indexes, partitions, constraints, etc. are the core storage objects of the database; maintaining these objects and their metadata is an important part of a DBA's daily work.

01. Lock information
Query lock information in the database

select * from pg_locks;

Query status information of threads waiting for locks

select * from pg_thread_wait_status where wait_status = 'acquire lock';

02. Table structure query

\d+ table_name

03. Table statistics
Query when statistics were last collected. Statistics affect the execution plan; when SQL execution is abnormal, the statistics need to be examined.

select schemaname,relname,last_analyze,analyze_count from pg_stat_all_tables where last_analyze is not null;

04. Statistics collection
Collecting statistics is a complex task and requires detailed design. Update the statistics of a single table:

analyze tablename;

Update statistics of the whole database

analyze;

05. Index information

\d+ index_name

Query the number of leaf blocks and the clustering factor. A clustering factor that is too high (close to the table's row count) may indicate that the index is inefficient.

select * from pg_index where indrelid =(select oid from pg_class where relname ='table_name');
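
As a rough model, the clustering factor counts how often consecutive index entries in key order land on a different heap block, so a value near the row count means the table's physical order is poorly correlated with the index. A minimal sketch (this simplified model is an assumption for illustration, not GaussDB's exact formula):

```python
def clustering_factor(heap_blocks_in_index_order):
    """Count transitions to a different heap block while walking
    the index in key order (a simplified model of the metric)."""
    factor = 0
    prev = None
    for blk in heap_blocks_in_index_order:
        if blk != prev:
            factor += 1
            prev = blk
    return factor

print(clustering_factor([1, 1, 1, 2, 2]))  # well clustered → 2
print(clustering_factor([1, 2, 1, 2, 1]))  # scattered → 5
```

Both lists have five rows; the scattered one costs a heap block switch on almost every index entry, which is why its factor approaches the row count.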

06. Index definition query
Query index creation statement

select * from pg_indexes where tablename ='table_name';

07. Partition object query
Partition type, quantity, boundary value, etc.

select * from dba_part_tables;
select * from dba_tab_partitions;

08. Constraint information

select * from pg_constraint;

9. SQL inspection and tuning

Continuously monitor and analyze SQL, and optimize SQL efficiency and performance with the aid of SQL self-diagnosis.

01. SQL self-diagnosis
The warning information of the job currently being executed on the CN.

select warning from gs_wlm_session_statistics;

The warning information of the job that has been executed in history on the current CN.

select warning from gs_wlm_session_history;

View older historical job warning information

select warning from gs_wlm_session_info;
  • When the GUC parameter enable_resource_record is on,
    records in the GS_WLM_SESSION_HISTORY view are dumped to the system table GS_WLM_SESSION_INFO every 3 minutes,
    and the dumped records are deleted from the view;
  • When the GUC parameter enable_resource_record is off, records in the GS_WLM_SESSION_HISTORY view are simply deleted once they exceed the retention time (3 minutes).

02. View SQL execution plan
Only generate execution plan, not actual execution

explain select...

Generate an execution plan, execute it, and display summary information about the execution. The actual running time statistics are added to the display, including the total time (milliseconds) spent in each planning node and the number of rows it actually returns.

explain analyze select...

Generate an execution plan, execute it, and display all information during execution

explain performance select...

03. Statistics collection
It is recommended to collect statistics via scheduled tasks. Update the statistics of a single table:

analyze tablename;

Update statistics of the whole database

analyze;

04. Add plan hint to assist in tuning

select /*+ <plan_hint1> <plan_hint2> */ * from t1, (select /*+ <plan_hint3> */ * from t2) s where 1=1;

10. Check scheduled tasks

Check the execution of scheduled tasks in the database to ensure that background tasks run correctly, paying particular attention to core tasks such as statistics collection.
Querying users' job information to confirm that tasks succeed at the expected time is one of the DBA's important duties.

View the current user's scheduled task information.

select job,dbname,log_user,start_date,last_date,this_date,next_date,broken,status,interval,failures,what from user_jobs;

Query the scheduled task information of all users.

select job_id,dbname,log_user,start_date,last_start_date,this_run_date,next_run_date,interval,failure_count from pg_job;

01. Create a scheduled task
Create a test table

create table test(id int,time date);

Create a user-defined stored procedure that inserts data into the test table

create or replace procedure prc_job_1()
as
N_NUM integer := 1;
begin
  for i in 1..1000 loop
    insert into test values(i, sysdate);
  end loop;
end;
/

Create a scheduled task to execute the stored procedure. dbms_job.submit can be invoked with call or select. The SQL executed by the scheduled task can be one or more DML statements, anonymous blocks, stored procedure calls, or a mix of the three.

call dbms_job.submit('call public.prc_job_1();',sysdate,'interval "1 minute"',:a);
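
The interval semantics can be modeled as "next run = previous run + interval". This is a simplified sketch of the scheduling arithmetic, not dbms_job's actual implementation:

```python
from datetime import datetime, timedelta

def next_run(last_run: datetime, interval: timedelta) -> datetime:
    # Simplified model: next run = previous run + interval
    return last_run + interval

run = datetime(2024, 1, 1, 0, 0, 0)
for _ in range(3):  # three ticks of the 1-minute job
    run = next_run(run, timedelta(minutes=1))
print(run)  # → 2024-01-01 00:03:00
```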

02. Start a scheduled task

call dbms_job.broken(1,false);

03. Stop a scheduled task

call dbms_job.broken(1,true);

04. Delete a scheduled task

call dbms_job.remove(1);

11. Backup and recovery of the entire cluster

Backup
Enable WAL log archiving

python GaussRoach.py -t config --archive=true -p

Perform backup, backup to NBU

python GaussRoach.py -t backup --master-port 6000 --media-destination samplepolicy --media-type NBU --metadata-destination /home/userA/metadata

Perform backup, backup to disk

python GaussRoach.py -t backup --master-port 6000 --media-destination /home/userA/media --media-type Disk --metadata-destination /home/userA/metadata

Turn off wal log archiving

python GaussRoach.py -t config --archive=false -p

Note: The purpose of enabling WAL log archiving is to ensure that user changes made between the start and end of the backup (or recovery) are also fully backed up. The port number used for backup only needs to be an unoccupied port

Restore
Use backup on NBU to restore

python GaussRoach.py -t restore --clean --master-port 6000 --media-destination samplepolicy --media-type NBU --backup-key 20160121_190923 --metadata-destination /home/userA/metadata

Use disk backup to restore

python GaussRoach.py -t restore --clean --master-port 6000 --media-destination /home/userA/media --media-type Disk --backup-key  20160121_190548 --metadata-destination

12. Single or multi-table backup and recovery

Check whether the ${BIGDATA_HOME}/mppdb/.mppdbgs_profile file on each node contains the following line; if it does not, add it.

export LD_LIBRARY_PATH="/usr/openv/lib":$LD_LIBRARY_PATH

Perform backup

  • For single-table backup, use --tablename tbl_backup directly;
  • For multi-table backup, use --logical --table-list /home/roach/bklist.txt;
  • The table names are stored in bklist.txt. The example below is a disk backup; to back up to NBU, use --media-type NBU. The port can be any unoccupied port.
python GaussRoach.py -t backup --master-port 6000 --media-destination /home/userA/backup --media-type disk --agent-port 7000 --dbname postgres --tablename tbl_backup --metadata-destination /home/userA/metadata

or

python GaussRoach.py -t backup --master-port 6000 --media-destination samplepolicy --media-type disk --agent-port 7000 --logical --table-list /home/roach/bklist.txt --dbname postgres --metadata-destination /home/userA/metadata

Restore

python GaussRoach.py -t restore --master-port 6000 --media-destination /home/userA/backup --media-type Disk --agent-port 7000 --dbname postgres --tablename tbl_backup --backup-key 20160126_164639 --metadata-destination /home/userA/metadata

or

python GaussRoach.py -t restore --master-port 6000 --media-destination /home/userA/media --media-type Disk --agent-port 7000 --logical --table-list /home/roach/bklist.txt --backup-key 20180605_185302 --dbname postgres --metadata-destination /home/userA/metadata

13. User mode backup and recovery

Example: exports all data and object definitions in the hr and public schemas of the database human_resource, saving the exported content to /home/omm/backup/MPPDB_backup.sql

gs_dump -U jack -W Bigdata@123 -f /home/omm/backup/MPPDB_backup.sql -p 25308 human_resource -n hr -n public -F d
gs_restore -U jack -W Bigdata@123 -f /home/omm/backup/MPPDB_backup.sql -p 25308 -d human_resource -n hr -n public -e -c

14. Database Backup and Recovery

gs_dump -U jack -W Bigdata@123 -f /home/omm/backup/MPPDB_backup.tar -p 25308 human_resource -F t
gs_restore -U jack -W Bigdata@123 -f /home/ommdbadmin/backup/MPPDB_backup.tar -p 25308 -d human_resource -F t

15. Check basic information

Basic information includes version, capacity check, etc. Regularly checking database information and registering it is one of the important contents of the database life cycle.

Version check

select version();

Capacity check

select pg_table_size('table_name');
select pg_database_size('database_name');

16. Bind and unbind the floating IP

Use eth0 on a virtual machine, bond0 on a physical machine

Example: delete floating ip

ip addr delete 10.164.36.123/24 dev eth0:1
ip addr delete 10.164.36.123/24 dev bond0:1

Example: bind floating ip

ifconfig eth0:1 10.61.151.123/24
ifconfig bond0:1 10.61.151.123/24

17. Network related

Client test

ping 10.11.12.123
curl -kv 10.11.12.123:5432
telnet 10.11.12.123 5432

Description: Whether the physical network is blocked

Server test

netstat -anlp | grep 5432

Description: Whether the server process is normal

sudo /usr/sbin/arping -I bond0 -c 5 -U -s 10.58.14.133 -b 10.58.34.254

Description: Server traffic test, virtual machine eth0 or physical machine bond0, -s specifies ip, -b specifies gateway

gsql -U <user> -d <database> -W <password> -h <ip> -p <port>

Description: Test whether the cluster can log in across servers

Server OS firewall

systemctl stop firewalld.service
systemctl disable firewalld
systemctl status firewalld.service

Description: Permanently turn off the OS firewall


Origin blog.csdn.net/qq_42226855/article/details/109603399