Greenplum implementation and operations best practices you must know

At the recently concluded Pivotal Global Data Roadshow, our big data architects shared a wealth of valuable Greenplum experience and best practices. To give you better reference material, we have systematically organized it into this long article.

Project experience sharing

While implementing projects for many customers, we accumulate useful experience, and sometimes we develop general-purpose commands for recurring needs. These experience summaries reflect the value of Pivotal professional services. This time I chose two commands to share. Note that these commands are not official product features; at present they are not shipped with the product and are available only through after-sales channels.

Personalized backup and restore commands

Sometimes users' needs are special, and the existing backup and restore commands cannot fully satisfy them. We summarized the needs of users across different projects and developed a backup and restore command accordingly. Its characteristics:

  • It uses gz compression, in line with gpcrondump, so the backup files are smaller than the data in the database.

  • One file is produced per table or partition on each instance; as long as the number of tables in the database is kept reasonable, the number of files is not a problem.

  • Backup can be done by partition and by schema; by default each day's backup goes into its own folder.

  • The error tables of external tables can be excluded, since they are usually not worth backing up.

  • The backup keeps a detailed log with success and failure information, and it can resume from a breakpoint: if the backup stops at some table for whatever reason, re-running the same command continues from where the last run left off, without re-running what already succeeded.

  • Data filter conditions can be specified, to support customized backups.

  • The backup as a whole is not affected by individual failures.

  • The backup locks optimistically: a shared lock is taken only on the table currently being backed up, and if the lock cannot be acquired, that table's backup is considered failed for this run.

We also built incremental detection, which automatically identifies whether a table changed between the previous backup and this one, for both heap tables and AO tables. You can specify the degree of concurrency and the encoding character set; the encoding option solves some unusual garbled-character problems. The command is a single file that can be run directly, with no other preparation required.

When restoring, you can choose which data to restore, restore into a different database, and restore subsets of data with conditions; on top of incremental backups, you can also request the latest data within a given time range.

Data transmission between Greenplum clusters

For data transfer between clusters, GP has the gptransfer tool, but at present it is difficult for it to meet some users' special needs, such as transferring views, conditional transfers, and error reporting for text formats. Because GP cannot access data across databases, whether within a cluster or between clusters (we may solve this at the product level in the future), we still had to solve it ourselves: we built a command that is easier to use than gptransfer. It is not implemented with named pipes, because named pipes have some known problems; for example, their state is difficult to control precisely.

Running the command requires network connectivity between the two clusters; as long as they can communicate, the command can be deployed and executed. The source and target object names can differ; conditional filtering of source data is supported; the source can also be a view, and the command automatically determines whether the view can use fast mode or must use normal mode. The command can be executed outside the clusters, as long as the machine can reach both masters at the same time. You can specify the concurrency, and also the encoding character set to solve some special encoding problems. During synchronization, the necessary locks are taken on the source and target tables to ensure no interlock with other SQL; if a lock cannot be acquired, that table's synchronization is considered failed.

In addition, the command can run concurrently with itself. That is, after invoking it once for some tables, if resources still look plentiful, you can start another invocation to synchronize other tables. Like the backup command above, this command is deployed as a single file, and all preparation is completed automatically inside the command.

FAQ

This topic is very large and covers a lot of ground: parameter issues, strategy issues, monitoring and management, model design, system planning, and so on. Here we share best practices on common operation and maintenance issues, all drawn from actual production experience. We hope they give you a reference standard in your own development and use.

About kernel parameters

  • The first part mainly targets high-concurrency scenarios, where we need to raise some kernel limits. After changing these parameters, running gpcheck will usually report errors; since the values were raised intentionally, those errors can be ignored.

  • The second part, for performance optimization, involves modifying huge pages, block size, the disk scheduler, and so on. In general we do not recommend editing the grub file directly, because a mistake there risks leaving the system unable to boot. Usually we suggest adding a script to a local startup file instead, which is safer. Sample code is given below.
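A minimal sketch of what such a startup script might contain, assuming a RHEL/CentOS-style /etc/rc.local and a data disk named sdb; device names and values must be verified on your own hosts:

```bash
#!/bin/bash
# Appended to /etc/rc.local so the settings are re-applied on every boot.

# Disable transparent huge pages, which are known to hurt database workloads
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Use the deadline I/O scheduler on each data disk
echo deadline > /sys/block/sdb/queue/scheduler

# Raise the read-ahead value (16384 x 512-byte sectors = 8MB)
/sbin/blockdev --setra 16384 /dev/sdb
```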

Common database parameters

We pick a few important ones.

gp_external_max_segs : because external tables load in parallel, pulling data from an ETL server or another database can put a lot of pressure on the supplying end, especially on the network, which may cause instability or even network errors. So in an environment where the network facilities are not very good, we need to reduce the parallelism, because GP pulls too fast. Conversely, if the environment is good enough, you may want to increase this parameter for speed.
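A hedged example of turning the parallelism down cluster-wide; the value 16 is only illustrative:

```bash
# Limit how many segments scan one external table in parallel (default 64)
gpconfig -c gp_external_max_segs -v 16
gpstop -u    # reload configuration without restarting the cluster
```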

gp_autostats_mode and gp_autostats_on_change_threshold control the automatic collection of statistics, and sometimes need adjusting, for example during bulk loading or when inserting into an empty table. If you do not want the database to collect statistics automatically, set gp_autostats_mode to NONE, or set it to ON_CHANGE together with the threshold parameter. The default threshold, 2^31 - 1 (about 2.1 billion), is too large to ever be reached in practice, so if you use ON_CHANGE you need to set a reasonable threshold, such as 5 million.
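For example, around a bulk load the session might be set like this (values are illustrative):

```sql
-- Disable automatic ANALYZE entirely for this session...
SET gp_autostats_mode = none;

-- ...or keep it, but only trigger it after large changes:
SET gp_autostats_mode = on_change;
SET gp_autostats_on_change_threshold = 5000000;  -- 5 million rows
```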

Instance number configuration recommendations

I often hear friends asking about this online: how many instances per machine are appropriate, that is, how many primary instances should one host carry? There is no standard answer, but there are reference points; for example, our DCA uses 6.

In actual production we adjust to the workload. Where concurrency pressure is relatively high (concurrency here means batch jobs, not small indexed queries), the number of instances per host can be reduced appropriately; 4 is quite common. If little runs concurrently, or jobs simply run serially, or the system is only used occasionally, you can put a few more, e.g. 12 is also workable. But do not overdo it, or the stability of the whole database will suffer.

Mirroring strategy

GP has two mirroring strategies by default, group and spread. The group method puts all the mirrors of each host's primaries on the next host, so the computing nodes form a ring. The spread method spreads each host's mirrors one per host across the following hosts, as many hosts as that host has primary instances. These are actually two extremes: for example, with 6 primary instances per host, group is 1-to-1 and spread is 1-to-6. Sometimes neither is satisfactory: group tolerates the most simultaneous host failures but suffers the largest performance loss when a host fails, while spread suffers the smallest performance loss but tolerates the fewest failed hosts.

What we show here is 6 primary instances on one host, with its mirrors distributed across 3 other hosts. A one-to-two relationship can also be designed; it is best if the numbers divide evenly, otherwise the combinations are harder to work out. This compromise approach requires modifying the mirror_config file used when executing gpaddmirrors. Be reminded that this configuration file involves many ports; if the meaning of each port is unclear to you, the modified file may produce port-conflict errors. Read the gpaddmirrors help carefully before editing, as sketched below.
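A sketch of the workflow, using gpaddmirrors' -o/-i flags; the exact config file format varies by GP version, so always start from a generated sample:

```bash
gpaddmirrors -o ./mirror_config_sample   # generate a sample mirror config
# Edit the file: assign each mirror to the desired host, taking care that
# data directories and the many port fields do not collide.
gpaddmirrors -i ./mirror_config_sample   # apply the customized layout
```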

Statistical information collection

Accurate statistics help the planner generate a more reasonable execution plan, so statistics are very important. Generally speaking, if the statistics are accurate, most queries get an execution plan that meets our expectations. It should be mentioned that accurate statistics do not guarantee an accurate, or optimal, plan: statistics are only sampled values and cannot fully reflect every detail of the data, so sometimes we still need to intervene in the plan through the various planner parameters, according to the specific situation and our understanding of the database. We will not elaborate on those parameters one by one; refer to the relevant documents, or the many introductions in earlier articles on the Pivotal public account. Of course, as our ORCA optimizer grows stronger, the situations requiring intervention will become fewer.

The two autostats parameters were mentioned earlier. If we want to control ANALYZE precisely, the best suggestion is to set gp_autostats_mode to NONE, turn off automatic collection, and exercise precise control in scripts or applications. Columns that are only returned in result sets do not need statistics collected; skipping them saves the considerable resources that useless ANALYZE runs would consume.

It should also be noted that when a table is truncated, the column-level statistics stored in the pg_statistic system table are not lost. So unless you believe the change in data volume has significantly invalidated the column-level statistics, you may not need to collect statistics on specific columns frequently. If the purpose is only to refresh the table's record count, it is recommended to analyze a system column directly (even a system column that does not exist in AO tables); this only updates the reltuples field in pg_class, its mechanism differs from collecting user-column statistics, and it is an efficient method.
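A sketch of targeted ANALYZE, with illustrative table and column names; whether a system column is accepted in the column list (the reltuples-only trick described above) depends on the GP version:

```sql
-- Collect statistics only on the columns used in joins and filters
ANALYZE orders (customer_id, order_date);

-- Refresh only the row-count estimate (pg_class.reltuples) by analyzing
-- a system column; far cheaper than analyzing user columns.
ANALYZE orders (ctid);
```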

Garbage space recycling

For GP before version 4.3, AO tables could not be updated or deleted, so there was no garbage problem; that version is out of service, so we will say no more about it. From 4.3 on, the data modification mechanisms of AO tables and heap tables differ. Heap tables continue PostgreSQL's multi-version mechanism, marking the validity of rows with system columns, while AO tables record invalid rows in the auxiliary visibility map (aovisimap). VACUUM on a heap table still consumes max_fsm_pages slots, whereas VACUUM on an AO table decides directly whether each file needs compaction, based on the gp_appendonly_compaction_threshold parameter (default 10%) and the amount of invalid data recorded in the aovisimap. For heap tables, no max_fsm_pages setting can guarantee absolute safety, because the upper bound of garbage in a heap table cannot be predicted; we recommend reorganizing the table instead of VACUUM. VACUUM FULL carries an uncontrollable time risk and is also not recommended. System tables cannot be reorganized, because they have no distribution strategy; it is recommended to VACUUM them regularly, such as once a week. With regular maintenance, max_fsm_pages is sufficient.
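A sketch of the recommended maintenance, with an illustrative table name:

```sql
-- Rewrite a bloated heap table in place instead of VACUUM FULL
ALTER TABLE my_table SET WITH (REORGANIZE=true);

-- Regular lightweight maintenance on a system table
VACUUM ANALYZE pg_catalog.pg_class;
```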

AGE monitoring and management

It is recommended to check the database AGE regularly. Monitor it over a period of normal use to obtain its growth trend, from which you can roughly estimate how much XID headroom remains. It is recommended to sample the maximum AGE of every database on every instance of the whole cluster, so we can see the situation on all nodes. Since version 4.2.6 the autovacuum feature is turned off and the limits are controlled by xid_warn_limit and xid_stop_limit; once an XID warning appears, you need to run VACUUM FREEZE on all tables whose AGE is too large.

In later versions, the template0 database does not allow connections by default, so to reduce its AGE we need to modify its datallowconn attribute in pg_database, temporarily allow connections to the database, and complete the AGE maintenance. We have also developed commands for users in specific projects that automate these fiddly steps.
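A sketch of the AGE checks, using the gp_dist_random trick to read the catalog on every segment (run as superuser, and revert datallowconn after the maintenance):

```sql
-- Age of each database on the master
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

-- Age of each database on every segment instance
SELECT gp_segment_id, datname, age(datfrozenxid)
FROM gp_dist_random('pg_database')
ORDER BY 3 DESC;

-- Temporarily allow connections to template0 before VACUUM FREEZE
UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';
```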

Database object limit

The limit on the number of database objects mentioned here is not a technical limit but an empirical value: keeping the object count in a lower range gives better results. Generally, we recommend at most 100,000 objects, and at most 1 million files in the database directories. What is affected above 100,000? Mainly data synchronization, and even some queries. Once the file count passes 1 million, the efficiency of the whole file system declines, which affects database performance; the pressure on the system tables also grows. Consider this problem together with the partitioning and column storage discussed later. GP splits the underlying table files at 1GB; note that 100,000 files of 1GB already amount to 100TB, so as long as tables are not overly fragmented, the file count generally will not exceed 1 million.

Physical model

The first point: row storage and column storage

There is no absolute boundary between the two; decide according to the actual situation. In general, choose column storage carefully: used casually, it can cause serious file fragmentation, since the underlying files are capped at 1GB each. Suppose a table totals 100GB with 50 columns, and the system has 20 primary instances: with column storage, the average data file is 100MB, which is still normal. But if the table additionally has 100 partitions, the average underlying file shrinks to 1MB, which is not reasonable. GP hardware generally performs well, with single-host IO bandwidth of 2GB/s or even higher, so there is usually no need to store the underlying files in an overly fragmented form.

The second point: data compression

Many user tables with larger data volumes are candidates for compression. As long as the compression level is not too high, compression effectively reduces storage size and improves query performance. Even index queries on an AO table are not much worse than on a heap table: we have run comparative tests, and the two differ only slightly, a difference that is insignificant compared with the difference in capacity. The common choice is zlib level 5; raising it to zlib level 9 may improve compression by 10% or less, while Insert and Select performance drop sharply. QuickLZ is sometimes a good choice too: its compression ratio is worse than zlib's, but its performance is better. For a specific production scenario, run some comparative tests to find the most satisfactory option. Column-oriented compression compresses better than row-oriented, but must be weighed together with table size; historical data, for example, is a good candidate.
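Typical AO definitions with compression (table names are illustrative):

```sql
-- Row-oriented append-optimized table, zlib level 5
CREATE TABLE sales_history (LIKE sales)
WITH (appendonly=true, compresstype=zlib, compresslevel=5);

-- Column-oriented storage, compressed column by column
CREATE TABLE sales_history_col (LIKE sales)
WITH (appendonly=true, orientation=column,
      compresstype=zlib, compresslevel=5);
```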

Third point: distribution key

In general, it is recommended to choose fields commonly used in joins as the distribution key. What does the distribution key do? First, it distributes the data evenly across all primary instances; second, it enables local computation and minimizes data motion. For example, in a join, a local join is possible only when the join columns contain all of the distribution key columns; only then is data movement avoided. To raise that possibility, the fewer distribution key columns the better: a single-element set is the easiest to be a subset of other sets. But the empty set does not work; that means the distribution policy is RANDOMLY, following no rule at all, and it faces redistribution most often.

That said, random distribution deserves a proper understanding. For a large table unlikely to be joined with other large tables, if no suitable distribution key can be found, the RANDOMLY policy is sometimes acceptable: it at least guarantees even distribution, and when such a table joins a small table, the execution plan will prefer to broadcast the small table. Choosing distribution keys is a technical matter, at least until you understand their role. If a table is created without a distribution key and has no primary key, GP defaults to the first column as the distribution key, which sometimes carries considerable risk. You may consider turning on the gp_create_table_random_default_distribution parameter to at least guarantee even distribution, but that should not become an excuse for choosing distribution keys carelessly.
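For example (names illustrative):

```sql
-- Distribute on the column most often used in joins
CREATE TABLE orders (
    order_id    bigint,
    customer_id bigint,
    amount      numeric
) DISTRIBUTED BY (customer_id);

-- Fall back to random distribution when no suitable key exists
CREATE TABLE raw_events (payload text) DISTRIBUTED RANDOMLY;
```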

Fourth point: partition table

Our suggestion is to not design table partitions too finely. Generally, the number of partitions for one table should be on the order of tens, up to around a hundred. We do not insist that data volumes be perfectly balanced across partitions, but relative balance is good. For example, a frequently accessed table with 100 partitions where the first partition holds 90% of the data is probably not a very effective design; on the other hand, sometimes we deliberately give hot data its own small partition. Whether a given partitioning strategy is reasonable has to be evaluated case by case; a typical design is sketched below.
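A typical monthly range-partition design as a reference (names and dates illustrative):

```sql
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date,
    amount    numeric
)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
(
    START (date '2020-01-01') INCLUSIVE
    END   (date '2021-01-01') EXCLUSIVE
    EVERY (interval '1 month')
);
```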

In addition, multi-level partitions are generally not recommended: they are hard to maintain, and the number of data files can easily get out of control. In theory only 8 levels of partitioning are possible; creating a 9th level reports an error saying the object already exists, because GP's partition naming rules append roughly 8 characters [_<level>_prt_<name>] per level, and the 63-character limit [on identifier length] is soon used up.

Fifth point: Index

In most cases we do not use indexes. However, for OLTP-like scenarios, such as fetching a few hundred records out of billions by something like a primary key, an index can effectively improve query performance. In such cases we need to evaluate whether, and how, an index can help the query. Sometimes we build the index on the distribution key, because in a large cluster it is wasteful to involve all nodes in a tiny query.

Now, modifying data on an indexed table: take a table with billions of rows that receives tens of millions of new rows every day. Inserting with the index in place performs very poorly, as many people have experienced. The general recommendation is to drop the index first and rebuild it after the data modification completes. But sometimes the business cannot tolerate that window, because query performance while the index is absent will be very poor. Then we can consider table-name switching or partition exchange. The former means designing two tables: once the standby table has the new data loaded and a fresh index built, the two tables' names are swapped, minimizing the window of business impact. The latter is the same idea for partitioned tables: if the new data belongs in the most recent partition, prepare that partition's data and index in a standby table first, then perform a partition exchange [ALTER TABLE ... EXCHANGE PARTITION]; this approach is also friendlier when other objects depend on the table.
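A sketch of the partition-exchange approach (names and dates illustrative; the staging table must match the partitioned table's structure and distribution):

```sql
-- Prepare the standby table offline: load data, then build its index
CREATE TABLE sales_staging (LIKE sales INCLUDING DEFAULTS);
-- ... bulk-load the new rows into sales_staging ...
CREATE INDEX idx_sales_staging ON sales_staging (sale_id);

-- Swap the prepared table into the partitioned table in one step
ALTER TABLE sales
    EXCHANGE PARTITION FOR (date '2020-12-01')
    WITH TABLE sales_staging;
```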

For indexes, constraint (unique) indexes are not recommended. Although such constraints help ensure data uniqueness, they may cause serious performance problems. It is recommended to guarantee uniqueness at the business level instead: for example, when running batches, avoid duplicate data through joins on the logical primary key. Relative to the performance gained, that workload is worth it.

Table space

Tablespaces were originally designed for different physical disk devices, such as disks of different performance in the same machine; mixing them into one tablespace wastes the high-performance disks. Sometimes we use SSDs as the tablespace for hot data and SATA disks as the tablespace for historical data, tying disk performance to data access frequency so each is put to best use.

We recommend keeping the number of objects per database in one tablespace under 30,000, again because of file system performance: too many files in a single directory degrade it. Because of the tablespace directory structure, different databases use different directories under the same tablespace. In addition, tablespaces are built on filespaces, and different tablespaces under the same filespace also use different directories. So if the purpose of designing tablespaces is to spread out file counts, you can create one filespace and then create multiple tablespaces on top of it, which is easier to deploy and maintain.
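A sketch of spreading file counts with one filespace and several tablespaces (GP 4/5-era utilities; names illustrative):

```bash
gpfilespace -o ./fs_config   # interactively generate a filespace config
gpfilespace -c ./fs_config   # create the filespace, say "fastfs"

# Several tablespaces on the same filespace get separate directories
psql -d mydb -c "CREATE TABLESPACE hot_ts1 FILESPACE fastfs;"
psql -d mydb -c "CREATE TABLESPACE hot_ts2 FILESPACE fastfs;"
```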

Temporary space

Sometimes a SQL computation is too complex to finish in memory, and some data must be temporarily spilled to files on disk. These temporary files are stored under $data_directory/base/<oid>/pgsql_tmp/ on each instance. We suggest putting limits on this space to guard against bad SQL, which can otherwise fill it up; this has happened many times, so limits are necessary, e.g. no more than a few times total memory [the capacity of the system disk must of course also be considered]. Also worth mentioning is the gp_workfile_compress_algorithm parameter, which makes temporary files compressed; it is useful when disk performance is not very good, but if your disks are particularly fast [read/write bandwidth above 2GB/s], you may find that turning it on makes performance worse, so this parameter too must be judged from experience.
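A sketch of the limits (the GUC names are real; the values, given in kB, are only illustrative):

```bash
gpconfig -c gp_workfile_limit_per_query   -v 8388608   # 8GB per query per segment
gpconfig -c gp_workfile_limit_per_segment -v 33554432  # 32GB total per segment
gpconfig -c gp_workfile_compress_algorithm -v zlib     # compress spill files
gpstop -u   # reload the configuration
```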

GP has three join algorithms that may involve temporary space: hash join, nested-loop join, and merge join. In a hash join between large tables, or a small table LEFT JOIN a large table, the large table is the one hashed; memory is then often insufficient to hold all of its data, and temporary space is needed. So small table LEFT JOIN large table should be avoided; the ways of rewriting it can be found in the series of articles published earlier on the Pivotal public account, also included in the booklet distributed on site.

Memory overflow

OOM generally has several causes:

  • a wrong execution plan, for example a large table being broadcast;

  • high concurrency of large SQL, leaving memory tight;

  • calculations that consume excessive memory, such as window functions over very large tables;

  • unreasonable parameter settings.

The consequences of OOM can include large numbers of SQL errors, instances going down, and even impact on the operating system.

Under particularly high concurrency, memory consumption is very large: at the CPU level everyone is competing for time slices, but all queries run simultaneously and the data held in memory is not released, so memory occupancy is not sliced by time. Avoid this situation by controlling concurrency appropriately, at the business level or through resource queues. Also, in terms of task efficiency, if large queries run at especially high concurrency, you will find their overall efficiency declining and the total time growing. For batch processing we care about tasks completed per unit of time, not how many tasks run at once: if 100 tasks finish in 1 hour at a concurrency of 10 but take 2 hours at a concurrency of 20, the higher concurrency is not cost-effective. There is much more to concurrency control; the principle is to restrict large tasks and let small tasks through.
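A resource-queue sketch for restricting large batch tasks (names and values illustrative):

```sql
CREATE RESOURCE QUEUE batch_queue WITH
    (ACTIVE_STATEMENTS=10, MEMORY_LIMIT='8000MB');

ALTER ROLE etl_user RESOURCE QUEUE batch_queue;
```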

Some remedies for OOM: one is to reduce memory consumption; another is to adjust parameters so as not to put too much memory pressure on the operating system. We can of course also add memory, but that is limited and cannot be done at will, so much of the time we should first think about whether there is a way to reduce memory overhead.

Many people also care about the value of gp_vmem_protect_limit. The usual recommended range is 1 to 1.5 times physical memory; in fact, we generally recommend keeping it below 1 times, or directly limiting it to 0.9 times or lower. Sometimes we leave this parameter alone and instead cap the resource queues at 0.9 or below.
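A back-of-envelope sketch, assuming 256GB of RAM, 8 primaries per host, and a 0.9x target; gp_vmem_protect_limit is set in MB and requires a restart:

```bash
# 256GB * 0.9 / 8 primaries = 28.8GB per primary ~= 29491 MB
gpconfig -c gp_vmem_protect_limit -v 29491
gpstop -ar   # restart the cluster for the change to take effect
```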

Roles and permissions

In general, we recommend role-based permission management: assign table privileges to the relevant roles, then assign the roles to users, which is easier to manage. When adding a new user, we do not need to grant tables to them in bulk; we only need to grant them the relevant roles. Also note that GP privileges do not cascade downward: a user with privileges on a schema does not automatically have privileges on all tables in that schema; table privileges must be granted one by one.
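For example (role and table names illustrative):

```sql
-- Grant table privileges to a role, not to individual users
CREATE ROLE sales_reader NOLOGIN;
GRANT USAGE ON SCHEMA sales TO sales_reader;
GRANT SELECT ON sales.orders, sales.customers TO sales_reader;

-- New users just receive the role
CREATE ROLE alice LOGIN PASSWORD 'changeme';
GRANT sales_reader TO alice;
```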

Lock problem

Lock waits in SQL are fairly common. Generally we look at the active SQL, and in addition at the lock situation. Listed here is the method we use to check for SQL lock conflicts; it applies some filtering so that only possible conflicts are shown. Because the join is made on the object's oid, all lock information would otherwise be joined together, including sessions that do not actually conflict with each other, and those are not our concern.

The SQL in this example must be executed in the specific database concerned, because it joins pg_class to display the name of the object being locked, and pg_class is not a global object. If you do not care which object a lock is on, you can modify the SQL slightly to drop the pg_class join and see interlock information globally.
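A sketch of such a query; the column names follow the older pg_stat_activity layout (procpid, current_query) used by GP 4.x, so adjust for your version:

```sql
SELECT w.pid           AS waiting_pid,
       h.pid           AS holding_pid,
       c.relname       AS locked_object,
       w.mode          AS wanted_mode,
       h.mode          AS held_mode,
       a.current_query AS holding_sql
FROM pg_locks w
JOIN pg_locks h
  ON  w.relation = h.relation
  AND w.granted = false           -- sessions still waiting
  AND h.granted = true            -- sessions holding the lock
JOIN pg_class c         ON c.oid = w.relation
JOIN pg_stat_activity a ON a.procpid = h.pid
ORDER BY c.relname;
```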

To handle a lock conflict, generally use pg_cancel_backend or pg_terminate_backend to clean up the offending session. Killing database processes from the command line is not recommended; that approach is more dangerous.

Common Operation and Maintenance Commands

gpstart starts the database. Worth mentioning: -m starts only the master instance, without starting the node instances. Similarly, pointing gpstart -m -d $data_directory at a node instance lets us start a single instance, which can be useful when handling cluster failures.

gpstop stops the database; -m stops only the master instance. Unlike gpstart, this approach is not very effective for node instances, but fortunately stopping a single postgres with pg_ctl stop -D $data_directory is not complicated. The -M parameter specifies the stop mode; the three options are fast, immediate, and smart. The most commonly used is fast, which rolls back uncommitted transactions; smart may be too weak and immediate too crude, so they are rarely used. A small trick: -M fast, -M immediate, and -M smart abbreviate to -f, -i, and -s, so much of the time, stopping the database is simply gpstop -af.
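The common idioms in one place:

```bash
gpstart -a       # start the whole cluster without prompting
gpstart -m       # maintenance: start only the master instance
gpstop -af       # stop everything in fast mode (rolls back open transactions)
gpstop -u        # reload postgresql.conf / pg_hba.conf without stopping
pg_ctl stop -D $DATA_DIRECTORY -m fast   # stop a single instance directly
```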

gpstate views the health status of the cluster:

  • -f checks the synchronization status of the standby master

  • -e shows the synchronization status between primaries and mirrors

  • -m shows the status of the mirrors

  • -s shows the detailed status of each instance

For instance configuration information, look at the gp_segment_configuration system table; for historical failover and recovery records, gp_configuration_history; for the file system locations of filespaces, the pg_filespace_entry system table; for the size of an object, the pg_relation_size function; for the size of a database, the pg_database_size function. The following SQL shows how many records a table has on each instance, which is used to judge table skew:
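The classic per-segment row count (table name illustrative):

```sql
SELECT gp_segment_id, count(*)
FROM my_table
GROUP BY gp_segment_id
ORDER BY 2 DESC;
```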

The following SQL quickly and intuitively shows a table's size on every node. Although it cannot reflect details such as record counts, it still helps us judge skew quickly, and in many cases we recommend it, especially for larger tables, or when we need to run skew analysis over all user tables:
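A sketch using the gp_dist_random trick to run the size function on every segment (gp_execution_segment() reports which segment each row came from):

```sql
SELECT gp_execution_segment() AS seg,
       pg_relation_size('my_table') AS bytes
FROM gp_dist_random('gp_id')
ORDER BY 2 DESC;
```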

For example, we count all user tables:
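A sketch that gathers per-segment sizes of all user tables into the skew_statistic work table analyzed next (the namespace filters are illustrative):

```sql
CREATE TABLE skew_statistic AS
SELECT c.relname,
       gp_execution_segment() AS seg,
       pg_relation_size(c.oid) AS bytes
FROM gp_dist_random('pg_class') c
WHERE c.relkind = 'r'
  AND c.relnamespace NOT IN (
        SELECT oid FROM pg_namespace
        WHERE nspname IN ('pg_catalog', 'information_schema', 'gp_toolkit'))
DISTRIBUTED BY (relname);
```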

After that, we can run various queries against the skew_statistic table to dig out skew problems.

gpconfig modifies parameters in the postgresql.conf files. This is a system-level modification; in particular, the many parameters that take effect only after a database restart should all be modified this way, so that the change is applied uniformly across the whole system.

gpssh is a very common cluster maintenance command that operates on multiple machines at once. Note that some versions add a -d parameter, which sets a delay between issuing the command to different machines, 0.05 seconds by default. Sometimes we use gpssh to check the system time across the cluster; because of this parameter, the results show an extra skew between machines. If that happens, set -d 0 manually.
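For example, checking clock skew without the per-host delay:

```bash
gpssh -f hostfile_all -d 0 'date'   # hostfile_all lists every cluster host
```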

gpscp is the commonly used cluster copy command, for example for pushing configuration files out to the cluster. Files like sysctl.conf, limits.conf, rc.local, ntp.conf, sshd_config, and hosts are generally safe to overwrite, but never overwrite the grub file; this is the same reason we avoid changing grub at all, because the hardware IDs in it differ per machine, and overwriting them can leave a system unable to boot.

gpcheckperf checks network and disk performance. On a large cluster, running a matrix network test may require modifying the MaxStartups setting in sshd_config, because the default causes ssh errors during large-scale matrix network tests; and the matrix test is necessary, because the real usage environment produces matrix network traffic. As for performance targets: disk IO should reach or well exceed an average of 100MB/s per disk, and in the matrix network test each 10-gigabit NIC needs to exceed 1GB/s, otherwise the hardware configuration is not delivering its expected performance.
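Typical invocations (paths and hostfile names illustrative):

```bash
# Disk and stream test against the data directories of every host
gpcheckperf -f hostfile_all -r ds -D -d /data1/primary -d /data2/primary

# Full-matrix network test; may require raising MaxStartups in sshd_config
gpcheckperf -f hostfile_all -r M -d /tmp
```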

Daily inspection and troubleshooting

Daily inspection

Regular system inspection is recommended: for overall health, once or more per day, with faults resolved as soon as possible. Items to check include: the number of user tables, the synchronization status of cluster instances, over-long-running SQL, leftover SQL processes on nodes, space usage, serious errors such as PANIC and OOM in the logs, and whether disk and RAID status is normal.
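A couple of inspection queries as sketches (the pg_stat_activity column names follow GP 4.x; the threshold is illustrative):

```sql
-- SQL that has been running for more than one hour
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
  AND now() - query_start > interval '1 hour';

-- Instances that are down or not in their preferred role
SELECT * FROM gp_segment_configuration
WHERE status <> 'u' OR role <> preferred_role;
```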

Let's talk about RAID here. In general, we recommend setting the RAID write policy to Force WriteBack to guarantee disk write performance: a BBU-protected RAID downgrades from WriteBack to WriteThrough during battery maintenance or abnormal situations, and performance then drops by an order of magnitude. Setting Force WriteBack rests on a common assumption: the production machine room will not lose power unexpectedly.

Identify the problem

Problems with resources, abnormal system resource usage, and poor database health often affect the health of the entire database; these must be ruled out first.

In general, we examine the running SQL, including historical SQL, which can be analyzed from the gpperfmon database. We can also use system commands to trace system processes and analyze the root cause of a problem, even down to whether a bug is involved.

The gpssh tool can also be used to view the database processes on all nodes: for example, whether some nodes have gone idle on a task while other nodes are still running (data or computation skew), whether there is a lock problem, or whether space is suddenly surging, as a Cartesian product would cause.

Sometimes we even need to analyze the SQL execution plan, judge the quality of the execution plan, find the cause of the problematic SQL and find a feasible solution.

Emergency measures

Sometimes the state of the database is abnormal and a restart is required. It is recommended to run a CHECKPOINT before every shutdown. For abnormal maintenance, gpstart -R enters restricted access mode, in which only superusers can connect, making system diagnosis and maintenance easier. Sometimes there may be persistent-table or xlog problems; use pg_resetxlog and persistent-table rebuild operations with caution, ideally with the help of Support, otherwise more serious problems may follow.

Isolating a faulty machine: when a node has a serious hardware failure, it should be isolated, for example by shutting it down; the primaries will fail over to their corresponding mirrors and keep working, and the machine rejoins the cluster after repair. If the disk files are unchanged, use incremental synchronization; if they have changed, for example a disk failure lost data during repair, use full synchronization. You can also replace the failed machine with a new one: set the same IP and hostname, the same system parameter settings, install the GP software [much of this can be done in advance on a cold-standby machine], redo the key exchange, and then run gprecoverseg. Alternatively, gprecoverseg -p replaces the failed node with a machine whose IP and hostname are completely different.
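The relevant gprecoverseg variants (the flags are standard; the spare hostname is illustrative):

```bash
gprecoverseg             # incremental recovery: disk files unchanged
gprecoverseg -F          # full recovery: recopy everything from the mirror
gprecoverseg -p spare01  # recover onto a spare host with a different name/IP
gprecoverseg -r          # afterwards, rebalance segments to preferred roles
```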

Repairing slow system tables with VACUUM: when queries against the system tables become slow, run a round of VACUUM on them; VACUUM itself has some repair capability. As mentioned earlier, though, you should configure scheduled tasks to run VACUUM ANALYZE over all system tables at regular intervals [such as weekly] so that they stay healthy.

This article cannot cover every problem, such as the principles to observe during development or SQL tuning experience; Pivotal's series of technical articles records much experience in those areas.

In general, everyone who writes SQL needs to be clear about the concept that GP is an MPP database; you cannot use GP with the mindset you would bring to an ordinary database. Try to let all instances work on a large SQL task together: for example, cursors need to be avoided, data modifications must be done in batches, and row-by-row operations must be eliminated.

The author of this article is Chen Miao, senior big data architect at Pivotal. The slides were provided by Li Wei, chief big data architect at Pivotal.
