Building a PostgreSQL physical replication high-availability database service on Windows

First, some basic differences between logical replication and physical replication:

  • Physical replication requires that all instances run the same major version on the same operating system platform. If the master instance is PostgreSQL 15 on Windows, the slave instance must also be PostgreSQL 15 on Windows. Logical replication has no such requirement.
  • Physical replication ships WAL directly and replays it on the slave instance; it can be understood as real-time WAL archive recovery, so latency is low and performance is high.
  • Logical replication can be understood as parsing the changes recorded in the WAL, decoding them into standard SQL statements, and passing those to the subscriber for execution. Compared with shipping WAL directly, performance is lower and latency is higher.
  • Physical replication does not require manually creating databases and tables the way logical replication does. Because physical replication replays the WAL directly, DDL operations are included; with logical replication you must perform DDL yourself.
  • Logical replication is more flexible: you can choose which databases to replicate, and the subscriber instance can also host other databases for other business. Physical replication covers the entire instance; the slave is a 100% consistent copy of the master and can at most serve read-only queries.

For installing PostgreSQL on Windows, you can refer to the previous post: PostgreSQL manual installation and configuration method on Windows system

If you are after high performance and a strongly consistent database replication and backup solution, physical replication is recommended.

To build master-slave replication in physical replication mode, first adjust the postgresql.conf file of the master instance:
wal_level = replica
synchronous_commit = remote_apply
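
After editing, you can confirm the values took effect (after a restart) with a quick query; a minimal check, assuming you are connected to the master instance:

SELECT name, setting FROM pg_settings WHERE name IN ('wal_level', 'synchronous_commit');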

synchronous_commit = remote_apply is a synchronous replication mode: after a client commits a transaction on the master instance, the master waits until every node listed in synchronous_standby_names has received the data and completed remote apply before it returns a successful commit status to the client. Since the standby we create later will connect with the name s, we open the primary instance's postgresql.conf again and set:
synchronous_standby_names = 's'

When multiple slave instances synchronize from the master instance, synchronous_standby_names also supports the following configuration forms:

  • synchronous_standby_names = 's1' means a commit can return once standby s1 has responded.
  • synchronous_standby_names = 'FIRST 2 (s1,s2,s3)' means a commit can return once the first two of the three standbys s1, s2, s3 (that is, s1 and s2) have responded to the master instance.
  • synchronous_standby_names = 'ANY 2 (s1,s2,s3)' means a commit can return once any two of the three standbys s1, s2, s3 have responded to the master instance.
  • synchronous_standby_names = 'ANY 2 (*)' means a commit can return once any two of all the standbys have responded to the master instance.
  • synchronous_standby_names = '*' matches any standby, i.e. a commit can return once any standby has responded.
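
You can check which standbys the master currently treats as synchronous by querying pg_stat_replication on the master; a minimal sketch:

SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;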

One thing to note here is a known issue with PostgreSQL synchronous replication. Assume a primary instance and one standby s1 run in synchronous mode, with synchronous_standby_names = 's1'. From the configuration it looks as though data must be applied on s1, and s1 must respond successfully, before the primary returns a successful commit to the client. In reality, however, if the standby is down when the primary receives a transaction, the primary waits for a response from s1; since s1 is gone, that wait inevitably times out. When the communication between primary and standby times out, the primary still reports the transaction as successfully committed to the client, so the client's operation still succeeds. But because every transaction has to go through this timeout, all client transactions become very sluggish.

For example, every insert goes through the communication timeout between the primary instance and the standby, so each insert takes roughly 30 seconds to complete, which makes the application feel extremely stuck. At that point the primary is effectively running in a (very slow) standalone mode. Things return to normal once the standby comes back online. If the standby cannot be restored quickly, you can adjust synchronous_standby_names on the primary to stop waiting for the s1 standby; after switching to single-instance mode and restarting, the primary no longer stalls. Note, however, that while the primary runs without a standby, a disaster on the primary (such as a failed disk) means data loss. It is therefore recommended to have at least two slave instances to raise the level of protection.
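
If you hit this situation, one way to drop the synchronous requirement without hand-editing the file is sketched below, assuming superuser access; ALTER SYSTEM writes the value to postgresql.auto.conf, and synchronous_standby_names can be reloaded without a full restart:

ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();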

Next, adjust pg_hba.conf on the master instance and add a whitelist entry for replication-mode connections:
host replication all 0.0.0.0/0 scram-sha-256

Remember to restart the master instance after adjusting the configuration files.
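
On Windows the restart can be done with pg_ctl from a command prompt; a sketch, where the data directory path is an assumption to adjust for your installation:

pg_ctl restart -D "D:\PostgreSQL\data"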

After the master instance restarts, we also need to connect to it and create a replication slot. By default, WAL files are recycled in a circular fashion, which creates a problem: if a slave instance goes offline for a long time, the WAL files it still needs may already have been recycled on the master. In that case, even if the slave is restored and comes back online, it cannot catch up with the master, because part of the required WAL archive files have been cleaned up, and it will simply report an error. To handle this scenario PostgreSQL introduced the concept of replication slots. The master instance can create multiple replication slots, with one slot bound to each slave instance. The benefit of a replication slot is that it guarantees WAL files are only cleaned up after the slave has received them, so the circular-recycling problem described above cannot occur.

Replication slots are maintained on the master instance; the statements to create, query, and delete them are as follows.
Create a replication slot
SELECT * FROM pg_create_physical_replication_slot('slot1');

Query all replication slots
SELECT slot_name, slot_type, active FROM pg_replication_slots;

Delete a replication slot
SELECT pg_drop_replication_slot('slot1');

At this point the configuration of the master instance is complete. The next step is to prepare our slave instance. You can stop the master instance, pack and compress the entire PostgreSQL folder together with its data directory, copy it to the new server, and start it there as the slave instance.
Here I chose to pack and compress the PostgreSQL directory on the cloud server and copy it to my local machine to decompress and use as the slave instance.
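
Alternatively, instead of stopping the master, you can take a base backup over a replication connection with pg_basebackup; a sketch, where the host placeholder and target path are assumptions to adjust. With -R it also writes standby.signal and a primary_conninfo entry into the target data directory for you, and -S reuses the slot1 replication slot created above:

pg_basebackup -h <master-host> -p 5432 -U postgres -D "D:\PostgreSQL\data" -R -X stream -S slot1 -P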



After decompressing locally, the following adjustments are needed in postgresql.conf for the instance to run as a slave:
primary_conninfo = 'host=xxxx port=5432 user=postgres password=xxxxxx application_name=s'
primary_slot_name = 'slot1'


primary_conninfo holds the connection string of our primary instance, plus an application_name. That application_name ties this standby to the synchronous_standby_names we configured on the primary earlier: we configured the primary so that every transaction waits for the standby named s to apply it synchronously.
primary_slot_name is the name of the replication slot; we created the slot slot1 earlier for this slave instance to use.

Note that if you configure multiple slave instances, each slave corresponds to one replication slot and is bound to one application_name. Then create an empty file named standby.signal in the data directory.
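
On Windows the empty file can be created from a command prompt; a sketch, with the data directory path again an assumption:

cd /d D:\PostgreSQL\data
type NUL > standby.signal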


This file is in fact a signal marker: it tells PostgreSQL that the current instance is a read-only standby and cannot be used for data modification.
Then start the standby database; if everything is configured correctly, it will start up and begin streaming from the primary.

At this point we can create a database on the master instance and perform some operations, then connect to the slave instance and see that the two sides stay in sync.
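
You can also verify this from SQL; a minimal sketch, run on the slave instance (pg_is_in_recovery() should return true, and the two LSN functions show how far WAL has been received and replayed):

SELECT pg_is_in_recovery();
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();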
 

If you want to detach the slave instance from the master instance, proceed as follows. In the master instance's postgresql.conf, find synchronous_standby_names and remove the s node from its value; if there is only one slave node, simply comment out the whole line with #, i.e. #synchronous_standby_names = 's'. Restart the master instance after the adjustment.


Then open the slave instance's postgresql.conf and comment out the node configuration:
#primary_conninfo
#primary_slot_name
Delete the standby.signal file in the data directory and restart the slave instance; it will then run as an independent instance.
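
Alternatively, while the standby is still in recovery you can promote it from SQL instead of deleting standby.signal by hand; a sketch (pg_promote() is available since PostgreSQL 12), followed by dropping the now-unused replication slot on the master:

-- run on the slave instance while it is in recovery
SELECT pg_promote();
-- run on the master instance afterwards
SELECT pg_drop_replication_slot('slot1');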

This concludes the walkthrough of building a PostgreSQL physical replication high-availability database service on Windows. If you have any questions, leave a comment below the article or send me a private message; discussion and exchange are welcome.


Origin blog.csdn.net/weixin_47367099/article/details/127655878