[Database] Synchronization methods: DataGuard vs GoldenGate


1. Basic Concepts

1.1 Data writing process

In a database, the data files store the actual data, but when data is updated, the data files on disk are not updated directly, because that would involve a lot of disk I/O and the performance impact would be huge. Suppose a user submits a DML request such as `update emp set ename='test' where id=1;`.

When the listener receives the request, it creates a corresponding server process for it. The server process first scans the database buffer cache to check whether it already holds the data block containing the row with id=1. If it does, the cache is updated directly and the block becomes dirty; if not, the server process first copies the corresponding data block from disk into the buffer cache and then performs the update.

If a block held in the database buffer cache differs from its copy on disk, it is called a "dirty buffer". Dirty buffers are eventually written to the data files on disk by the database writer (DBWn). DBWn does not write in real time; it writes as lazily as possible, and only in the following four cases:

  • a. No free buffers are available
  • b. There are too many dirty buffers
  • c. A 3-second timeout expires (a write happens at least every 3 seconds)
  • d. A checkpoint occurs. A checkpoint is an Oracle event: one happens, for example, during an orderly database shutdown, and one can also be triggered manually, as in the sketch below.
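
A minimal SQL*Plus sketch of triggering a checkpoint manually (run as a privileged user):

```sql
-- Trigger a checkpoint manually: DBWn writes all dirty buffers
-- in the buffer cache out to the data files
ALTER SYSTEM CHECKPOINT;
```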

Since DBWn does not write in real time, what happens if the system suddenly loses power while data in the buffer cache has not yet been written to disk: wouldn't committed data be lost? In fact it is not, which brings us to the concept of the redo log.

1.2 What is the redo log


When a user performs a DML operation and a data block changes, the resulting change vector is written to the redo log files. With these records, even if the system shuts down abruptly due to something like a power failure while a large amount of dirty data in the buffer cache has not yet been written to the data files, the database can still be recovered from the redo log.
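
The online redo log groups and their state can be inspected through the standard v$log view; a quick sketch:

```sql
-- Show the online redo log groups: which one is CURRENT,
-- and whether each group has already been archived
SELECT group#, sequence#, status, archived
FROM   v$log
ORDER  BY group#;
```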

Correspondingly, log writes also pass through a memory area called the log buffer, and the log writer (LGWR) writes the contents of the log buffer to the redo log files on disk. Compared with the database writer (DBWn), LGWR writes far more frequently. LGWR performs a write in the following three cases:

  • a. On commit
  • b. When the log buffer is one-third full
  • c. Before DBWn writes dirty buffers

When a user executes DML and commits, DBWn is not necessarily triggered to write, but LGWR definitely is. So even if, after a committed DML, the data has only reached the database buffer cache and that buffered data is then lost, it can still be recovered from the redo log.
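
LGWR's relative write frequency can be observed from system statistics; a rough sketch using v$sysstat (the statistic names are standard, the values depend entirely on the workload):

```sql
-- Compare redo writes with user commits: because LGWR also writes
-- when the log buffer is 1/3 full and before DBWn writes, redo
-- writes are usually at least as frequent as commits
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo writes', 'user commits', 'redo size');
```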

1.3 What is the archive redo log


Archiving means permanently saving the redo log into archive redo log files. The archive redo log serves the same purpose as the redo log, except that the redo log is continuously overwritten while the archive redo log is not: it keeps a complete history of data changes. The ARCn process is responsible for backing the redo log up into the archive redo log.

That is the general flow of data writes. Database synchronization relies mainly on the redo log and the archive redo log.
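
Whether archiving is enabled can be checked, and enabled, as follows; a minimal sketch (switching to ARCHIVELOG mode requires restarting the database to the MOUNT state, run as SYSDBA):

```sql
-- Check the current archiving mode
SELECT log_mode FROM v$database;

-- Enable archiving: the database must be mounted but not open
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```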

2. Disaster Recovery Synchronization Methods

2.1 Oracle DataGuard

Principle

The redo log is shipped from the source database to the target database, and those log files are then applied on the target database, keeping it synchronized with the source.

Execution steps

  1. The client connects to the primary database and initiates an update
  2. The change is made in the primary's memory and redo is generated
  3. The client issues a commit
  4. The primary writes the redo into its online log file group and simultaneously ships it to the standby
  5. The standby applies the redo and acknowledges the primary
  6. The primary acknowledges the client

Through this mechanism, every update performed on the primary is also applied on the standby, achieving synchronization. For the standby's acknowledgement, three modes can be configured: Maximum Availability, Maximum Performance, and Maximum Protection. In Maximum Protection mode, the primary cannot complete an operation until the redo log has been shipped to the standby, which gives the strongest consistency guarantee.

ADG (Active Data Guard), which is commonly used in disaster recovery systems, builds on DG by making the standby queryable, so that read-only workloads such as reporting can be offloaded from the primary.
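
On a physical standby, Active Data Guard essentially means opening the database read-only while redo apply keeps running; a sketch of the classic command sequence, run on the standby as SYSDBA (exact syntax varies slightly by Oracle version):

```sql
-- Stop managed recovery, open the standby for read-only queries,
-- then restart redo apply so the standby stays current
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```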

| Protection mode | Acknowledgement mechanism |
| --- | --- |
| Maximum Performance | The primary acknowledges the client without waiting for a response from the standby; primary and standby data are asynchronous, and a primary failure can lose data. If the standby fails, the primary remains available. Essentially no performance impact. |
| Maximum Protection | The primary must confirm that the standby has applied the redo before it acknowledges the client; primary and standby data are synchronized in real time, and a primary failure loses no data. If the standby fails, the entire DG setup becomes unavailable. Network and standby I/O speed affect overall performance. |
| Maximum Availability | Behaves like Maximum Protection while the standby is running normally; if the standby fails, it falls back to Maximum Performance. |
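
The protection mode is set on the primary and can be verified from v$database; a minimal sketch, assuming redo transport is configured through log_archive_dest_2 and `standby_db` is a hypothetical TNS alias for the standby:

```sql
-- Ship redo synchronously to the standby (required before raising
-- the protection mode above Maximum Performance)
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_db SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';

-- Raise the protection mode, then verify it
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
SELECT protection_mode, protection_level FROM v$database;
```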

Standby types


| | Physical standby | Logical standby |
| --- | --- | --- |
| Principle | Primary and standby have identical physical structure (and therefore identical logical structure); the standby synchronizes by replaying the redo log shipped from the primary | Primary and standby have identical logical structure (physical structure differs); the standby re-parses the redo log shipped from the primary into SQL statements and synchronizes by executing them |
| Advantage | Both the logical and physical structure of the standby match the primary, giving a stronger consistency guarantee | More flexible; the DBMS_LOGSTDBY package can be used to customize the standby, as sketched below |

A physical standby keeps the same SCN as the primary; a logical standby only needs the data to stay consistent.
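
As an example of that flexibility, DBMS_LOGSTDBY can tell a logical standby to skip changes for particular objects; a sketch, where SCOTT.TEMP_TABLE is a hypothetical scratch table:

```sql
-- On the logical standby: stop SQL apply, register a rule that
-- skips DML on the scratch table, then restart apply
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'DML', schema_name => 'SCOTT', object_name => 'TEMP_TABLE');
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
```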

2.2 Oracle GoldenGate

Principle

Incremental data changes are obtained by parsing the source database's redo log or archive log, delivered to the target database over TCP/IP, and finally parsed and applied on the target, keeping the two sides synchronized. The DSG replication tool is based on a similar principle.

Execution steps

  1. Initial data synchronization is performed via the extract process and the replicat process
  2. The extract process reads the online redo log or archive redo log on the source database, parses it, and extracts only the data change information, such as DML operations (this requires supplemental logging on the source; see the sketch after this list)
  3. The extracted information is converted into OGG's own intermediate format and stored in trail files
  4. The trail files are shipped to the target over TCP/IP; a pump process can be configured to send them in data blocks
  5. The server collector process on the target receives the change information shipped from the source and buffers it in local trail files
  6. The replicat process on the target reads the change information from the trail files, builds the corresponding SQL statements, and applies them to the target database; after a successful commit it updates its checkpoint to record the replicated position, completing the copy
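
For the extract process to reconstruct complete change records from the redo stream, the source database needs the usual Oracle-side preparation; a minimal sketch (the GoldenGate processes themselves are configured separately in GGSCI):

```sql
-- Source-side prerequisites commonly required by GoldenGate:
-- make sure every change is logged, and add minimal supplemental
-- log data so extract can reconstruct rows from the redo log
ALTER DATABASE FORCE LOGGING;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- On 11.2.0.4 and later, also enable GoldenGate replication support
ALTER SYSTEM SET enable_goldengate_replication = TRUE;
```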

3. Summary

| Aspect | DataGuard | OGG |
| --- | --- | --- |
| Principle | Ships the redo log from the source database to the target database and applies the log files there | Parses the source database's redo log or archive log to obtain incremental changes, delivers them over TCP/IP, and parses and applies them on the target |
| Stability | Extremely stable as a disaster recovery solution | Because replication runs independently of the DBMS, zero data loss cannot be guaranteed |
| Maintenance | Simple to maintain; problems are rare | Command-line driven; maintenance is more complex |
| Object support | Fully supported | Some objects must be created and maintained manually |
| Target availability | The target is in recovery or read-only state | Both databases are active; the target can serve real-time queries, and simultaneous writes on both ends are possible |
| Replication mode | Real-time replication is possible | Provides second-level, near-real-time capture and delivery of large data volumes, but replication is asynchronous; synchronous replication is not possible |
| Resource usage | Replication is handled by the database's LGWR or ARCn process and occupies few database resources | Data extraction and conversion consume more system resources during peak business hours, less during off-peak hours |
| Heterogeneous database support | Runs only on Oracle, with strict requirements that the source and target operating systems match | Can replicate between different database types and versions, and across different operating systems |

DataGuard is the better way to implement disaster recovery for an Oracle database. We also tried to use OGG for disaster recovery in our project, and mainly ran into the following problems:

1. Guaranteeing data consistency between the two databases

DataGuard replicates in real time and its consistency is guaranteed by the database itself. OGG's replication runs independently of the DBMS: it can only guarantee that the changes it extracts are applied on the target; it cannot guarantee that the two databases are identical. Although OGG rarely has problems with DML replication nowadays, we wrote a number of additional audit scripts to verify that the data on both ends matched. These audits did find cases where OGG itself reported no error, yet the table data was inconsistent.
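
A trivial form of such an audit is comparing row counts over a database link; a sketch, where `target_link` is a hypothetical database link to the OGG target (real audits would also compare checksums per key range):

```sql
-- Compare row counts of the same table on both ends;
-- a mismatch means replication silently diverged
SELECT (SELECT COUNT(*) FROM emp)             AS source_rows,
       (SELECT COUNT(*) FROM emp@target_link) AS target_rows
FROM   dual;
```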

2. Object maintenance

Some objects must be created and maintained manually. To keep the objects on both ends consistent, additional scripts have to be written to audit the database objects.

3. Efficiency issues

If the source table has no index and a large number of inserts, deletes, and updates are performed on it, the efficiency of the OGG replicat process suffers badly. For example, one of our tables had no index: 480,000 rows were updated and deleted in batches, the replicat process fell 6 hours behind with only 150,000 rows applied on the target side, and in the end it took OGG 19 hours to catch up. Similarly, a truncate on the source also generates a large volume of data changes, which hurts OGG's efficiency.
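
The root cause is that without a primary key or unique index, the replicat must locate each target row by a full scan or by comparing all columns. A sketch of the usual fix, assuming the `emp` table from the earlier example and that `id` uniquely identifies rows:

```sql
-- Give the replicated table a primary key so the replicat can
-- apply each change with an indexed single-row lookup
ALTER TABLE emp ADD CONSTRAINT emp_pk PRIMARY KEY (id);
```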

In general, if both ends are Oracle and the operating systems are the same, DataGuard is the better choice. If you need to synchronize data between heterogeneous databases, or the operating systems on the two ends differ, OGG is the only option, and using it takes extra effort to guarantee data consistency and keep replication efficient.
