Synchronizing SQL data to ELK (IV): using SQL Server's data change tracking features to synchronize data (Part 2)

I. Related documentation

As usual, so that my explanations do not mislead you, please read the official documentation first and get a general understanding of the relevant SQL Server features.

Documentation links:

Overview: https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-tracking-sql-server?view=sql-server-2017

Change Data Capture: https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-2017

Change Tracking: https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-tracking-sql-server?view=sql-server-2017

If you prefer reading in Chinese, change en-us in the URL to zh-cn to view the Chinese version of the documentation.

II. Feature overview

SQL Server has two built-in mechanisms for capturing data changes: Change Data Capture (hereinafter CDC) and Change Tracking (hereinafter CT). Both record the data that changes when users run DML operations (insert, update, delete).

How they work: when a table is modified, SQL Server writes the operation to the transaction log. With CDC enabled, a capture job run by SQL Server Agent (a separate program) reads those log records and writes them to dedicated change tables, so the feature costs extra storage space and server performance. CT, by contrast, records its change information synchronously as part of the DML transaction and does not depend on the Agent. Finally, SQL Server provides a set of functions that help users query these change records, and for CDC the change tables can also be read directly.

Note that in SQL Server 2014 and earlier, CDC requires Enterprise or Developer Edition; from SQL Server 2016 SP1 onward, Standard Edition includes it as well. CT is available in all editions.

Advantages:

  1. It is built into the system, so no custom solution is needed.
  2. The table structure does not need to change; there is no need to add something like an identity column.
  3. CDC has a built-in cleanup mechanism, so expired log records are purged without a custom cleanup job.
  4. Capture is asynchronous. It still affects server performance, but because it runs in a separate process the impact is smaller than that of using triggers directly. (I do not know whether the Agent can be deployed on a machine separate from SQL Server; if anyone knows, please tell me.)
  5. Changes are based on committed transactions and are ordered by transaction commit time, so the order in which a program reads the changes is reliable.
  6. SQL Server provides several tools for configuration and management.

How it works:

This article is mainly hands-on, so I will just post two figures from the official site to give you a feel for it ~

Change data capture data flow

Differences:

The main difference between the two features is the format of what they record. CDC is more detailed: it records every change to every row, including the value of each column before and after the change. CT only records that a row has changed; the specific content before and after the change is not recorded.

What exactly gets recorded is described in detail below, so please be patient ~

III. Preparation

Enabling the features

Besides this article, you can refer to posts written by other bloggers on cnblogs:

https://www.cnblogs.com/maikucha/p/9039205.html

https://www.cnblogs.com/chenmh/p/4408825.html

1. Add a dedicated filegroup

Right-click the database whose data changes you want to record -> Properties -> Filegroups, click Add Filegroup, and add a filegroup named TDC.

image

2. Add a database file

Switch to the Files tab, click the Add button to create a new file, choose Rows Data as the file type, and select the TDC filegroup you just created as its filegroup.

I learned these two steps from other bloggers; the official documentation does not mention them, and you can skip them. My understanding is that they keep the capture process from competing with the main SQL Server workload for the primary mdf file: if both use the same file, performance and concurrency problems may follow. Run some tests before going to production (PRD).
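The two GUI steps above can also be scripted. This is a minimal sketch, assuming the database is called MyDB; the logical file name, the file path, and the sizes are hypothetical and should be adapted to your server.

```sql
-- Step 1: add the dedicated filegroup (TDC is the name used in this article).
ALTER DATABASE MyDB ADD FILEGROUP TDC;
GO

-- Step 2: add a rows-data file to that filegroup.
-- NAME, FILENAME, SIZE and FILEGROWTH below are placeholder values.
ALTER DATABASE MyDB
ADD FILE
(
    NAME = N'MyDB_TDC',
    FILENAME = N'D:\SqlData\MyDB_TDC.ndf',
    SIZE = 64MB,
    FILEGROWTH = 64MB
)
TO FILEGROUP TDC;
GO
```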


3. Enable SQL Server Agent

Find the SQL Server Agent service in Windows Services and click Start (set it to start automatically if needed). Once it is running, the connected SQL Server instance looks like this:

image
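If you would rather check from a query window, the Agent's state can also be read from a dynamic management view. A small sketch, assuming your login has the VIEW SERVER STATE permission:

```sql
-- Lists the SQL Server services of this instance, including SQL Server Agent,
-- with their startup type and current status (for example Running or Stopped).
SELECT servicename, startup_type_desc, status_desc
FROM sys.dm_server_services;
```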

4. Enable the features at the database level

These change tracking features are turned off by default. Before using them, you first need to enable them at the database level.

To enable CDC for a database, run the following SQL:

USE MyDB  
GO  
EXEC sys.sp_cdc_enable_db  
GO

To enable CT for a database, run the following SQL:

ALTER DATABASE JaxTest  -- database name
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

You can also right-click the database -> Properties -> Change Tracking page and configure it there:

image
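To confirm that the database-level switches took effect, the catalog views can be queried. A sketch, assuming the database names MyDB and JaxTest used in the snippets above:

```sql
-- CDC: is_cdc_enabled is 1 once sys.sp_cdc_enable_db has run in that database.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name IN (N'MyDB', N'JaxTest');

-- CT: one row per database that has change tracking enabled,
-- together with its retention and auto-cleanup settings.
SELECT DB_NAME(database_id) AS database_name,
       is_auto_cleanup_on,
       retention_period,
       retention_period_units_desc
FROM sys.change_tracking_databases;
```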

5. Enable the features at the table level

After enabling the features at the database level, you also need to enable them for each table. The process is as follows:

Let's create a table first:

CREATE TABLE Person
(
    Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    Name NVARCHAR(32) NOT NULL,
    Age INT NOT NULL,
    Remark NVARCHAR(512) NULL
)

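To enable CDC on a table, run the following SQL:

exec sys.sp_cdc_enable_table 
    [ @source_schema = ] 'source_schema', -- schema the table belongs to, usually dbo
    [ @source_name = ] 'source_name' , -- table name
    [ @role_name = ] 'role_name' -- name of the database role used to control access to the change data
    [,[ @capture_instance = ] 'capture_instance' ] -- name of the capture instance used to name the change data capture objects; it is needed often in the later stored procedures and functions
    [,[ @supports_net_changes = ] supports_net_changes ] -- whether net change queries are enabled for this capture instance; defaults to 1 if the table has a primary key or a unique index identified with @index_name, otherwise 0
    [,[ @index_name = ] 'index_name' ] -- name of a unique index that uniquely identifies rows in the source table; sysname, can be NULL. If specified, it must be a valid unique index on the source table, and its columns take precedence over any primary key columns as the unique row identifier
    [,[ @captured_column_list = ] 'captured_column_list' ] -- which columns to capture; nvarchar(max), can be NULL. If NULL, all columns are included in the change table
    [,[ @filegroup_name = ] 'filegroup_name' ] -- filegroup to use for the change table created for the capture instance
    [,[ @allow_partition_switch = ] 'allow_partition_switch' ] -- whether ALTER TABLE ... SWITCH PARTITION can be run against a table with change data capture enabled; bit, defaults to 1

That may be a bit long-winded, so here is a concrete example. To enable CDC on the Person table, run the following SQL: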

EXEC sys.sp_cdc_enable_table 
    @source_name = 'Person',
    @source_schema = 'dbo',
    @capture_instance = 'dbo_Personal',
    @filegroup_name = 'TDC',
    @supports_net_changes = 1,
    @role_name = NULL


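To enable CT on a table, run the following SQL:

ALTER TABLE dbo.Person  -- table name
ENABLE CHANGE_TRACKING  
WITH (TRACK_COLUMNS_UPDATED = ON)

Of course, you can also right-click the table -> Properties -> Change Tracking tab and enable it there.

At this point both CDC and CT are enabled for the database. In practice you will usually choose just one of them depending on your needs; both are shown here only for illustration, and you can pick either one to practice with.

Capturing changes with CDC and CT

1. Capturing changes with CDC

We first insert some rows into the table, then update and delete them, and then use the functions SQL Server provides to capture these changes.

The data in this article changes as follows:

First, three rows are inserted: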
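To double-check the table-level settings, the following sketch can be run in the database where the Person table lives. It only reads metadata and assumes the dbo.Person table created above.

```sql
-- CDC: shows the capture instance, change table, role and captured columns
-- configured for dbo.Person.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'Person';

-- CT: one row per table in the current database with change tracking enabled.
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS table_name,
       is_track_columns_updated_on
FROM sys.change_tracking_tables;
```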

image

Then they are modified to look like this:

image

Finally, the second row is deleted:

image
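For readers who want to reproduce a sequence like this, here is a sketch of statements that follow the same pattern (three inserts, some updates, then deleting the second row). The values are made up for illustration; they are not the values from the original screenshots, and Id = 2 assumes a freshly created Person table.

```sql
-- Insert three rows (illustrative values only).
INSERT INTO dbo.Person (Name, Age, Remark) VALUES
    (N'Alice', 20, N'first row'),
    (N'Bob',   25, N'second row'),
    (N'Carol', 30, NULL);

-- Modify some of the inserted rows.
UPDATE dbo.Person SET Age = 21 WHERE Id = 1;
UPDATE dbo.Person SET Remark = N'third row, updated' WHERE Id = 3;

-- Delete the second row.
DELETE FROM dbo.Person WHERE Id = 2;
```

Because CDC capture is asynchronous, the changes may take a moment to show up in the change table after these statements commit.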

At this point, we first use the CDC script below to query all changes:

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Personal');  
SET @to_lsn   = sys.fn_cdc_get_max_lsn();  
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Personal
  (@from_lsn, @to_lsn, N'all update old');  
GO

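This script uses the name dbo_Personal in two places. It is the value passed as @capture_instance = 'dbo_Personal' when CDC was enabled above; if you have forgotten it, scroll back up and take another look ~

If you no longer remember the name you specified at the time, you can find it in the database's function list: the functions all start with cdc.fn_cdc_get_all_changes.

Running the script gives the following result: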

image

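For the meaning of the parameters of this function and of each returned column, see the official Microsoft documentation: https://docs.microsoft.com/zh-cn/sql/relational-databases/system-functions/cdc-fn-cdc-get-all-changes-capture-instance-transact-sql?view=sql-server-2017 (screenshots are included below for the lazy among us).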

image

image

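From this log we can already see in great detail every operation we performed on the Person table. You can also see that the rows are ordered by the order in which our SQL statements were executed, and that the value of every column before and after each change is recorded very clearly.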
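The raw result is easier to read with the operation code decoded and the LSN mapped to a commit time. A sketch under the same assumptions as the script above (capture instance dbo_Personal, all columns of Person captured); the column aliases are my own:

```sql
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Personal');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS commit_time,
    CASE __$operation
        WHEN 1 THEN 'delete'
        WHEN 2 THEN 'insert'
        WHEN 3 THEN 'update (before)'   -- only returned with 'all update old'
        WHEN 4 THEN 'update (after)'
    END AS operation,
    Id, Name, Age, Remark
FROM cdc.fn_cdc_get_all_changes_dbo_Personal(@from_lsn, @to_lsn, N'all update old')
ORDER BY __$start_lsn, __$seqval, __$operation;
```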

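In addition, CDC can return net changes, that is, the difference in the data over a period of time with the intermediate steps of repeated modifications collapsed: each changed row appears at most once, with its final state, and a row that was inserted and then deleted within the window does not appear at all.

We can run the following SQL to get the net changes: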

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Personal');  
SET @to_lsn   = sys.fn_cdc_get_max_lsn();  
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Personal
  (@from_lsn, @to_lsn, N'all');  
GO

The final result is as follows:

image

As you can see, the row with Id 2 does not appear here: over this window it went from inserted to deleted, which amounts to no change, so the function does not return it ~

For the meaning of the parameters of this function and of the returned columns, see: https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/cdc-fn-cdc-get-net-changes-capture-instance-transact-sql?view=sql-server-2017
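For a sync job that runs repeatedly, you would normally not start from the minimum LSN every time. One possible pattern is sketched below, assuming the job stores the last LSN it processed somewhere of its own; @last_lsn and how it is persisted are hypothetical and not part of the original article.

```sql
-- Hypothetical: the highest LSN handled by the previous run, loaded from
-- wherever the sync job keeps its state. NULL means this is the first run.
DECLARE @last_lsn binary(10) = NULL;
DECLARE @from_lsn binary(10), @to_lsn binary(10);

-- Start right after the last processed LSN, or at the beginning on first run.
SET @from_lsn = CASE
                    WHEN @last_lsn IS NULL THEN sys.fn_cdc_get_min_lsn('dbo_Personal')
                    ELSE sys.fn_cdc_increment_lsn(@last_lsn)
                END;
SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- The function raises an error if the range is empty, so guard the call.
IF @from_lsn <= @to_lsn
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_Personal(@from_lsn, @to_lsn, N'all');

-- After processing the rows, the job would save @to_lsn as the new @last_lsn.
```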


2. Capturing changes with CT

To capture changes with CT, run the following SQL:

SELECT *
FROM  CHANGETABLE(CHANGES dbo.Person,0) AS CT

For the operations performed above, the final result is as follows:

image

As you can see, what CT records is very simple: it records which Ids have changed, but not what the changes were. It does tell you, through the SYS_CHANGE_OPERATION column, whether synchronizing the change to another place calls for an Insert, Delete, or Update. There are of course many more advanced usages, which are left for further exploration.
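Since CT does not store the changed values, a common approach is to join the change list back to the table to pick up the current row content. A sketch, assuming the sync job keeps the version it last synchronized to; @last_sync_version and its storage are hypothetical, and 0 means "from the beginning" as in the query above.

```sql
DECLARE @last_sync_version bigint = 0;  -- hypothetical saved state of the sync job
DECLARE @current_version   bigint = CHANGE_TRACKING_CURRENT_VERSION();

-- If the saved version is older than what CT still retains, the change list
-- is incomplete and the target has to be reloaded in full.
IF @last_sync_version < CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'dbo.Person'))
    RAISERROR(N'Change tracking data has been cleaned up; a full reload is required.', 16, 1);
ELSE
    SELECT CT.Id,
           CT.SYS_CHANGE_OPERATION,   -- I = insert, U = update, D = delete
           CT.SYS_CHANGE_VERSION,
           P.Name, P.Age, P.Remark    -- NULL for deleted rows
    FROM CHANGETABLE(CHANGES dbo.Person, @last_sync_version) AS CT
    LEFT JOIN dbo.Person AS P ON P.Id = CT.Id;

-- After processing, the job would save @current_version as the new @last_sync_version.
```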


Summary

This article mainly covered how to use SQL Server's change capture features (CDC and CT) to capture data changes. Overall it stays fairly shallow: partly because my own understanding does not go that deep, and partly because of space, and because explaining every use of these features is not the focus here. We have only one goal, which is to synchronize SQL Server data to ES. In the next article we will combine the features covered today with a few others and try to import the data into ES.

It's late; off to bed to save my hair ~

Origin www.cnblogs.com/baiyunchen/p/11361372.html