1. What is Change Data Capture (CDC)
Change Data Capture (CDC) uses the SQL Server Agent to record insert, update, and delete activity applied to tables, and exposes the details of these changes in an easy-to-use relational format. For each modified row, the column information is captured along with the metadata required to apply the change to a target environment, and both are stored in a change table that mirrors the column structure of the tracked source table.
Table-valued functions are provided so that consumers can systematically access the change data. A typical data consumer targeted by this technology is an Extract, Transform, and Load (ETL) application. ETL applications incrementally load changed data from SQL Server source tables into a data warehouse or data mart. Although the representation of the source tables in the data warehouse must reflect changes in the source tables, an end-to-end technique that refreshes a full copy of the source is not appropriate. Instead, you need a reliable stream of change data with a well-defined structure that consumers can apply to different target data representations. SQL Server change data capture provides this technology.
2. Main data flow of change data capture
The source of change data for change data capture is the SQL Server transaction log. As inserts, updates, and deletes are applied to a tracked source table, entries describing these changes are added to the log. The log serves as the input to the capture process, which reads the log and adds information about each change to the change table associated with the tracked table. Functions are provided to enumerate the changes that appear in the change tables within a specified range and return them as a filtered result set. Typically, an application process uses the filtered result set to update a representation of the source in some external environment.
3. Necessary conditions for enabling CDC
3.1 Available in the Enterprise, Developer, and Evaluation editions of SQL Server 2008 and later
Version check: the installed version of SQL Server is 2019 Developer Edition
3.2 The SQL Server Agent service (jobs) must be enabled
3.3 CDC requires additional disk space, separate from the business database, to store its change data files
3.4 The table must have a primary key or a unique index
The teaid column of the tb_teacher table in the DB_Student database has been set as the primary key.
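The conditions above can be checked with a few catalog queries. This is a minimal sketch, assuming the DB_Student database and dbo.tb_teacher table used throughout this article, and assuming an instance where the sys.dm_server_services view is available (SQL Server 2008 R2 SP1 or later, with VIEW SERVER STATE permission).

```sql
USE DB_Student;

-- 3.1: edition and version of the running instance
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;

-- 3.2: state of the SQL Server Agent service; status_desc should be 'Running'
SELECT servicename, status_desc
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server Agent%';

-- 3.4: primary key and unique indexes defined on dbo.tb_teacher
SELECT i.name AS index_name, i.is_primary_key, i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.tb_teacher')
  AND (i.is_primary_key = 1 OR i.is_unique = 1);
```

If the last query returns no rows, the table has neither a primary key nor a unique index.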
4. Check and enable the CDC service for the database
4.1 Query the CDC service status of the database
select is_cdc_enabled from sys.databases where name='DB_Student'
If the query result is 0, the CDC service has not yet been enabled for the DB_Student database.
4.2 Enable database-level CDC functions
ALTER AUTHORIZATION ON DATABASE::[DB_Student] TO [sa];
-- sys.sp_cdc_enable_db acts on the current database, so switch to DB_Student first
use DB_Student;
if exists(select 1 from sys.databases where name='DB_Student' and is_cdc_enabled=0)
begin
exec sys.sp_cdc_enable_db
end;
select is_cdc_enabled from sys.databases where name='DB_Student';
The result is 1, indicating that the CDC service has been enabled for the DB_Student database.
5. Add CDC-specific filegroups and files
SELECT name, physical_name FROM sys.master_files WHERE database_id = DB_ID('DB_Student');
ALTER DATABASE DB_Student ADD FILEGROUP CDC1;
ALTER DATABASE DB_Student
ADD FILE
(
NAME= 'DB_Student_CDC1',
FILENAME = 'D:\DATA\DB_Student_CDC1.ndf'
)
TO FILEGROUP CDC1;
6. Enable table-level CDC (note: the table must have a primary key or a unique index)
SELECT name,is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 0;
IF EXISTS(SELECT 1 FROM sys.tables WHERE name='tb_teacher' AND is_tracked_by_cdc = 0)
BEGIN
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'tb_teacher',
@capture_instance = NULL,
@supports_net_changes = 1,
@role_name = NULL,
@index_name = NULL,
@captured_column_list = NULL,
@filegroup_name = 'CDC1'
END;
DECLARE @tableName nvarchar(36)
DECLARE My_Cursor CURSOR
FOR (SELECT 'new_srv_workorderBase' name
union select 'tablename1'
union select 'tablename2'
union select 'tablename3'
)
OPEN My_Cursor;
FETCH NEXT FROM My_Cursor INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = @tableName,
@capture_instance = NULL,
@supports_net_changes = 1,
@role_name = NULL,
@index_name = NULL,
@captured_column_list = NULL,
@filegroup_name = 'CDC1'
FETCH NEXT FROM My_Cursor INTO @tableName;
END
CLOSE My_Cursor;
DEALLOCATE My_Cursor;
SELECT name,is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1 ORDER BY NAME;
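To see how a table was configured, rather than only whether it is tracked, the system procedure sys.sp_cdc_help_change_data_capture can be called. This is a minimal sketch, assuming the dbo.tb_teacher table enabled above.

```sql
USE DB_Student;

-- Show the capture instance, captured columns, index, and filegroup
-- that CDC is using for dbo.tb_teacher
EXEC sys.sp_cdc_help_change_data_capture
     @source_schema = N'dbo',
     @source_name   = N'tb_teacher';
```

Because @capture_instance was passed as NULL when the table was enabled, the capture instance should carry the default name built from the schema and table name, here dbo_tb_teacher.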
7. Verify whether CDC was successfully enabled
7.1 Check the CDC status of the tb_teacher table. A result of 1 indicates that CDC has been enabled successfully.
select name, is_tracked_by_cdc from sys.tables where object_id = OBJECT_ID('dbo.tb_teacher')
7.2 After the CDC service is successfully enabled for the DB_Student database, the following jobs appear under SQL Server Agent - Jobs:
cdc.DB_Student_capture
cdc.DB_Student_cleanup
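The same two jobs can also be inspected from T-SQL instead of the object tree. This is a minimal sketch, assuming the DB_Student database from this article and read access to msdb.

```sql
USE DB_Student;

-- Configuration of the capture and cleanup jobs for the current database
-- (polling interval, retention, and so on)
EXEC sys.sp_cdc_help_jobs;

-- The corresponding SQL Server Agent job entries; [_] matches a literal underscore
SELECT name, enabled
FROM msdb.dbo.sysjobs
WHERE name LIKE N'cdc.DB_Student[_]%';
```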
7.3 After the CDC service is successfully enabled, the following functions are generated in the database under "DB_Student - Programmability - Functions - Table-valued Functions".
cdc.fn_cdc_get_all_changes_dbo_tb_teacher: Returns one row for each change applied to the source table within the specified log sequence number (LSN) range. If a source row had multiple changes within that interval, each change is represented in the returned result set.
cdc.fn_cdc_get_net_changes_dbo_tb_teacher: Returns one net change row for each source row changed within the specified LSN range. That is, if a source row has multiple changes within the LSN range, the function returns a single row that reflects the final contents of that row.
sys.fn_cdc_map_time_to_lsn: Returns the log sequence number (LSN) value from the start_lsn column of the cdc.lsn_time_mapping system table for the specified time. This function can be used to systematically map a datetime range to the LSN-based range required by the change data capture enumeration functions cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance>, so that they return the data changes within that range.
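A typical call pattern for these functions looks like the following. This is a minimal sketch, assuming the default capture instance name dbo_tb_teacher and that at least one change has already been captured (otherwise the LSN bounds can be NULL and the enumeration functions raise an error). The one-hour window in the last query is an arbitrary illustration.

```sql
USE DB_Student;

DECLARE @from_lsn binary(10), @to_lsn binary(10);

-- Lowest LSN still available for this capture instance, and the highest LSN captured so far
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_tb_teacher');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

IF @from_lsn IS NOT NULL AND @to_lsn IS NOT NULL AND @from_lsn <= @to_lsn
BEGIN
    -- One row per change in the range
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_tb_teacher(@from_lsn, @to_lsn, N'all');

    -- One row per changed source row, reflecting its final contents
    SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_tb_teacher(@from_lsn, @to_lsn, N'all');
END;

-- Mapping a datetime window to LSN values (illustrative window: the last hour)
DECLARE @begin_time datetime = DATEADD(HOUR, -1, GETDATE());
DECLARE @end_time   datetime = GETDATE();

SELECT sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time) AS window_from_lsn,
       sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time)       AS window_to_lsn;
```

The mapped LSN values can be passed to the enumeration functions in place of @from_lsn and @to_lsn, provided they fall within the valid range for the capture instance.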
7.4 After the CDC service is successfully enabled, the following tables are generated in the database under "DB_Student - Tables - System Tables".
cdc.change_tables: After CDC is enabled on a table, a row is inserted into this table recording basic information about the tracked table.
cdc.captured_columns: After CDC is enabled on a table, its column information is recorded in this table.
cdc.dbo_VW_GHZDK_CT: Records all changed data in the VW_GHZDK table. In the "__$operation" column, "1" means delete, "2" means insert, "3" is the value before an update operation, and "4" is the value after an update operation. Because the changes come from the database transaction log, the "__$start_lsn" column stores the starting log sequence number (LSN) of the corresponding transaction.
8. Verify that CDC captures data changes
8.1 Query the system table cdc.dbo_tb_teacher_CT
SELECT * FROM cdc.dbo_tb_teacher_CT
From the query results, we can see that there are no records in the system table cdc.dbo_tb_teacher_CT. Because the change table has just been created, no inserts, deletes, or updates have yet been made to the original table dbo.tb_teacher.
8.2 Insert records into the tb_teacher table and query cdc.dbo_tb_teacher_CT again
After inserting data into the tb_teacher table, query the system table cdc.dbo_tb_teacher_CT, and you can see that there is one more record.
Combined with the "__$operation" values described in 7.4, you can tell which records were inserted into or updated in the tb_teacher table.
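To make the operation codes easier to read, the change table can be queried with the codes decoded. This is a minimal sketch, assuming the dbo_tb_teacher capture instance and using only the teaid column, which is the one source column named in this article. The capture job reads the log asynchronously, so a change may take a few seconds to appear.

```sql
USE DB_Student;

-- Decode __$operation and show the commit time of each captured change
SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS commit_time,
       CASE __$operation
            WHEN 1 THEN 'delete'
            WHEN 2 THEN 'insert'
            WHEN 3 THEN 'update (before)'
            WHEN 4 THEN 'update (after)'
       END AS operation,
       teaid
FROM cdc.dbo_tb_teacher_CT
ORDER BY __$start_lsn, __$seqval;
```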