Data extraction for big data processing

Learning objective:

Data extraction methods and how to implement them

Learning content:

1. Data extraction methods: full extraction and incremental extraction

2. Data loading methods: the full-table delete-and-insert method and the trigger method

Study time:

6 hours, assuming a basic knowledge of SQL


Learning outputs:

1. A set of technical notes
2. A set of practice questions, with source code for the answers

(1) Full extraction

Full extraction is similar to data migration or data replication: it extracts a table's data from the source database in its entirety and converts it into a format that the ETL tool can recognize. Full extraction is relatively simple to implement.
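To make full extraction concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for the source database and CSV as a stand-in for "a format the ETL tool can recognize". The function name full_extract and the sample table are hypothetical. The point is only that no filter is applied, so every row is copied on every run.

```python
import csv
import io
import sqlite3


def full_extract(conn: sqlite3.Connection, table: str) -> str:
    """Read every row of `table` and return it as CSV text (header + rows).

    Hypothetical helper: CSV stands in for the ETL tool's input format.
    `table` is interpolated into the SQL, so it must come from trusted config.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cur.description]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cur)  # no WHERE clause: the whole table is copied
    return buf.getvalue()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE pk_fbk_open (obk_no TEXT, op_date INTEGER)")
    src.executemany(
        "INSERT INTO pk_fbk_open VALUES (?, ?)",
        [("130001", 201906), ("130002", 201908)],
    )
    print(full_extract(src, "pk_fbk_open"), end="")
```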

(2) Incremental extraction

Incremental extraction extracts only the data that has been added or modified in the source tables since the last extraction. In ETL practice, incremental extraction is used more widely than full extraction. The key to incremental extraction is how to capture the changed data. A capture method generally has to meet two requirements. The first is accuracy: it must reliably capture the changes in the business system at a given frequency. The second is performance: it must not put so much load on the business system that existing business operations are affected.
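The text does not prescribe a particular capture technique at this point. One common approach, assumed here purely for illustration, is a timestamp watermark. The source table carries a last_modified column (a hypothetical name) that the business system updates on every write, and the ETL job remembers the largest value it has already extracted. A minimal Python/sqlite3 sketch, with incremental_extract as a hypothetical helper:

```python
import sqlite3


def incremental_extract(conn, table, watermark):
    """Return (rows changed since `watermark`, new watermark).

    Assumes a hypothetical `last_modified` column holding fixed-format
    ISO-8601 text, updated by the business system on every insert/update.
    Such strings sort chronologically, so a plain > comparison works.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE last_modified > ? ORDER BY last_modified",
        (watermark,),
    )
    cols = [c[0] for c in cur.description]
    rows = cur.fetchall()
    if not rows:
        return rows, watermark  # nothing changed; keep the old watermark
    # Rows are ordered by last_modified, so the last one holds the newest value.
    return rows, rows[-1][cols.index("last_modified")]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pk_fbk_open (obk_no TEXT, last_modified TEXT)")
    db.executemany(
        "INSERT INTO pk_fbk_open VALUES (?, ?)",
        [("130001", "2019-06-01 10:00:00"), ("130002", "2019-08-01 09:30:00")],
    )
    rows, wm = incremental_extract(db, "pk_fbk_open", "2019-07-01 00:00:00")
    print(rows, wm)  # only row 130002 is newer than the old watermark
```

This approach cannot see rows that were physically deleted, and it depends on the business system maintaining last_modified correctly. The trigger method described below avoids the first limitation.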

Data loading methods

1. The full-table delete-and-insert method deletes all data in the target table before each extraction and then loads the freshly extracted data. In effect, it turns incremental extraction into full extraction. It is suitable when the data volume is small and a full extraction costs less time than the algorithms and filter conditions that incremental extraction would require.

2. Trigger method
The trigger method is a commonly used incremental extraction mechanism. Based on the extraction requirements, three triggers are created on the source table: one each for insert, update, and delete. Whenever data in the source table changes, the corresponding trigger writes the changed data into an incremental log table. ETL incremental extraction then reads from this incremental log table rather than directly from the source table. Rows in the log table that have already been extracted must be promptly marked as processed or deleted.
3. For a partitioned table, extract only the contents of a single partition.
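The trigger method can be sketched with Python's sqlite3, because SQLite also supports row-level AFTER INSERT, UPDATE and DELETE triggers. This is a minimal illustration, not Oracle code. The log table name pk_fbk_open_log, its columns, and the helper extract_from_log are hypothetical, and only two columns of PK_FBK_OPEN are kept for brevity. The extractor reads the log in order and then deletes exactly the rows it read, which corresponds to the "delete promptly" option mentioned above.

```python
import sqlite3

SETUP = """
CREATE TABLE pk_fbk_open (obk_no TEXT PRIMARY KEY, op_date INTEGER);

-- Hypothetical incremental log table.
CREATE TABLE pk_fbk_open_log (
    log_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    op      TEXT NOT NULL,          -- 'I', 'U' or 'D'
    obk_no  TEXT,
    op_date INTEGER
);

-- One trigger per operation: insert, update, delete.
CREATE TRIGGER trg_open_ins AFTER INSERT ON pk_fbk_open
BEGIN
    INSERT INTO pk_fbk_open_log (op, obk_no, op_date)
    VALUES ('I', NEW.obk_no, NEW.op_date);
END;

CREATE TRIGGER trg_open_upd AFTER UPDATE ON pk_fbk_open
BEGIN
    INSERT INTO pk_fbk_open_log (op, obk_no, op_date)
    VALUES ('U', NEW.obk_no, NEW.op_date);
END;

CREATE TRIGGER trg_open_del AFTER DELETE ON pk_fbk_open
BEGIN
    INSERT INTO pk_fbk_open_log (op, obk_no, op_date)
    VALUES ('D', OLD.obk_no, OLD.op_date);
END;
"""


def extract_from_log(conn):
    """Return pending change records (op, obk_no, op_date) and purge them."""
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT log_id, op, obk_no, op_date FROM pk_fbk_open_log ORDER BY log_id"
        ).fetchall()
        if rows:
            # Delete only the rows that were read (log_id <= last one seen).
            conn.execute(
                "DELETE FROM pk_fbk_open_log WHERE log_id <= ?", (rows[-1][0],)
            )
    return [row[1:] for row in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SETUP)
    with db:
        db.execute("INSERT INTO pk_fbk_open VALUES ('130001', 201906)")
        db.execute("UPDATE pk_fbk_open SET op_date = 201907 WHERE obk_no = '130001'")
        db.execute("DELETE FROM pk_fbk_open WHERE obk_no = '130001'")
    print(extract_from_log(db))
    # [('I', '130001', 201906), ('U', '130001', 201907), ('D', '130001', 201907)]
    print(extract_from_log(db))  # [] because the log was purged
```

Unlike the watermark approach, the delete trigger records removed rows, so the target can replay deletions as well.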

The following example extracts data from a data download platform or another database into the local database.

Task: import the data in table PK_FBK_OPEN owned by user testkz into table PK_FBK_OPEN owned by user sys in the target database.

1. Create the table in both the testkz and sys schemas (run the statement below as each user):

create table PK_FBK_OPEN
(
  obk_no     VARCHAR2(14),
  op_date    NUMBER(6),
  op_inst    VARCHAR2(8),
  paper_no   VARCHAR2(18),
  paper_type VARCHAR2(2)
);
Insert test data into testkz.PK_FBK_OPEN:

insert into PK_FBK_OPEN
select '130001', 201906, '121205', '13010519820605', '01' from dual;

insert into PK_FBK_OPEN
select '130002', 201908, '121206', '130105198207605', '02' from dual;

commit;

2. Create a database link from the target database to user testkz
CREATE DATABASE LINK db_testkz
CONNECT TO testkz IDENTIFIED BY haien
USING 'orcl';
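Once the link exists, the full-table delete-and-insert method can load sys.PK_FBK_OPEN directly from the source. In Oracle this is roughly DELETE FROM PK_FBK_OPEN; INSERT INTO PK_FBK_OPEN SELECT * FROM PK_FBK_OPEN@db_testkz; COMMIT;. The sketch below mirrors that pattern in Python's sqlite3 under stated assumptions. ATTACH DATABASE stands in for the database link, the main schema plays the role of user sys, and the helper delete_insert_load is hypothetical.

```python
import sqlite3


def delete_insert_load(conn, table, source_schema):
    """Full-table delete-and-insert: empty the target, then reload it whole.

    Both statements run in one transaction, so a failure rolls the target
    back to its previous contents. Returns the target's final row count.
    """
    with conn:
        conn.execute(f"DELETE FROM main.{table}")
        conn.execute(f"INSERT INTO main.{table} SELECT * FROM {source_schema}.{table}")
    return conn.execute(f"SELECT COUNT(*) FROM main.{table}").fetchone()[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")                    # plays the role of sys
    db.execute("ATTACH DATABASE ':memory:' AS testkz")  # stands in for db_testkz
    ddl = ("CREATE TABLE {}.pk_fbk_open (obk_no TEXT, op_date INTEGER, "
           "op_inst TEXT, paper_no TEXT, paper_type TEXT)")
    db.execute(ddl.format("main"))
    db.execute(ddl.format("testkz"))
    with db:
        db.executemany(
            "INSERT INTO testkz.pk_fbk_open VALUES (?, ?, ?, ?, ?)",
            [("130001", 201906, "121205", "13010519820605", "01"),
             ("130002", 201908, "121206", "130105198207605", "02")],
        )
    print(delete_insert_load(db, "pk_fbk_open", "testkz"))  # 2
    print(delete_insert_load(db, "pk_fbk_open", "testkz"))  # 2 again, no duplicates
```

Running the load twice still leaves exactly two rows in the target. That is what distinguishes this method from a plain append.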

3. Create a configuration table

create table LOAD_


Origin blog.csdn.net/qq_22201881/article/details/125454951