Performance optimization for slow MySQL batch inserts (Kettle input and output steps)

Background
I recently worked on a data warehouse refactoring project and encountered some performance bottlenecks. Here are some solutions.

As business data grows day by day, ETL tasks developed a few years ago have started to fail. Large tables are normally loaded incrementally, but fixing a bug or running a monthly/quarterly job requires a full reload. The original ETL task could run for several hours and sometimes failed with a timeout, so optimization was needed. Several methods are introduced below. (The project is built with Kettle; if you use other development tools, the same ideas still apply.)

1. Configure database connection parameters
2. Remove the primary key from temporary-table DDL
3. Adjust the number of output step copies
4. Temporarily disable indexes

Optimization

1. Configure database connection parameters
defaultFetchSize: 5000
useCursorFetch: true (together with defaultFetchSize, tells the driver to fetch the result set from the database in batches of 5,000 rows instead of loading it all at once)
rewriteBatchedStatements: true (the driver rewrites batches of single-row INSERTs into multi-row INSERTs)
useServerPrepStmts: true (enables server-side prepared statements, i.e. precompilation)
useCompression: true (compresses data transmitted between client and server)
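
To illustrate rewriteBatchedStatements: conceptually, the driver turns a batch of single-row inserts into one multi-row statement, roughly as below (table t and its columns are hypothetical):

-- what the application batches:
INSERT INTO t (id, val) VALUES (1, 'a');
INSERT INTO t (id, val) VALUES (2, 'b');
-- what the driver actually sends:
INSERT INTO t (id, val) VALUES (1, 'a'), (2, 'b');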

Taking Kettle as an example, the parameters are configured as follows:
[screenshot: the parameter names and values entered on the Options panel of Kettle's database connection dialog]
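
Outside Kettle, the same options can be appended directly to the JDBC URL (a sketch; db-host and the dwh schema are placeholders):

jdbc:mysql://db-host:3306/dwh?defaultFetchSize=5000&useCursorFetch=true&rewriteBatchedStatements=true&useServerPrepStmts=true&useCompression=true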
Test results:
Before configuring the parameters: [screenshot: output step metrics before the change]
After configuring the parameters: [screenshot: output step metrics after the change]
Performance improved 80-fold!

2. Remove the primary key from temporary-table DDL
ETL processes use intermediate tables to hold temporary data. These intermediate tables can drop the primary key from their DDL, enforce uniqueness in the transformation logic instead, and keep a primary key only on the result table. A primary key checks the key columns for duplicates on every insert, which slows loading down, as sketched below. (In the case below, many fields are read and written and the table-input SQL is complex, so inserts were very slow.)
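
A minimal sketch, assuming a hypothetical staging table tmp_order_detail keyed on id (if id were AUTO_INCREMENT, that attribute would have to be removed first, since an auto-increment column must be part of a key):

ALTER TABLE tmp_order_detail DROP PRIMARY KEY;
-- run the full load with no per-row uniqueness checks
ALTER TABLE tmp_order_detail ADD PRIMARY KEY (id);  -- optional: restore the key afterwards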

The test results are as follows:
With the primary key in place: [screenshot: output step metrics with the primary key]
After removing the primary key of the target table: [screenshot: output step metrics without the primary key]
Performance improved 20-fold!

3. Adjust the number of output step copies
If the output step is still slow, you can run several copies of it in parallel. In Kettle: right-click the output step, choose "Change number of copies to start...", and set the number of copies.
[screenshot: setting the number of copies on the output step]
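
Each copy opens its own connection to MySQL, so it is worth checking the server's connection limit before scaling the copies up:

SHOW VARIABLES LIKE 'max_connections';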

4. Temporarily disable indexes
Maintaining index data adds considerable overhead, so before a full-volume insert you can disable the indexes first and re-enable them once the load finishes:
ALTER TABLE table_name DISABLE KEYS;
ALTER TABLE table_name ENABLE KEYS;
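
Note that DISABLE KEYS suspends maintenance of non-unique indexes only, and only on MyISAM tables; InnoDB ignores it with a warning. For InnoDB, a comparable approach (a sketch; idx_created_at and its column are hypothetical) is to drop secondary indexes before the load and rebuild them afterwards:

ALTER TABLE table_name DROP INDEX idx_created_at;
-- run the full load
ALTER TABLE table_name ADD INDEX idx_created_at (created_at);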
The effect comparison is as follows:
With indexes enabled: [screenshot: output step metrics before disabling keys]
With indexes disabled: [screenshot: output step metrics after disabling keys]
Performance improved 4-fold!

Source: blog.csdn.net/samur2/article/details/128471609