Design notes on developing a web-based ETL feature (data extraction), part 2

        Continuing from the previous article, I found some time in January to write a bit more. First, the system is for now scoped so that it does not cover metadata management and control, nor row-level management and control of database data. It only handles table-level data across different types of databases. As noted last time, practically every database ships a data loader similar to a data pump, and such a loader is far more efficient than INSERT INTO statements. The loader is the preferred method, but plain INSERT INTO should also be offered as an option.

        1. Databases and data sources

        To support a large number of databases of multiple types, the IP address, account, and password of each database are stored in tables. Since the people operating a database do not necessarily have administrator rights, database-level information and the specific data sources should be kept apart. Two tables are set up: one stores the IP, port, type, and version of each database, and the other stores the data sources under the corresponding database record, such as account, password, and service name. One database record can then be associated with multiple data source records, which supports multiple accounts and multiple data sources under a single database.
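The two-table split described above could be modelled roughly as follows. This is only a minimal sketch: the class names `DatabaseInfo`, `DataSourceInfo`, and `DataSourceRegistry`, and all field names, are assumptions made for illustration and are not taken from the actual system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of the "database" table: where the server is and what it is.
class DatabaseInfo {
    final long id;
    final String ip;
    final int port;
    final String type;     // e.g. "mysql", "oracle", "dm"
    final String version;

    DatabaseInfo(long id, String ip, int port, String type, String version) {
        this.id = id;
        this.ip = ip;
        this.port = port;
        this.type = type;
        this.version = version;
    }
}

// Hypothetical row of the "data source" table: one login under a database.
class DataSourceInfo {
    final long id;
    final long databaseId; // refers to DatabaseInfo.id
    final String account;
    final String password;
    final String serviceName;

    DataSourceInfo(long id, long databaseId, String account,
                   String password, String serviceName) {
        this.id = id;
        this.databaseId = databaseId;
        this.account = account;
        this.password = password;
        this.serviceName = serviceName;
    }
}

// In-memory stand-in for the data source table, for illustration only.
class DataSourceRegistry {
    private final List<DataSourceInfo> sources = new ArrayList<>();

    void add(DataSourceInfo source) {
        sources.add(source);
    }

    // Returns every data source registered under the given database.
    List<DataSourceInfo> findByDatabase(long databaseId) {
        List<DataSourceInfo> result = new ArrayList<>();
        for (DataSourceInfo s : sources) {
            if (s.databaseId == databaseId) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Because the account and password live on the data source record rather than on the database record, adding a second account for the same server only means adding one more row that points at the same database id.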

         2. Supporting dynamic switching of the driver class for different versions when connecting via JDBC

         Anyone who has used JDBC will know that creating a connection involves a call of the form

Class.forName(driver);

          This call selects the driver. For the Dameng database, for example, you pass "dm.jdbc.driver.DmDriver" to Class.forName() to select the Dameng driver. MySQL connections raise a problem here: below version 6.0 the driver class is com.mysql.jdbc.Driver, while from 6.0 onward it is com.mysql.cj.jdbc.Driver (strictly speaking, this is the version of the MySQL Connector/J driver package rather than of the server). If you choose the wrong one, the connection fails or an error is reported, so the association between version and driver class has to be configurable. Solving this exposes a second problem: the URL. Anyone who has used both older and newer MySQL versions may remember that the older version connects with just IP + port + service name, whereas the newer version also expects parameters such as the time zone and connection mode, for example:

jdbc:mysql://[ip]:[port]/[server]?useUnicode=true&characterEncoding=utf8  -- lower version

jdbc:mysql://[ip]:[port]/[server]?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=true&serverTimezone=GMT%2B8   -- higher version

Therefore, to keep things as convenient as possible, the URL template is stored as configuration in the database, and the frequently changing parts (IP, PORT, and SERVER) are singled out as modifiable parameters. The final URL is then assembled dynamically using regular expressions or substring operations.
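As a rough illustration of the two pieces of configuration discussed in this section, the sketch below picks a MySQL driver class from a version number and fills a URL template. The class `JdbcConfig` and its method names are hypothetical, and the template is filled with plain string replacement rather than regular expressions, which is just one of the possible approaches. The version passed in is assumed to be the Connector/J major version.

```java
import java.util.Map;

// Hypothetical helper; names and structure are illustrative assumptions.
class JdbcConfig {

    // Maps a MySQL Connector/J major version to its driver class name.
    // 6 and above use the "cj" package, older versions the legacy class.
    static String mysqlDriverClass(int connectorMajorVersion) {
        return connectorMajorVersion >= 6
                ? "com.mysql.cj.jdbc.Driver"
                : "com.mysql.jdbc.Driver";
    }

    // Replaces each "[name]" placeholder in the template with its value.
    // Placeholders without a matching entry are left untouched.
    static String fillTemplate(String template, Map<String, String> params) {
        String url = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url = url.replace("[" + entry.getKey() + "]", entry.getValue());
        }
        return url;
    }
}
```

With this in place, the driver name returned by `mysqlDriverClass` would be handed to `Class.forName(driver)`, and the filled URL to `DriverManager.getConnection`. Keeping the template itself in a table means a new database version only needs a new row, not a code change.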

         3. Field conversion configuration

         I personally know of at least 10 common relational databases. Leaving the others aside, take the two most commonly used ones, MySQL and Oracle: MySQL uses INT for integer data, whereas Oracle represents numbers with NUMBER. A configuration item is therefore needed to express the mapping between the field types of the source table and those of the target table.
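Such a mapping can be thought of as a lookup table from source type to target type. The sketch below is a minimal in-memory version; the class `TypeMapping` is hypothetical, and the concrete target types in the usage example (such as `NUMBER(10)` for `INT`) are assumptions chosen for illustration, not rules taken from the actual configuration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical holder for one source-to-target conversion configuration,
// e.g. "MySQL to Oracle". In the real system the rules would come from a table.
class TypeMapping {
    private final Map<String, String> rules = new HashMap<>();

    // Registers one rule; source type names are matched case-insensitively.
    void addRule(String sourceType, String targetType) {
        rules.put(sourceType.toUpperCase(Locale.ROOT), targetType);
    }

    // Looks up the target type, failing loudly when no rule is configured.
    String convert(String sourceType) {
        String target = rules.get(sourceType.toUpperCase(Locale.ROOT));
        if (target == null) {
            throw new IllegalArgumentException("No mapping for type: " + sourceType);
        }
        return target;
    }
}
```

A MySQL-to-Oracle configuration might then be filled with calls such as `addRule("INT", "NUMBER(10)")` and `addRule("VARCHAR", "VARCHAR2")`. Failing on an unmapped type, rather than guessing, keeps a missing rule from silently producing a wrong target column.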

          4. Process design

          Based on the considerations above, performing an extraction requires the following steps:

  1. Configure the database information
  2. Configure the data source information for the corresponding database
  3. Configure the master record of the data conversion mapping, selecting the source and target database versions, etc.
  4. Configure the detail records of the conversion mapping, which specify the field correspondences
  5. Configure the extraction task: select the source table and the target table, and select the task type (1. execute SQL directly against the target table; 2. the target already has the table, so only transfer the data; 3. the target has no corresponding table, so analyze the source and target table structures, create the target table, then transfer the data)
  6. Generate the log file
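The three task types in step 5 differ mainly in which sub-steps they run before the log is written. A minimal sketch of that dispatch is shown below; the enum `TaskType`, the class `ExtractionPlanner`, and the step descriptions are all hypothetical names introduced for illustration, and the sketch only lists the steps rather than executing them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for the three task types described in step 5.
enum TaskType {
    DIRECT_SQL,                 // 1. execute SQL directly against the target table
    TRANSFER_TO_EXISTING_TABLE, // 2. target table exists, only transfer data
    CREATE_TABLE_AND_TRANSFER   // 3. target table missing, create it first
}

class ExtractionPlanner {

    // Returns the ordered sub-steps an extraction task of this type would run.
    static List<String> plan(TaskType type) {
        List<String> steps = new ArrayList<>();
        switch (type) {
            case DIRECT_SQL:
                steps.add("execute SQL against target table");
                break;
            case TRANSFER_TO_EXISTING_TABLE:
                steps.add("transfer data");
                break;
            case CREATE_TABLE_AND_TRANSFER:
                steps.add("analyze source and target table structures");
                steps.add("create target table");
                steps.add("transfer data");
                break;
        }
        // Every task type ends by producing a log file (step 6).
        steps.add("write log file");
        return steps;
    }
}
```

For example, `ExtractionPlanner.plan(TaskType.CREATE_TABLE_AND_TRANSFER)` yields four steps, with table creation placed before the data transfer; this is where the field conversion configuration from section 3 would be consulted.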


Origin blog.csdn.net/Zachariahs/article/details/107740574