Connecting Scala to MySQL and SQL Server databases to incrementally extract data and store it in Hive

Connecting Scala to MySQL and SQL Server databases


Single tables in the source MySQL and SQL Server databases exceed 200 GB. To free up storage space on the source databases, this data needs to be moved to HDFS. We develop a Spark program in Scala that extracts data incrementally by the indexed ID column and inserts it into the Hive database. Each run is planned to extract 3 million rows and store the largest extracted ID in a record table. On the next run, the program first reads the largest ID from the record table and uses it as the starting ID. It then compares the starting ID plus 3 million with the largest ID in the source table: if the sum is smaller than the source table's largest ID, it is used as the end ID of the extraction; if it is greater, the largest ID in the source table is used as the end ID instead.
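The end-ID rule described above reduces to taking the smaller of "starting ID + 3 million" and the source table's largest ID. The following is a minimal sketch of that rule, not the article's own code; the object and method names (`ExtractRange`, `endId`) are hypothetical, and it assumes IDs are `Long` values.

```scala
object ExtractRange {
  // Batch size taken from the article: 3 million IDs per increment.
  val BatchSize: Long = 3000000L

  /** End ID for the next run: startId + batchSize if that is below the
    * source table's largest ID, otherwise the source table's largest ID.
    * (Equivalent to math.min(startId + batchSize, sourceMaxId).) */
  def endId(startId: Long, sourceMaxId: Long, batchSize: Long = BatchSize): Long =
    if (startId + batchSize < sourceMaxId) startId + batchSize
    else sourceMaxId

  def main(args: Array[String]): Unit = {
    // Far from the end of the table: advance by a full batch.
    println(endId(0L, 10000000L))       // 3000000
    // Close to the end of the table: clamp to the source table's largest ID.
    println(endId(9000000L, 10000000L)) // 10000000
  }
}
```

Note that an ID range of 3 million covers at most 3 million rows; if the ID column has gaps, a batch will contain fewer rows.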

As shown in the figure below, each increment extracts 3 million rows into the Hive table, starting from the largest ID stored in the record table, and the new largest ID is then stored back in the record table.

[Figure: incremental extraction of 3 million rows per run, driven by the largest ID in the record table]
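To see how successive runs advance through a table, the loop below simulates the record table with a local variable. It is an illustrative sketch under two assumptions not stated in the article: each run extracts the half-open range (start, end], and it writes the computed end ID back as the new recorded largest ID. The names `IncrementalPlan` and `batches` are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

object IncrementalPlan {
  /** Simulates successive runs. Each run reads the recorded largest ID,
    * extracts the range (start, end], and writes `end` back as the new
    * recorded largest ID. Returns the ranges in the order they would run. */
  def batches(recordedMaxId: Long, sourceMaxId: Long, batchSize: Long = 3000000L): List[(Long, Long)] = {
    require(batchSize > 0, "batchSize must be positive")
    val ranges = ListBuffer.empty[(Long, Long)]
    var recorded = recordedMaxId // stands in for the record table
    while (recorded < sourceMaxId) {
      val end = math.min(recorded + batchSize, sourceMaxId)
      ranges += ((recorded, end))
      recorded = end // store the new largest ID in the (simulated) record table
    }
    ranges.toList
  }

  def main(args: Array[String]): Unit =
    batches(0L, 7000000L).foreach { case (s, e) => println(s"id > $s AND id <= $e") }
}
```

For a source table whose largest ID is 7,000,000, this plans three runs: two full batches of 3 million IDs and a final batch of 1 million. Once the recorded ID equals the source table's largest ID, no further ranges are produced.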

The following details the Scala code that connects to the MySQL and SQL Server databases, extracts data incrementally, and stores it in the Hive database. When connecting to MySQL, data is extracted with a single thread; when connecting to Sql
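As a rough illustration of a single-threaded MySQL read for one ID range, the sketch below builds the option map for Spark's JDBC data source. It is not the article's code: the host, database, table, column, and credential values are hypothetical, the driver class assumes MySQL Connector/J 8.x, and table and column names are interpolated directly, so they must come from trusted configuration. Because no `partitionColumn`, `lowerBound`, `upperBound`, or `numPartitions` option is set, Spark reads the whole range through one JDBC connection in a single task.

```scala
object JdbcBatchOptions {
  /** Spark JDBC reader options for one ID range (startId, endId] of a MySQL table.
    * Without partitioning options, Spark reads the query in a single task. */
  def mysql(host: String, database: String, table: String, idColumn: String,
            startId: Long, endId: Long, user: String, password: String): Map[String, String] =
    Map(
      "url"      -> s"jdbc:mysql://$host:3306/$database",
      "driver"   -> "com.mysql.cj.jdbc.Driver", // MySQL Connector/J 8.x driver class
      // Spark accepts a parenthesized subquery with an alias as "dbtable".
      "dbtable"  -> s"(SELECT * FROM $table WHERE $idColumn > $startId AND $idColumn <= $endId) AS t",
      "user"     -> user,
      "password" -> password
    )

  def main(args: Array[String]): Unit = {
    // Hypothetical connection values for illustration only.
    val opts = mysql("mysql-host", "source_db", "orders", "id", 0L, 3000000L, "reader", "secret")
    println(opts("dbtable")) // (SELECT * FROM orders WHERE id > 0 AND id <= 3000000) AS t
  }
}
```

With Spark on the classpath, such a map could be passed to `spark.read.format("jdbc").options(opts).load()`, and the resulting DataFrame appended to an existing Hive table with `df.write.mode("append").insertInto("hive_db.orders")` on a `SparkSession` built with `enableHiveSupport()`; the Hive table name here is likewise hypothetical.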


Source: blog.csdn.net/zhengzaifeidelushang/article/details/111314731