Spring Boot integrates MyBatis + MySQL/Impala to implement dual data sources

A recent big data back-end project required a Spring Boot service to query Hadoop HDFS data directly through Impala, while also querying aggregated results stored in MySQL, so dual data sources had to be implemented.

After some research, the Impala query engine was connected from Java under the Spring Boot framework to query the data warehouse.

The whole process is divided into two parts. Part 1: Spring Boot integrates MyBatis + Impala. Part 2: implement the MySQL and Impala dual data sources.

Part 1: The detailed steps for integrating MyBatis into Spring Boot are not repeated here. All of the following operations assume that the MyBatis environment has already been configured.

1. Add the Impala JDBC driver jar. The jar is not available in the central Maven repository and has to be downloaded separately. Download link: https://pan.baidu.com/s/1wlLsrDvZliuwL_qGtND7nw?pwd=sivw

Extraction code: sivw

2. After downloading, it is recommended to deploy the jar to your private Maven repository. Deploy command:

mvn deploy:deploy-file -DgroupId=com.cloudera -DartifactId=impala-jdbc41 -Dversion=2.6.3-SNAPSHOT -Dpackaging=jar -Dfile=F:\Repository\impala\impala-jdbc41\2.6.3\ImpalaJDBC41.jar -Durl=http://xx.xx.xx.xx:8081/repository/maven-snapshots/ -DrepositoryId=snapshots

3. Once the jar has been deployed, it can be referenced in the project as follows:

<dependency>
    <groupId>com.cloudera</groupId>
    <artifactId>impala-jdbc41</artifactId>
    <version>2.6.3-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
    <exclusions>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <artifactId>geronimo-jaspic_1.0_spec</artifactId>
            <groupId>org.apache.geronimo.specs</groupId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jetty.aggregate</groupId>
            <artifactId>jetty-all</artifactId>
        </exclusion>
    </exclusions>
</dependency>

In the snippet above, the transitive dependencies pulled in by hive-jdbc conflict with several other jars and prevent the service from starting, so the exclusions are added under the hive-jdbc dependency.
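
Before wiring the driver into MyBatis, it can be worth checking that the driver and the exclusions above work on their own. Below is a minimal sketch using plain JDBC; it assumes a reachable Impala daemon at the placeholder address xx.xx.xx.xx:21050, a database named test, and no authentication on the daemon.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaConnectionCheck {

    public static void main(String[] args) throws Exception {
        // Placeholder address and database; replace with your own cluster values
        String url = "jdbc:impala://xx.xx.xx.xx:21050/test";

        // Load the Cloudera Impala JDBC41 driver explicitly to confirm it is on the classpath
        Class.forName("com.cloudera.impala.jdbc41.Driver");

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select 1")) {
            if (rs.next()) {
                System.out.println("Impala connection OK: " + rs.getInt(1));
            }
        }
    }
}

If this prints "Impala connection OK: 1", the driver jar and the exclusions are fine and any remaining problems are in the Spring/MyBatis configuration.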

4. Configure the data source

spring:
  application:
    name: xxxx
  main:
    # needed when multiple FeignClients share the same service name (value)
    allow-bean-definition-overriding: true
  cloud:
    nacos:
      config:
        server-addr: xxxxxxxxxx:8848
        namespace: c00000-c02d-48bb-9e18-a1300000e63
        group: DEFAULT_GROUP
        file-extension: yml
        shared-configs[0]:
          data-id: xx-xx-common.yml
          group: DEFAULT_GROUP
          refresh: true
  datasource:
    dynamic:
      primary: mysqlDataSource # default data source or data source group; the default value is master
      strict: false # strict mode, off by default; when enabled, failing to match the specified data source raises an error instead of falling back to the default
      datasource:
        mysqlDataSource:
          username: xxxxx
          password: xxxxx
          url: jdbc:mysql://xx.xx.xx.xxx:3306/test?allowMultiQueries=true
          driver-class-name: com.mysql.jdbc.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        impalaDataSource:
          url: jdbc:impala://xx.xx.xx.xx:21050/test
          driver-class-name: com.cloudera.impala.jdbc41.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        ########## Connection pool configuration ##########
        druid:
          # initial, minimum and maximum pool sizes
          initial-size: 5
          minIdle: 5
          max-active: 16
          # maximum wait time when acquiring a connection (milliseconds)
          max-wait: 60000
          # interval between checks for idle connections that should be closed (milliseconds)
          time-between-eviction-runs-millis: 2000
          # minimum/maximum idle time before a connection in the pool may be evicted (milliseconds)
          min-evictable-idle-time-millis: 600000
          max-evictable-idle-time-millis: 900000
          # SQL used to validate a connection; the default differs per database, this one is for MySQL
          validationQuery: select 1
          # when the application requests a connection and testOnBorrow is false, check whether the connection has been idle and, if so, validate it before handing it out
          testWhileIdle: true
          # if true (default false), validate the connection every time the application requests one from the pool
          testOnBorrow: false
          # if true (default false), validate the connection when the application returns it to the pool
          testOnReturn: false
          # whether to cache PreparedStatements (PSCache); PSCache greatly improves performance on databases with server-side cursors such as Oracle
          poolPreparedStatements: true
          # to enable PSCache this must be greater than 0; when it is, poolPreparedStatements is automatically switched to true. Under Oracle, Druid's PSCache does not suffer from excessive memory usage, so this value can be set higher, e.g. 100
          maxOpenPreparedStatements: 20
          maxPoolPreparedStatementPerConnectionSize: 20
          useGlobalDataSourceStat: true
          connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=500
          # for connections within the minIdle count whose idle time exceeds minEvictableIdleTimeMillis, run the keepAlive check
          keepAlive: true

5. Once the four steps above are complete, you can run a test using the generated code from my other articles.
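
If the generated code is not at hand, a hand-written mapper is enough for a smoke test. Below is a minimal sketch; the mapper name DemoMapper and the table demo_table are placeholders, not part of the original project.

import java.util.List;
import java.util.Map;

import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Select;

// Annotation-based mapper; at this stage it runs against the configured (default) data source
@Mapper
public interface DemoMapper {

    // demo_table is a placeholder; replace it with a real table in your schema
    @Select("select * from demo_table limit 10")
    List<Map<String, Object>> selectTop10();
}

Injecting this mapper into any @Service or test class and calling selectTop10() should return rows from whatever database the configured data source points at.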

Part 2: from a single data source to the Impala and MySQL dual data sources

1. Add the dynamic data source dependency

<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
    <version>3.4.0</version>
</dependency>

2. Configure multiple data sources in yml

spring:
  application:
    name: xx-xx
  main:
    # needed when multiple FeignClients share the same service name (value)
    allow-bean-definition-overriding: true
  cloud:
    nacos:
      config:
        server-addr: nacos-headless.ms-nacos.svc.cluster.local:8848
        namespace: c693d7a3-c02d-48bb-9e18-a1331f81fe63
        group: DEFAULT_GROUP
        file-extension: yml
        shared-configs[0]:
          data-id: xx-xx-xx.yml # imported shared configuration
          group: DEFAULT_GROUP
          refresh: true
  datasource:
    dynamic:
      primary: mysqlDataSource # default data source or data source group; the default value is master
      strict: false # strict mode, off by default; when enabled, failing to match the specified data source raises an error instead of falling back to the default
      datasource:
        mysqlDataSource:
          username: xxxx
          password: xxxxxx
          url: jdbc:mysql://xx.xx.xx.xx:3306/xx?allowMultiQueries=true
          driver-class-name: com.mysql.jdbc.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        impalaDataSource:
          url: jdbc:impala://xx.xx.xx.xx:21050/test
          driver-class-name: com.cloudera.impala.jdbc41.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        ########## Connection pool configuration ##########
        druid:
          # initial, minimum and maximum pool sizes
          initial-size: 5
          # maximum wait time when acquiring a connection (milliseconds)
          max-wait: 60000
          minIdle: 5
          max-active: 16
          # interval between checks for idle connections that should be closed (milliseconds)
          time-between-eviction-runs-millis: 2000
          # minimum/maximum idle time before a connection in the pool may be evicted (milliseconds)
          min-evictable-idle-time-millis: 600000
          max-evictable-idle-time-millis: 900000
          # SQL used to validate a connection; the default differs per database, this one is for MySQL
          validationQuery: select 1
          # when the application requests a connection and testOnBorrow is false, check whether the connection has been idle and, if so, validate it before handing it out
          testWhileIdle: true
          # if true (default false), validate the connection every time the application requests one from the pool
          testOnBorrow: false
          # if true (default false), validate the connection when the application returns it to the pool
          testOnReturn: false
          # whether to cache PreparedStatements (PSCache); PSCache greatly improves performance on databases with server-side cursors such as Oracle
          poolPreparedStatements: true
          # to enable PSCache this must be greater than 0; when it is, poolPreparedStatements is automatically switched to true. Under Oracle, Druid's PSCache does not suffer from excessive memory usage, so this value can be set higher, e.g. 100
          maxOpenPreparedStatements: 20
          maxPoolPreparedStatementPerConnectionSize: 20
          useGlobalDataSourceStat: true
          connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=500
          # for connections within the minIdle count whose idle time exceeds minEvictableIdleTimeMillis, run the keepAlive check
          keepAlive: true

 

3. The service layer calls the persistence layer

Depending on which data source needs to be accessed, annotate the mapper interface with

@DS("impalaDataSource") to select the data source. Mappers without the annotation use the default (primary) data source.

Origin: blog.csdn.net/weixin_48363639/article/details/124350891