A recent big data server project required a Spring Boot service to query Hadoop HDFS data directly through Impala. The same service also had to query aggregated data stored in MySQL, so two data sources were needed.
After some research, I connected to Impala from Java under the Spring Boot framework to query the data warehouse.
The process has two parts. Part 1: integrate MyBatis and Impala in Spring Boot. Part 2: set up MySQL and Impala as dual data sources.
Part 1: The details of integrating MyBatis into Spring Boot are not repeated here. All of the following steps assume that the MyBatis environment is already configured.
1. Add the Impala driver jar. The jar is not available in the Maven repository and has to be downloaded manually. Download link: https://pan.baidu.com/s/1wlLsrDvZliuwL_qGtND7nw?pwd=sivw
Extraction code: sivw
2. After downloading, it is recommended to deploy the jar to your private Maven repository. The deploy command is:
mvn deploy:deploy-file -DgroupId=com.cloudera -DartifactId=impala-jdbc41 -Dversion=2.6.3-SNAPSHOT -Dpackaging=jar -Dfile=F:\Repository\impala\impala-jdbc41\2.6.3\ImpalaJDBC41.jar -Durl=http://xx.xx.xx.xx:8081/repository/maven-snapshots/ -DrepositoryId=snapshots
3. Once the jar is deployed, the project can reference it as follows:
<dependency>
    <groupId>com.cloudera</groupId>
    <artifactId>impala-jdbc41</artifactId>
    <version>2.6.3-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
    <exclusions>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <artifactId>geronimo-jaspic_1.0_spec</artifactId>
            <groupId>org.apache.geronimo.specs</groupId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jetty.aggregate</groupId>
            <artifactId>jetty-all</artifactId>
        </exclusion>
    </exclusions>
</dependency>
The hive-jdbc dependency pulls in several jars that conflict with the rest of the project, and these conflicts prevent the service from starting. That is why the exclusions are added under hive-jdbc.
4. Configure the data source
spring:
  application:
    name: xxxx
  main:
    # Allow multiple FeignClients that share the same value
    allow-bean-definition-overriding: true
  cloud:
    nacos:
      config:
        server-addr: xxxxxxxxxx:8848
        namespace: c00000-c02d-48bb-9e18-a1300000e63
        group: DEFAULT_GROUP
        file-extension: yml
        shared-configs[0]:
          data-id: xx-xx-common.yml
          group: DEFAULT_GROUP
          refresh: true
  datasource:
    dynamic:
      # Default data source or data source group; the default value is master
      primary: mysqlDataSource
      # Strict mode, off by default (false). When on, an exception is thrown if the
      # requested data source is not found; when off, the default data source is used
      strict: false
      datasource:
        mysqlDataSource:
          username: xxxxx
          password: xxxxx
          url: jdbc:mysql://xx.xx.xx.xxx:3306/test?allowMultiQueries=true
          driver-class-name: com.mysql.jdbc.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        impalaDataSource:
          url: jdbc:impala://xx.xx.xx.xx:21050/test
          driver-class-name: com.cloudera.impala.jdbc41.Driver
          type: com.alibaba.druid.pool.DruidDataSource
      ########## Connection pool configuration ##########
      druid:
        # Initial size, minimum idle, and maximum active connections
        initial-size: 5
        minIdle: 5
        max-active: 16
        # Maximum time to wait when acquiring a connection, in milliseconds
        max-wait: 60000
        # Interval between checks for idle connections that should be closed, in milliseconds
        time-between-eviction-runs-millis: 2000
        # Minimum time a connection stays in the pool before it can be evicted, in milliseconds
        min-evictable-idle-time-millis: 600000
        max-evictable-idle-time-millis: 900000
        # SQL used to test whether a connection is usable; the default differs per database, this one is for MySQL
        validationQuery: select 1
        # When the application requests a connection and testOnBorrow is false, the pool
        # checks whether the connection has been idle and, if so, validates it
        testWhileIdle: true
        # If true (default false), the pool validates the connection every time the application requests one
        testOnBorrow: false
        # If true (default false), the pool validates the connection when the application returns it
        testOnReturn: false
        # Whether to cache PreparedStatements (PSCache). PSCache greatly improves performance
        # on databases that support cursors, such as Oracle
        poolPreparedStatements: true
        # Must be greater than 0 to enable PSCache; when it is greater than 0, poolPreparedStatements
        # is automatically set to true. Druid does not have the problem of PSCache using too much
        # memory under Oracle, so this value can be set higher, for example 100
        maxOpenPreparedStatements: 20
        maxPoolPreparedStatementPerConnectionSize: 20
        useGlobalDataSourceStat: true
        connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=500
        # For connections within the minIdle count, run the keepAlive operation
        # when the idle time exceeds minEvictableIdleTimeMillis
        keepAlive: true
5. With the four steps above complete, you can run a test using the automatically generated code described in my other articles.
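If the generated MyBatis code is not at hand, a plain JDBC check can confirm that the driver and the Impala address work before the rest of the stack is involved. The following is only a minimal sketch: the class name ImpalaSmokeTest, the helper names and the command-line arguments are made up for illustration, while the driver class and URL shape are the ones from the configuration above. The query part only works with the Impala JDBC jar on the classpath and a reachable server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ImpalaSmokeTest {

    // Builds a URL in the same shape as the impalaDataSource url in the yml.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:impala://" + host + ":" + port + "/" + database;
    }

    // Runs "select 1" and returns the value read back.
    // Needs the Impala JDBC jar on the classpath and a reachable Impala daemon.
    static int ping(String url) throws SQLException, ClassNotFoundException {
        Class.forName("com.cloudera.impala.jdbc41.Driver");
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select 1")) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            // Without arguments nothing is contacted; only the usage is printed.
            System.out.println("usage: ImpalaSmokeTest <host> <port> <database>");
            return;
        }
        String url = buildUrl(args[0], Integer.parseInt(args[1]), args[2]);
        System.out.println("Connecting to " + url);
        System.out.println("select 1 returned " + ping(url));
    }
}
```

Run it with the host, port (21050 in the configuration above) and database as arguments. If this check fails, the problem is in the driver or the network rather than in MyBatis or the connection pool.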
Part 2: From a single data source to Impala and MySQL dual data sources
1. Add the dynamic data source dependency
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
    <version>3.4.0</version>
</dependency>
2. Configure multiple data sources in yml
spring:
  application:
    name: xx-xx
  main:
    # Allow multiple FeignClients that share the same value
    allow-bean-definition-overriding: true
  cloud:
    nacos:
      config:
        server-addr: nacos-headless.ms-nacos.svc.cluster.local:8848
        namespace: c693d7a3-c02d-48bb-9e18-a1331f81fe63
        group: DEFAULT_GROUP
        file-extension: yml
        shared-configs[0]:
          # Imported shared configuration
          data-id: xx-xx-xx.yml
          group: DEFAULT_GROUP
          refresh: true
  datasource:
    dynamic:
      # Default data source or data source group; the default value is master
      primary: mysqlDataSource
      # Strict mode, off by default (false). When on, an exception is thrown if the
      # requested data source is not found; when off, the default data source is used
      strict: false
      datasource:
        mysqlDataSource:
          username: xxxx
          password: xxxxxx
          url: jdbc:mysql://xx.xx.xx.xx:3306/xx?allowMultiQueries=true
          driver-class-name: com.mysql.jdbc.Driver
          type: com.alibaba.druid.pool.DruidDataSource
        impalaDataSource:
          url: jdbc:impala://xx.xx.xx.xx:21050/test
          driver-class-name: com.cloudera.impala.jdbc41.Driver
          type: com.alibaba.druid.pool.DruidDataSource
      ########## Connection pool configuration ##########
      druid:
        # Initial size, minimum idle, and maximum active connections
        initial-size: 5
        # Maximum time to wait when acquiring a connection, in milliseconds
        max-wait: 60000
        minIdle: 5
        max-active: 16
        # Interval between checks for idle connections that should be closed, in milliseconds
        time-between-eviction-runs-millis: 2000
        # Minimum time a connection stays in the pool before it can be evicted, in milliseconds
        min-evictable-idle-time-millis: 600000
        max-evictable-idle-time-millis: 900000
        # SQL used to test whether a connection is usable; the default differs per database, this one is for MySQL
        validationQuery: select 1
        # When the application requests a connection and testOnBorrow is false, the pool
        # checks whether the connection has been idle and, if so, validates it
        testWhileIdle: true
        # If true (default false), the pool validates the connection every time the application requests one
        testOnBorrow: false
        # If true (default false), the pool validates the connection when the application returns it
        testOnReturn: false
        # Whether to cache PreparedStatements (PSCache). PSCache greatly improves performance
        # on databases that support cursors, such as Oracle
        poolPreparedStatements: true
        # Must be greater than 0 to enable PSCache; when it is greater than 0, poolPreparedStatements
        # is automatically set to true. Druid does not have the problem of PSCache using too much
        # memory under Oracle, so this value can be set higher, for example 100
        maxOpenPreparedStatements: 20
        maxPoolPreparedStatementPerConnectionSize: 20
        useGlobalDataSourceStat: true
        connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=500
        # For connections within the minIdle count, run the keepAlive operation
        # when the idle time exceeds minEvictableIdleTimeMillis
        keepAlive: true
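The interaction between primary and strict in the configuration above can be hard to picture from the comments alone. The sketch below models it with a plain map and is not the library's code: the class RoutingSketch and its methods are invented for illustration, and data sources are represented by their URL strings rather than real pools. It only assumes the behavior the comments describe, namely that a missing name falls back to the primary unless strict mode is on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoutingSketch {
    // Data source name -> JDBC URL (a stand-in for a real pooled DataSource).
    private final Map<String, String> urlsByName = new LinkedHashMap<>();
    private final String primary;
    private final boolean strict;

    RoutingSketch(String primary, boolean strict) {
        this.primary = primary;
        this.strict = strict;
    }

    RoutingSketch register(String name, String url) {
        urlsByName.put(name, url);
        return this;
    }

    // Resolves the requested name. A null request means "no data source was chosen".
    String resolve(String requested) {
        if (requested == null) {
            return urlsByName.get(primary);
        }
        if (urlsByName.containsKey(requested)) {
            return urlsByName.get(requested);
        }
        if (strict) {
            // Strict mode: an unknown name is an error.
            throw new IllegalStateException("No data source named " + requested);
        }
        // Non-strict mode: an unknown name falls back to the primary.
        return urlsByName.get(primary);
    }

    public static void main(String[] args) {
        RoutingSketch routing = new RoutingSketch("mysqlDataSource", false)
                .register("mysqlDataSource", "jdbc:mysql://xx.xx.xx.xx:3306/xx")
                .register("impalaDataSource", "jdbc:impala://xx.xx.xx.xx:21050/test");
        System.out.println(routing.resolve("impalaDataSource"));
        System.out.println(routing.resolve(null));
        System.out.println(routing.resolve("typoDataSource"));
    }
}
```

With strict: false, as configured here, a misspelled data source name silently runs the query against MySQL. Turning strict on during development can make such mistakes visible earlier.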
3. The service layer calls the persistence layer for processing
Depending on which data source a mapper needs, annotate the mapper interface file with
@DS("impalaDataSource") to select that data source. If the annotation is absent, the default data source is used.
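To make the annotation rule concrete, here is a small self-contained sketch of how a name can be read from an annotated mapper interface, with a fallback when the annotation is missing. It deliberately declares its own stand-in DS annotation so that it runs without any dependency. In a real project the annotation comes from com.baomidou.dynamic.datasource.annotation.DS and the starter does the lookup for you. The mapper names WarehouseMapper and OrderMapper and the helper dataSourceFor are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DsAnnotationSketch {

    // Local stand-in for the starter's @DS annotation, for illustration only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface DS {
        String value();
    }

    // Hypothetical mapper that should query Impala.
    @DS("impalaDataSource")
    interface WarehouseMapper {
        long countRows();
    }

    // Hypothetical mapper without an annotation: it uses the default data source.
    interface OrderMapper {
        long countOrders();
    }

    // Returns the annotated data source name, or the primary when there is no annotation.
    static String dataSourceFor(Class<?> mapperType, String primary) {
        DS ds = mapperType.getAnnotation(DS.class);
        return ds != null ? ds.value() : primary;
    }

    public static void main(String[] args) {
        System.out.println(dataSourceFor(WarehouseMapper.class, "mysqlDataSource"));
        System.out.println(dataSourceFor(OrderMapper.class, "mysqlDataSource"));
    }
}
```

In the actual project only the annotation line on the mapper interface is needed. The service layer calls both mappers in the same way and does not need to know which database answers.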