Chapter 2 Compile and Install
2.1 Compilation environment preparation
The component versions used in this tutorial are as follows:
Component | Version |
---|---|
Hadoop | 3.1.3 |
Hive | 3.1.2 |
Flink | 1.13.6, scala-2.12 |
Spark | 3.2.2, scala-2.12 |
(1) Install Maven
(1) Upload apache-maven-3.6.1-bin.tar.gz to the /opt/software directory, then extract it to /opt/module and rename the extracted directory
tar -zxvf /opt/software/apache-maven-3.6.1-bin.tar.gz -C /opt/module/
mv /opt/module/apache-maven-3.6.1 /opt/module/maven-3.6.1
(2) Add environment variables to /etc/profile
sudo vim /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/maven-3.6.1
export PATH=$PATH:$MAVEN_HOME/bin
(3) Verify the installation
source /etc/profile
mvn -v
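If this setup is scripted rather than typed into vim, the profile edit can be made repeatable. The following is only a sketch: add_maven_env is a hypothetical helper name (it is not part of Maven), and it assumes the variables should be appended to whatever file is passed in, skipping the append when a MAVEN_HOME export is already present.

```shell
# Sketch: append the Maven variables to a profile file only once.
# add_maven_env is a hypothetical helper name, not a Maven command.
add_maven_env() {
  profile="$1"      # e.g. /etc/profile (needs root) or a scratch file
  maven_home="$2"   # e.g. /opt/module/maven-3.6.1
  # Skip if an earlier run already exported MAVEN_HOME in this file.
  if grep -q '^export MAVEN_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo '#MAVEN_HOME'
    echo "export MAVEN_HOME=$maven_home"
    # Single quotes keep $PATH and $MAVEN_HOME unexpanded in the file.
    echo 'export PATH=$PATH:$MAVEN_HOME/bin'
  } >> "$profile"
}
```

A call such as add_maven_env /etc/profile /opt/module/maven-3.6.1 (run as root) would then be followed by source /etc/profile and mvn -v as above.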
(2) Switch to the Aliyun mirror
(1) Edit settings.xml and add the Aliyun repository address as a mirror, inside the <mirrors> element
vim /opt/module/maven-3.6.1/conf/settings.xml
<!-- Add the Aliyun mirror -->
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>central</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
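Maven also reads a per-user settings file at ~/.m2/settings.xml, so the same mirror can be configured there without touching the installation directory. The sketch below shows where the mirror entry sits in a minimal settings file; write_mirror_settings is a hypothetical helper name, and it assumes no settings file exists yet at the target path (it refuses to overwrite one).

```shell
# Sketch: create a minimal settings.xml that contains only the Aliyun mirror.
# write_mirror_settings is a hypothetical helper name, not a Maven command.
write_mirror_settings() {
  target="$1"   # e.g. "$HOME/.m2/settings.xml"
  if [ -e "$target" ]; then
    echo "refusing to overwrite existing $target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  # The quoted delimiter keeps the XML literal (no shell expansion).
  cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <!-- Add the Aliyun mirror -->
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

For example, write_mirror_settings "$HOME/.m2/settings.xml". If a settings file already exists, add the mirror entry to its <mirrors> element by hand instead.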
2.2 Compile Hudi
2.2.1 Upload source package
Upload hudi-0.12.0.src.tgz to /opt/software and decompress it
tar -zxvf /opt/software/hudi-0.12.0.src.tgz -C /opt/software
The source can also be downloaded from GitHub: https://github.com/apache/hudi/
2.2.2 Modify the pom file
vim /opt/software/hudi-0.12.0/pom.xml
(1) Add a repository inside the <repositories> element to speed up dependency downloads
<repository>
<id>nexus-aliyun</id>
<name>nexus-aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
(2) Change the versions of the dependent components
<hadoop.version>3.1.3</hadoop.version>
<hive.version>3.1.2</hive.version>
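The version change can also be scripted instead of edited by hand. This is a rough sketch under a few assumptions: set_pom_property is a hypothetical helper name, each property sits on a single line in the form <name>value</name>, and every occurrence of that tag in the file should receive the new value. Review the result with a diff before building.

```shell
# Sketch: rewrite a single-line <name>value</name> property in a pom file.
# set_pom_property is a hypothetical helper name, not a Maven command.
set_pom_property() {
  pom="$1"; name="$2"; value="$3"
  # Escape dots so that "hadoop.version" is matched literally.
  name_re=$(printf '%s' "$name" | sed 's/\./\\./g')
  tmp="$pom.tmp.$$"
  # Write to a temporary file first, then replace the original.
  sed "s|<$name_re>[^<]*</$name_re>|<$name>$value</$name>|" "$pom" > "$tmp" \
    && mv "$tmp" "$pom"
}
```

Usage for this tutorial would be set_pom_property /opt/software/hudi-0.12.0/pom.xml hadoop.version 3.1.3, followed by the same call for hive.version with 3.1.2.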
2.2.3 Modify the source code to be compatible with Hadoop 3
Hudi depends on Hadoop 2 by default. For Hadoop 3 compatibility, changing the version number is not enough; the following source file must also be modified:
vim /opt/software/hudi-0.12.0/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java
On line 110, the call originally takes a single parameter; add null as the second parameter.
Without this change, compilation fails because of incompatibilities between Hadoop 2.x and 3.x.
2.2.4 Manually install Kafka dependencies
Several Kafka dependencies must be installed manually; otherwise the compilation fails.
(1) Download the jar package
Download via URL: http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip
After extracting the archive, locate the following jar files and upload them to the server hadoop1:
- common-config-5.3.4.jar
- common-utils-5.3.4.jar
- kafka-avro-serializer-5.3.4.jar
- kafka-schema-registry-client-5.3.4.jar
(2) Install the jars into the local Maven repository
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar
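The four commands differ only in the artifact name, so they can be collapsed into a loop. The sketch below is one way to do that: install_confluent_jars is a hypothetical helper name, and the MVN variable is an assumption added here so that the Maven command can be swapped out (for example with echo, to preview the commands). It stops at the first missing jar instead of calling Maven on a file that does not exist.

```shell
# Sketch: install the four Confluent jars with one loop.
# install_confluent_jars and the MVN override are hypothetical, for illustration.
install_confluent_jars() {
  version="$1"   # e.g. 5.3.4
  dir="$2"       # directory that holds the uploaded jars
  for artifact in common-config common-utils kafka-avro-serializer kafka-schema-registry-client; do
    jar="$dir/$artifact-$version.jar"
    if [ ! -f "$jar" ]; then
      echo "missing: $jar" >&2
      return 1
    fi
    # Same arguments as the individual commands above.
    ${MVN:-mvn} install:install-file -DgroupId=io.confluent \
      -DartifactId="$artifact" -Dversion="$version" \
      -Dpackaging=jar -Dfile="$jar" || return 1
  done
}
```

Run it from the directory that contains the jars with install_confluent_jars 5.3.4 . or preview the commands first with MVN=echo install_confluent_jars 5.3.4 .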