Hudi Data Lake Technology Leads the New Wave of Big Data (2): Compilation and Installation

Chapter 2 Compilation and Installation

2.1 Compilation environment preparation

The relevant component versions for this tutorial are as follows:

Hadoop 3.1.3
Hive 3.1.2
Flink 1.13.6, Scala 2.12
Spark 3.2.2, Scala 2.12

(1) Install Maven

(1) Upload apache-maven-3.6.1-bin.tar.gz to the /opt/software directory, then decompress it and rename the resulting directory:

tar -zxvf apache-maven-3.6.1-bin.tar.gz -C /opt/module/

mv /opt/module/apache-maven-3.6.1 /opt/module/maven-3.6.1

(2) Add environment variables to /etc/profile

sudo vim /etc/profile

#MAVEN_HOME

export MAVEN_HOME=/opt/module/maven-3.6.1

export PATH=$PATH:$MAVEN_HOME/bin

(3) Test the installation:

source /etc/profile

mvn -v
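If the installation succeeded, mvn -v prints the Maven version and home directory, roughly like the following (the Java and OS lines depend on your environment):

Apache Maven 3.6.1
Maven home: /opt/module/maven-3.6.1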

(2) Configure the Aliyun mirror

(1) Modify settings.xml to use the Aliyun repository as the mirror address:

vim /opt/module/maven-3.6.1/conf/settings.xml

<!-- Add Aliyun mirror -->
<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
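Place this <mirror> block inside the existing <mirrors> element of settings.xml.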

2.2 Compile Hudi

2.2.1 Upload the source package

Upload hudi-0.12.0.src.tgz to /opt/software and decompress it

tar -zxvf /opt/software/hudi-0.12.0.src.tgz -C /opt/software

The source can also be downloaded from GitHub: https://github.com/apache/hudi/
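For example, assuming the project's standard release tag naming, the 0.12.0 source can be cloned directly:

git clone -b release-0.12.0 https://github.com/apache/hudi.git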

2.2.2 Modify the pom file

vim /opt/software/hudi-0.12.0/pom.xml

(1) Add a repository to accelerate dependency downloads:

<repository>
    <id>nexus-aliyun</id>
    <name>nexus-aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
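Place this block inside the <repositories> element of pom.xml, creating that element under the project root if it does not already exist.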

(2) Modify the versions of the dependent components:

<hadoop.version>3.1.3</hadoop.version>

<hive.version>3.1.2</hive.version>


2.2.3 Modify the source code for Hadoop 3 compatibility

Hudi depends on Hadoop 2 by default. To make it compatible with Hadoop 3, in addition to changing the version property, the following code must also be modified:

vim /opt/software/hudi-0.12.0/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java

Modify line 110 of this file: the constructor call originally took only one parameter; add null as a second parameter:

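The following sketch shows the intended change, assuming the Hudi 0.12.0 source (the affected call constructs an FSDataOutputStream; Hadoop 3.x removed the single-argument constructor, so null is passed as the second argument):

// Before: compiles only against Hadoop 2.x
FSDataOutputStream outputStream = new FSDataOutputStream(baos);

// After: pass null for the second (statistics) parameter, compatible with Hadoop 3.x
FSDataOutputStream outputStream = new FSDataOutputStream(baos, null);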

Otherwise, compilation fails due to compatibility differences between the Hadoop 2.x and 3.x APIs, with an error like the following:

[Screenshot: compilation error caused by the incompatible FSDataOutputStream constructor]

2.2.4 Manually install Kafka dependencies

Several Kafka-related dependencies must be installed manually; otherwise compilation fails with an error like the following:

[Screenshot: compilation error caused by missing io.confluent dependencies]

(1) Download the jar packages

Download via URL: http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip
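If the server has direct internet access, the archive can also be fetched and unpacked on the machine itself, for example:

wget http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip
unzip confluent-5.3.4-2.12.zip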

After decompressing, locate the following jar packages and upload them to the server hadoop1:

- common-config-5.3.4.jar
- common-utils-5.3.4.jar
- kafka-avro-serializer-5.3.4.jar
- kafka-schema-registry-client-5.3.4.jar

(2) Install them into the local Maven repository:

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar
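To verify that the artifacts landed in the local repository (assuming the default location under ~/.m2), list one of the installed directories:

ls ~/.m2/repository/io/confluent/kafka-schema-registry-client/5.3.4/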
