Differences in compiling Flink across versions

1. Before version 1.11.0

Previously, you first had to build the flink-shaded-hadoop package. This meant compiling a flink-shaded-hadoop-2-uber_xxx jar that matches the Hadoop and Hive versions used in your production environment and placing that jar in Flink's lib directory. Flink then loads it from lib when a job is started.
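The last step of this older workflow (putting the jar into lib) can be sketched in shell as below. This is a minimal sketch that assumes the uber jar has already been built. The function name install_uber_jar and the example paths are hypothetical. The Maven build of flink-shaded itself is not shown, because its options depend on the flink-shaded branch and the Hadoop version you target.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the pre-1.11 workflow: copy a prebuilt
# flink-shaded-hadoop-2-uber jar into Flink's lib directory.
# Argument 1: path to the uber jar; argument 2: the Flink installation directory.
install_uber_jar() {
  local jar="$1" flink_home="$2"
  if [ ! -f "$jar" ]; then
    echo "uber jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$flink_home/lib" ]; then
    echo "no lib directory under: $flink_home" >&2
    return 1
  fi
  # Jars in lib/ are added to the classpath when Flink processes start,
  # so the Hadoop classes become visible after the next (re)start.
  cp "$jar" "$flink_home/lib/"
}
```

For example: install_uber_jar ./flink-shaded-hadoop-2-uber_xxx.jar /opt/flink (placeholder jar name and install path). Checking both paths first gives a clear error instead of a silent copy into the wrong place.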

For this approach, you can refer to the following two links:

Source code of flink-shaded (official repository): https://github.com/apache/flink-shaded

The shaded package for Flink 1.10

2. Version 1.11.0 and later

To make Flink Hadoop-free, Flink now supports both Hadoop 2 and Hadoop 3 and lets you point it at whichever Hadoop environment you use.

To achieve this, you no longer compile the flink-shaded package. You simply set export HADOOP_CLASSPATH=`hadoop classpath`.
The key point is that the Flink build no longer bundles any Hadoop or Hive code. When a Flink job starts, both the JobManager (JM) and the TaskManager (TM) pick up their Hadoop dependencies from the HADOOP_CLASSPATH environment variable.
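As a minimal sketch, the export can be wrapped so that it fails with a clear message on a machine where the hadoop command is not on the PATH. The function name export_hadoop_classpath is hypothetical. Only the `hadoop classpath` command and the HADOOP_CLASSPATH variable come from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around: export HADOOP_CLASSPATH=`hadoop classpath`
export_hadoop_classpath() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop command not found; HADOOP_CLASSPATH not set" >&2
    return 1
  fi
  # Command substitution runs `hadoop classpath` and stores its output.
  HADOOP_CLASSPATH="$(hadoop classpath)"
  export HADOOP_CLASSPATH
}
```

A script run as a child process cannot change its parent shell's environment. Source the file that defines this function (for example with . ./hadoop-env.sh, a hypothetical file name) rather than executing it. Then call export_hadoop_classpath and start Flink from that same shell.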

At first, I (a rookie) assumed that the `hadoop classpath` part was just some path written at random. Later, thanks to Zha Rui explaining the basics to a beginner like me, I learned that anything wrapped in backticks (``) is executed by the shell as a command, which I must remember from now on. So hadoop classpath is a command, and running it prints the classpath that Hadoop depends on:

[yujianbo@qzcs86 ~]$ hadoop classpath
/etc/hadoop/conf:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop/lib/:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop/.//:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop-hdfs/./:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop-hdfs/lib/:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop-hdfs/.//:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop-yarn/lib/:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/…/…/hadoop-yarn/.//*
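The output above is one long colon-separated line. A small shell helper can print one entry per line, which makes it easier to see which configuration and jar directories end up on the classpath. The helper name list_classpath_entries is hypothetical and not part of Hadoop or Flink, and the sample classpath is a shortened, made-up example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each entry of a colon-separated classpath on its own line.
list_classpath_entries() {
  # Quoting keeps wildcard entries such as .../lib/* from being expanded by the shell.
  printf '%s\n' "$1" | tr ':' '\n'
}

# Example with a shortened, made-up classpath. Prints:
# /etc/hadoop/conf
# /opt/hadoop/lib/*
# /opt/hadoop/.//*
list_classpath_entries "/etc/hadoop/conf:/opt/hadoop/lib/*:/opt/hadoop/.//*"
```

On a real machine you would pass it the command's output instead: list_classpath_entries "$(hadoop classpath)".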

For details, please refer to:

The linked page explains it in this paragraph:

Flink will use the environment variable HADOOP_CLASSPATH to augment the classpath that is used when starting Flink components such as the Client, JobManager, or TaskManager. Most Hadoop distributions and cloud environments will not set this variable by default so if the Hadoop classpath should be picked up by Flink the environment variable must be exported on all machines that are running Flink components.
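Because the variable must be exported on every machine that runs a Flink component, a quick check such as the hypothetical function below can be run on each host. One place to run it is at the top of whatever script starts the Client, JobManager, or TaskManager there. This is a sketch that only verifies the variable is non-empty. It does not validate the individual entries.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: fail early if HADOOP_CLASSPATH is missing on this host.
check_hadoop_classpath() {
  local host="${HOSTNAME:-unknown-host}"
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    echo "$host: HADOOP_CLASSPATH is not set; export it before starting Flink" >&2
    return 1
  fi
  # Show the first entry as a quick sanity check (often the Hadoop conf directory).
  echo "$host: HADOOP_CLASSPATH is set, first entry: ${HADOOP_CLASSPATH%%:*}"
}
```

Returning a non-zero status lets a start script stop immediately (for example check_hadoop_classpath || exit 1). Without this check, Flink only fails later with missing Hadoop classes.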


Origin blog.csdn.net/weixin_44500374/article/details/112611677