Build a highly available Flink JobManager (HA)

  The JobManager coordinates the deployment of each Flink application; it is responsible for task scheduling and resource management.

  Each Flink cluster has a single JobManager. If that JobManager runs into a problem, new jobs cannot be submitted and running jobs fail, which makes it a single point of failure. That is why the JobManager needs to be made highly available.

  Similar to ZooKeeper, once the JobManager is made highly available, if the active JobManager fails, another available JobManager takes over its work and becomes the leader, so Flink jobs do not fail. JobManager HA can be built in both standalone and cluster mode.

  The following walks through building the standalone version of Flink JobManager HA.

  First, we need to set up passwordless SSH login, because the start scripts log in to the nodes over SSH when starting the processes.

  Run the following commands to enable passwordless login to your own machine. Without passwordless login, starting Hadoop will report an error like "port 22: Connection refused".

ssh-keygen -t rsa

ssh-copy-id -i ~/.ssh/id_rsa.pub huangqingshi@localhost
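
  To make sure passwordless login actually works before moving on, a quick check such as the following can be used (it reuses the huangqingshi@localhost account from the command above):

# Should print the message without prompting for a password.
ssh -o BatchMode=yes huangqingshi@localhost 'echo passwordless SSH OK'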

  Then download the Hadoop binary from the official website and unpack it; I downloaded version hadoop-3.1.3. Hadoop is installed because Flink's JobManager HA uses HDFS to store its metadata.
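
  For reference, the download and unpack step might look like this (the Apache archive URL is an assumption; any official mirror works):

# Download and unpack Hadoop 3.1.3 (mirror URL is an assumption).
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
tar -xzf hadoop-3.1.3.tar.gz
cd hadoop-3.1.3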

  I installed the standalone version of Hadoop; you can also build a Hadoop cluster. Next, configure Hadoop.

  Configure etc/hadoop/core-site.xml to specify the communication address of the HDFS NameNode and the directory for temporary files.

<configuration>
    <property>
        <!-- Communication address of the HDFS NameNode -->
        <name>fs.defaultFS</name>
        <value>hdfs://127.0.0.1:9000</value>
    </property>
    <property>
        <!-- Directory where Hadoop stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop/tmp</value>
    </property>
</configuration>

  Configure etc/hadoop/hdfs-site.xml to set the metadata storage location, the data block storage location, and the DFS HTTP listening port.

<configuration>
    <property>
        <!-- NameNode data (i.e. metadata) storage location; multiple comma-separated directories can be specified for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/tmp/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- DataNode data (i.e. data block) storage location -->
        <name>dfs.datanode.data.dir</name>
        <value>/tmp/hadoop/datanode/data</value>
    </property>
    <property>
        <!-- Manually set the DFS HTTP listening address and port -->
        <name>dfs.http.address</name>
        <value>127.0.0.1:50070</value>
    </property>
</configuration>

  Configure etc/hadoop/yarn-site.xml to set the auxiliary service running on the NodeManager and the ResourceManager hostname.

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <!-- Auxiliary service running on the NodeManager; must be set to mapreduce_shuffle to run MapReduce programs on YARN -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- ResourceManager hostname -->
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
</configuration>

  Configure etc/hadoop/mapred-site.xml to specify that MapReduce jobs run on YARN.

<configuration>
    <property>
        <!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

  The NameNode needs to be formatted first; starting it without formatting will report "NameNode is not formatted."

bin/hdfs namenode -format

  Next, start Hadoop; once it is up, you can access the URLs listed after the commands below.
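
  A minimal sketch of the start step, assuming the standard Hadoop 3.x scripts under sbin:

# Start HDFS (NameNode, DataNode, SecondaryNameNode).
sbin/start-dfs.sh
# Start YARN (ResourceManager, NodeManager).
sbin/start-yarn.sh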

  http://localhost:50070/ - the HDFS NameNode web UI.

  http://localhost:8088/ - view the nodes that make up the cluster.

  http://localhost:8042/node - view information about the node.

  This shows that the standalone Hadoop setup is complete.
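
  As an optional sanity check (not part of the original steps), you can confirm that HDFS answers before wiring Flink to it:

# List the HDFS root; this should succeed if the NameNode is running.
bin/hdfs dfs -ls /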

  Next, download Flink's Hadoop plugin (the shaded Hadoop uber jar); otherwise Flink will report an error on startup.

  Download URL: https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/flink-shaded-hadoop-2-uber-2.8.3-7.0.jar

  Put the downloaded jar into the lib folder of the Flink installation.
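
  For example (FLINK_HOME below is an assumed environment variable pointing at your Flink installation):

# Download the shaded Hadoop jar and copy it into Flink's lib directory.
wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/flink-shaded-hadoop-2-uber-2.8.3-7.0.jar
cp flink-shaded-hadoop-2-uber-2.8.3-7.0.jar $FLINK_HOME/lib/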

  Configure conf/flink-conf.yaml in the Flink folder. Set the HA mode to zookeeper and the metadata storage path used to recover the master; ZooKeeper is used for Flink's checkpoints and leader election. The last line is the address of the standalone ZooKeeper instance or cluster.

high-availability: zookeeper
high-availability.storageDir: hdfs://127.0.0.1:9000/flink/ha
high-availability.zookeeper.quorum: localhost:2181
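
  If several Flink clusters ever share the same ZooKeeper ensemble, two more keys are worth knowing (real Flink options; the values below are only examples, not part of this setup):

# Keep each Flink cluster under its own path/id in ZooKeeper (example values).
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /cluster_one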

  Everything else keeps the default configuration: for example, the JobManager's maximum heap size is 1 GB, and each TaskManager provides one task slot, so tasks execute serially.
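
  For reference, the corresponding defaults in conf/flink-conf.yaml look roughly like this (the exact keys and values may differ between Flink versions, so treat this as an approximation):

jobmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1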

  Next, configure Flink's conf/masters to start two JobManager master nodes.

localhost:8081
localhost:8082

  Configure Flink's conf/slaves to set up three TaskManager worker nodes.

localhost
localhost
localhost

  Go into the ZooKeeper directory and start ZooKeeper:

bin/zkServer.sh start
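
  Optionally, confirm that ZooKeeper is up before starting Flink:

# For a single-node ZooKeeper this should report "Mode: standalone".
bin/zkServer.sh status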

  Go into the Flink directory and start Flink:

bin/start-cluster.sh conf/flink-conf.yaml

  The startup output shows that an HA cluster with two JobManager nodes has been started.

  Run jps; it shows the two JobManager processes and the three TaskManager processes.

  Open http://localhost:8081 and http://localhost:8082 in a browser and check the logs there: the instance whose log contains "granted leadership" is the active JobManager. In this setup, the one on port 8082 is the leader.

  Each JobManager UI shows three TaskManagers; the two JobManagers share these three TaskManagers.

  Next, let's verify the cluster's HA behavior. We already know that the JobManager on port 8082 is the leader, so we look up its PID with the following command:

ps -ef | grep StandaloneSession

  Kill it with kill -9 51963. After that, localhost:8082 can no longer be reached, while localhost:8081 is still accessible and keeps serving requests. Next, restart the Flink JobManager on port 8082:

bin/jobmanager.sh start localhost 8082

  At this point, the JobManager on port 8081 has become the leader and the cluster keeps providing high availability.

  OK, with that the HA setup is complete.

Origin www.cnblogs.com/huangqingshi/p/12116686.html