[FLINK] FLIP-6 on YARN

YARN TaskManager Runner FLINK-4929

The YARN TaskManager Runner has the following responsibilities:

  • Read the configuration and all environment variables and compute the effective configuration
  • Start all services (Rpc, High Availability, Security, etc.)
  • Instantiate and start the TaskManager Runner
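Computing the "effective configuration" means layering the environment on top of the values read from the configuration file. The sketch below is hypothetical: the FLINK_ prefix, the underscore-to-dot key mapping, and the rule that environment values win are all assumptions for illustration, not Flink's actual scheme.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: merge file-based configuration with environment
 * variables. Prefix, key mapping and precedence are illustrative assumptions.
 */
final class EffectiveConfigSketch {

    // Assumed prefix marking environment variables that override config keys.
    static final String ENV_PREFIX = "FLINK_";

    static Map<String, String> effectiveConfiguration(
            Map<String, String> fileConfig, Map<String, String> env) {
        // Start from the values read from the configuration file.
        Map<String, String> effective = new HashMap<>(fileConfig);
        for (Map.Entry<String, String> e : env.entrySet()) {
            String name = e.getKey();
            if (name.startsWith(ENV_PREFIX)) {
                // e.g. FLINK_TASKMANAGER_NUMBEROFTASKSLOTS -> taskmanager.numberoftaskslots
                String key = name.substring(ENV_PREFIX.length())
                        .toLowerCase(Locale.ROOT)
                        .replace('_', '.');
                // Assumption: environment values take precedence over file values.
                effective.put(key, e.getValue());
            }
        }
        return effective;
    }

    private EffectiveConfigSketch() {}
}
```

In a real runner, env would be System.getenv(). The naive mapping cannot express keys that themselves contain underscores, which is one reason a real implementation may prefer explicit, named environment variables.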

YARN Application Master Runner FLINK-4928

The Application Master Runner is the master process started in a YARN container when submitting the Flink-on-YARN job to YARN.

It has the following data available:

  • Flink jars
  • Job jars
  • JobGraph
  • Environment variables
  • Contextual information like security tokens and certificates

Its responsibilities are the following:

  • Read all configuration and environment variables, computing the effective configuration
  • Start all shared components (Rpc, High Availability services)
  • Start the ResourceManager
  • Start the JobManager Runner
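The order of these steps matters: the shared components presumably have to be up before the ResourceManager and the JobManager Runner, which both use them. A minimal sketch of ordered startup with reverse-order rollback on failure follows; the Component interface and LoggingComponent are hypothetical and are not Flink classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: start components in dependency order (shared
 * services, ResourceManager, JobManager Runner) and, if one fails,
 * stop the already-started ones in reverse order.
 */
final class AppMasterStartupSketch {

    /** Assumed lifecycle contract; start() signals failure with an unchecked exception. */
    interface Component {
        void start();
        void stop();
    }

    /** Illustrative component that records lifecycle events in a shared log. */
    static final class LoggingComponent implements Component {
        private final String name;
        private final List<String> log;
        private final boolean failOnStart;

        LoggingComponent(String name, List<String> log, boolean failOnStart) {
            this.name = name;
            this.log = log;
            this.failOnStart = failOnStart;
        }

        @Override
        public void start() {
            if (failOnStart) {
                throw new IllegalStateException(name + " failed to start");
            }
            log.add("start " + name);
        }

        @Override
        public void stop() {
            log.add("stop " + name);
        }
    }

    static void startAll(List<? extends Component> inOrder) {
        Deque<Component> started = new ArrayDeque<>();
        try {
            for (Component c : inOrder) {
                c.start();
                started.push(c); // remember for reverse-order cleanup
            }
        } catch (RuntimeException e) {
            // Roll back: most recently started component is stopped first.
            while (!started.isEmpty()) {
                started.pop().stop();
            }
            throw e;
        }
    }

    private AppMasterStartupSketch() {}
}
```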

YARN Resource Manager FLINK-4927

The Flink YARN Resource Manager communicates with YARN's Resource Manager to acquire and release containers.

It is also responsible for eagerly notifying the JobManager about container failures.
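Eager notification means reacting as soon as YARN reports a container as completed, rather than waiting for the failure to surface some other way (for example through a heartbeat timeout). In a real AppMaster this hook would sit behind YARN's AMRMClientAsync.CallbackHandler#onContainersCompleted; the sketch below uses a simplified, hypothetical CompletedContainer type instead of the Hadoop classes so that it is self-contained, and assumes exit status 0 means a clean exit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: map containers to TaskManagers and notify the
 * JobManager immediately when a container completes with a failure status.
 */
final class ContainerFailureSketch {

    /** Simplified stand-in for YARN's container status (id plus exit code). */
    static final class CompletedContainer {
        final String containerId;
        final int exitStatus;

        CompletedContainer(String containerId, int exitStatus) {
            this.containerId = containerId;
            this.exitStatus = exitStatus;
        }
    }

    // Assumed clean-exit code (YARN uses 0 for a successful container exit).
    static final int EXIT_SUCCESS = 0;

    private final Map<String, String> taskManagerByContainer = new HashMap<>();
    private final Consumer<String> jobManagerNotifier;

    ContainerFailureSketch(Consumer<String> jobManagerNotifier) {
        this.jobManagerNotifier = jobManagerNotifier;
    }

    /** Record which TaskManager runs in a newly started container. */
    void containerStarted(String containerId, String taskManagerId) {
        taskManagerByContainer.put(containerId, taskManagerId);
    }

    /** Called when YARN reports completed containers. */
    void onContainersCompleted(List<CompletedContainer> completed) {
        for (CompletedContainer c : completed) {
            // The container is gone either way, so forget the mapping.
            String taskManagerId = taskManagerByContainer.remove(c.containerId);
            if (taskManagerId != null && c.exitStatus != EXIT_SUCCESS) {
                // Tell the JobManager right away which TaskManager was lost.
                jobManagerNotifier.accept(taskManagerId);
            }
        }
    }
}
```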

YARN Client FLINK-4930

The FLIP-6 YARN client can follow parts of the existing YARN client.

The main difference is that it does not wait for the cluster to be fully started or for all TaskManagers to register. It simply submits the application and returns. Its tasks are:

  • Set up all configurations and environment variables
  • Set up the resources: Flink jar, utility jars (logging), user jar
  • Set up attached tokens / certificates
  • Submit the YARN application
  • Listen for the leader and attempt to connect to the JobManager to subscribe to updates
  • Integration with the Flink CLI (command line interface)
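The "submit and do not wait" behavior can be modelled as returning as soon as YARN has accepted the application, while leader discovery continues in the background. This is a hypothetical sketch: the two Supplier parameters stand in for the real YARN submission and leader-retrieval calls, and SubmissionHandle is an illustrative name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of non-blocking submission: block only until the
 * application is accepted, and resolve the JobManager leader asynchronously.
 */
final class NonBlockingSubmitSketch {

    /** What the caller gets back immediately after submission. */
    static final class SubmissionHandle {
        final String applicationId;
        final CompletableFuture<String> jobManagerAddress;

        SubmissionHandle(String applicationId, CompletableFuture<String> jobManagerAddress) {
            this.applicationId = applicationId;
            this.jobManagerAddress = jobManagerAddress;
        }
    }

    static SubmissionHandle submit(Supplier<String> submitApplication,
                                   Supplier<String> awaitJobManagerLeader) {
        // Returns once YARN has accepted the application and assigned an id.
        String applicationId = submitApplication.get();
        // Leader discovery runs in the background; the caller can subscribe
        // to updates once this future completes, or ignore it entirely.
        CompletableFuture<String> leader = CompletableFuture.supplyAsync(awaitJobManagerLeader);
        return new SubmissionHandle(applicationId, leader);
    }

    private NonBlockingSubmitSketch() {}
}
```

A detached CLI invocation could print the application id and exit, while an attached one would wait on jobManagerAddress and then connect.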

YARN High Availability Services FLINK-5254

The YARN High Availability Services should come in two variants:

Default

  • This option uses the YARN application's working directory as HA storage
  • It automatically uses that working directory for the BlobStore
  • It creates an HDFS-based "RunningJobsRegistry" (see below)
  • ResourceManager leader election uses a leader that is pre-configured (via the configuration) to point to the AppMaster address.
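The default variant can be pictured as deriving everything from two inputs: the application's working directory and the AppMaster address. In the sketch below java.nio paths stand in for HDFS paths, and the subdirectory names "blob" and "running_jobs_registry" are hypothetical; the point is that no election runs, because the ResourceManager leader is simply the configured AppMaster address.

```java
import java.nio.file.Path;
import java.util.Objects;

/**
 * Hypothetical sketch of the default YARN HA services: all HA state lives
 * below the application's working directory, and the ResourceManager
 * leader is fixed by configuration instead of being elected.
 */
final class YarnDefaultHaSketch {

    private final Path workingDirectory;
    private final String appMasterAddress;

    YarnDefaultHaSketch(Path workingDirectory, String appMasterAddress) {
        this.workingDirectory = Objects.requireNonNull(workingDirectory);
        this.appMasterAddress = Objects.requireNonNull(appMasterAddress);
    }

    /** BlobStore files are kept below the working directory (name assumed). */
    Path blobStoreDirectory() {
        return workingDirectory.resolve("blob");
    }

    /** Storage for the file-based RunningJobsRegistry (name assumed). */
    Path runningJobsRegistryDirectory() {
        return workingDirectory.resolve("running_jobs_registry");
    }

    /** No election: the leader is always the pre-configured AppMaster. */
    String resourceManagerLeaderAddress() {
        return appMasterAddress;
    }
}
```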

ZooKeeper Based

  • The ZooKeeper-based services use ZooKeeper for ResourceManager and JobManager leader election. That way, they are safe against network partition scenarios that could otherwise lead to "split brain" situations.
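Leader election is commonly paired with fencing so that a former leader, cut off by a partition, cannot keep acting. In this hypothetical sketch, each leadership grant carries a session ID (a UUID) learned through leader retrieval, for example from ZooKeeper. A receiving component accepts only messages stamped with the session ID of the leader it currently knows. The class and method names are illustrative, not Flink's API.

```java
import java.util.UUID;

/**
 * Hypothetical fencing sketch: a component accepts messages only from the
 * leader session it learned through leader retrieval (e.g. via ZooKeeper).
 */
final class LeaderFencingSketch {

    // Null until leader retrieval has reported a leader.
    private UUID knownLeaderSession;

    /** Called when leader retrieval reports a new leader and its session id. */
    synchronized void notifyNewLeader(UUID leaderSessionId) {
        knownLeaderSession = leaderSessionId;
    }

    /** Messages carrying a stale (or unknown) session id are rejected. */
    synchronized boolean accept(UUID senderSessionId) {
        return knownLeaderSession != null && knownLeaderSession.equals(senderSessionId);
    }
}
```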

A prototype for the simple "single job" RunningJobsRegistry based on HDFS is here: https://github.com/StephanEwen/incubator-flink/commit/aaa2d7758797b2d6c9b6da42be6a5c4989468e3b
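The linked prototype is not reproduced here. As an independent illustration, a single-job, file-based registry can be as small as one marker file per job whose content is the job's status. The presumed purpose is to let a restarted AppMaster attempt tell whether the job was already running or finished. The sketch uses java.nio on a local directory as a stand-in for HDFS, and the status names and method names are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical file-based RunningJobsRegistry: one marker file per job,
 * containing its status. A local directory stands in for HDFS.
 */
final class FileRunningJobsRegistrySketch {

    /** Per-job states (names are illustrative). */
    enum Status { PENDING, RUNNING, DONE }

    private final Path directory;

    FileRunningJobsRegistrySketch(Path directory) {
        this.directory = directory;
    }

    void setJobRunning(String jobId) {
        write(jobId, Status.RUNNING);
    }

    void setJobFinished(String jobId) {
        write(jobId, Status.DONE);
    }

    /** A missing marker file means the job has not been started yet. */
    Status getStatus(String jobId) {
        Path marker = directory.resolve(jobId);
        if (!Files.exists(marker)) {
            return Status.PENDING;
        }
        try {
            String content = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
            return Status.valueOf(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void write(String jobId, Status status) {
        try {
            Files.createDirectories(directory);
            // Overwrites the previous status. Not atomic: a crash mid-write
            // could leave a truncated file (write-then-rename would avoid that).
            Files.write(directory.resolve(jobId), status.name().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```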


Reposted from www.cnblogs.com/tisonkun/p/10083699.html