Flink deployment mode survey (after reading it, you will know which mode suits your company)

1. The three modes surveyed:

1. Flink On Yarn

(1) PerJob mode

Each job gets its own cluster.

(2) Session mode

One session runs multiple jobs, and multiple sessions can be started.

2. Standalone mode

Multiple jobs run in one cluster. Multiple clusters can be started, and multiple TaskManagers can be created on each node.
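As a rough illustration of how the three modes differ at submission time, the sketch below only assembles the command lines for each mode; it does not run them. It assumes a Flink distribution of roughly the 1.11 to 1.14 era, where `flink run -t yarn-per-job`, `yarn-session.sh` and `start-cluster.sh` live under `bin/`. The jar path, memory sizes and YARN application id are placeholders, not values from this survey.

```python
from typing import List


def per_job_cmd(jar: str, jm_mem: str, tm_mem: str, slots: int) -> List[str]:
    """Per-job on YARN: resources are chosen freely for every submission."""
    return [
        "./bin/flink", "run", "-t", "yarn-per-job",
        f"-Djobmanager.memory.process.size={jm_mem}",
        f"-Dtaskmanager.memory.process.size={tm_mem}",
        f"-Dtaskmanager.numberOfTaskSlots={slots}",
        jar,
    ]


def start_session_cmd(jm_mem: str, tm_mem: str, slots: int) -> List[str]:
    """Session on YARN: resources are fixed once, when the session is started."""
    return ["./bin/yarn-session.sh", "-jm", jm_mem, "-tm", tm_mem,
            "-s", str(slots), "-d"]


def session_submit_cmd(jar: str, application_id: str) -> List[str]:
    """Submit a job to an already running YARN session."""
    return ["./bin/flink", "run", "-t", "yarn-session",
            f"-Dyarn.application.id={application_id}", jar]


def standalone_cmds(jar: str) -> List[List[str]]:
    """Standalone: start the cluster (resources come from its config), then submit."""
    return [["./bin/start-cluster.sh"], ["./bin/flink", "run", jar]]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(per_job_cmd("my-job.jar", "1024m", "4096m", 2)))
    print(" ".join(start_session_cmd("1024m", "4096m", 2)))
    print(" ".join(session_submit_cmd("my-job.jar", "application_XXXX_YY")))
    for cmd in standalone_cmds("my-job.jar"):
        print(" ".join(cmd))
```

Note that only the per-job command carries its own memory and slot settings; in the other two modes those are decided when the session or cluster is started.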

2. Comparison of different modes

| Dimension | perjob_yarn | session_yarn | standalone_cluster | Winner |
| --- | --- | --- | --- | --- |
| Cluster | One cluster per job | One cluster for multiple jobs; multiple clusters can be started | One cluster | perjob_yarn > session_yarn > standalone_cluster. session_yarn is neater than perjob, and more elegant than standalone when starting multiple clusters |
| tm/jm memory and slot allocation | Configured freely per job | Fixed when the session is started | Fixed when the cluster is started | perjob_yarn > session_yarn = standalone_cluster |
| Starting a job | Started on its own | Submit the job to the specified session | Submit the job to the cluster | Tie |
| Job recovery | Restarted on its own | A single failed job is restarted on its own; if the session goes down, all jobs must be restarted | A single failed job is restarted on its own; if the cluster goes down, all jobs must be restarted | perjob_yarn > session_yarn = standalone_cluster |
| Recovery time | Independent recovery, fast | 1. A single failed job is recovered on its own; 2. When the session goes down, or one job affects the session, recovery time depends on the restart script, and if the script fails, recovering jobs manually one by one takes a long time; 3. Stability is currently not high, and the high availability we currently configure does not seem to suit session mode | 1. A single failed job is recovered on its own; 2. All jobs are restarted when the cluster goes down; 3. High availability | perjob_yarn > session_yarn = standalone_cluster |
| Recovery granularity | One | One / many | One / many | perjob_yarn > session_yarn = standalone_cluster |
| Logs | Separate log per job | All jobs started in the session share one jobmanager log. Yarn collects the jobmanager and taskmanager logs for you | All jobs started in the cluster share the flink-root-standalonesession-.log file on the master node. To see a job's task logs you have to go to the corresponding worker node, which is inconvenient | perjob_yarn > session_yarn > standalone_cluster |
| Monitoring | yarn + flink rest api | sessionId + flink rest api | jobid | On Yarn is better than standalone_cluster |
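The recovery rows of the comparison come down to how many jobs share the fate of one cluster. A toy model, with made-up job and cluster names and no Flink involved, makes the difference concrete:

```python
from typing import Dict, List


def jobs_to_restart(placement: Dict[str, str], failed_cluster: str) -> List[str]:
    """Return the jobs that must be restarted when `failed_cluster` goes down.

    `placement` maps a job name to the cluster (or session) it runs in.
    """
    return sorted(job for job, cluster in placement.items()
                  if cluster == failed_cluster)


# Per-job: every job has its own cluster, so a failure touches one job only.
per_job = {"etl": "cluster-etl", "alerts": "cluster-alerts", "report": "cluster-report"}

# Session / standalone: several jobs share one cluster.
shared = {"etl": "session-1", "alerts": "session-1", "report": "session-2"}

print(jobs_to_restart(per_job, "cluster-etl"))  # ['etl']
print(jobs_to_restart(shared, "session-1"))     # ['alerts', 'etl']
```

This is the "One" versus "One / many" recovery granularity from the comparison: in the shared case, losing `session-1` forces both of its jobs to restart.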

Notes:

  • For session mode, I raised an issue in Jira asking for a way to distinguish the logs of different jobs. According to those who replied, some attempts have been made but failed, and there is no other effective way for the time being.
    For details, see FLINK-19768 in Jira.

  • Per-job and session modes depend on Yarn, so additional components are needed for monitoring. Apart from this one drawback, On Yarn is recommended.
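For the monitoring side, a minimal sketch of polling the Flink REST API could look like the following. It relies on the `/jobs/overview` endpoint of the JobManager REST API; the `rest_url` value is a placeholder (on Yarn it would typically be the application's tracking URL), and treating every state other than `RUNNING` as unhealthy is an assumption that fits long-running streaming jobs, not a rule from this survey.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def fetch_jobs_overview(rest_url: str) -> dict:
    """GET <rest_url>/jobs/overview from the JobManager REST endpoint."""
    with urlopen(f"{rest_url.rstrip('/')}/jobs/overview", timeout=5) as resp:
        return json.load(resp)


def unhealthy_jobs(overview: dict) -> List[Dict[str, str]]:
    """Return id, name and state of every job whose state is not RUNNING."""
    return [
        {"jid": job["jid"], "name": job["name"], "state": job["state"]}
        for job in overview.get("jobs", [])
        if job["state"] != "RUNNING"
    ]


# Hand-written sample shaped like a /jobs/overview response (not real data).
sample = {
    "jobs": [
        {"jid": "a1", "name": "etl", "state": "RUNNING"},
        {"jid": "b2", "name": "alerts", "state": "FAILED"},
    ]
}
print(unhealthy_jobs(sample))
```

In practice one would call `unhealthy_jobs(fetch_jobs_overview(rest_url))` on a schedule and alert on a non-empty result. In session mode the same call covers every job in the session, while in per-job mode each application has its own REST endpoint.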

Origin blog.csdn.net/weixin_44500374/article/details/112612031