Best Practices of Mesos+Zookeeper+Marathon+Docker Distributed Cluster Management

 1.1 Introduction to Mesos

  Mesos is an open source distributed resource management framework under Apache, which is called the kernel of a distributed system. Mesos was originally developed by AMPLab at the University of California, Berkeley, and has since been widely used at Twitter.

  Mesos-Master: Mainly responsible for managing each framework and slave, and assigning resources on the slave to each framework.

  Mesos-Slave: Responsible for managing each mesos-task on this node, for example: allocating resources to each executor.

  Framework: A computing framework such as Hadoop, Spark, Kafka, Elasticsearch, etc., which accesses Mesos through the MesosSchedulerDriver.

  Executor: The executor is software installed on each machine node. Here a Docker container plays the role of the executor: it starts and is destroyed quickly, provides strong isolation, and gives a consistent environment.

  Mesos-Master is the core of the whole system. It is responsible for managing the frameworks (via frameworks_manager) and slaves (via slaves_manager) that register with Mesos, and it allocates resources on the slaves to frameworks according to a given policy (handled by the independent pluggable Allocator module).

  Mesos-Slave is responsible for accepting and executing commands from the Mesos-Master, managing the mesos-tasks on its node, and allocating resources to each task. The Mesos-Slave reports its own resources to the Mesos-Master, and the Allocator module in the Mesos-Master decides which framework to offer them to. Two kinds of resources are currently considered, CPU and memory; that is, the Mesos-Slave sends its number of CPUs and amount of memory to the Mesos-Master, and when users submit jobs they must specify how many CPUs and how much memory each task needs. When a task runs, the Mesos-Slave places it in a Linux container with a fixed amount of resources to achieve resource isolation. Obviously the master is a single point of failure; Mesos uses Zookeeper to solve this problem.
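  By default the slave auto-detects its CPUs and memory, but the resources it advertises to the master can also be pinned explicitly with the slave's --resources flag. The command below is only an illustrative sketch (the resource values are assumptions; the Zookeeper address matches the one used later in this article):

  mesos-slave --master=zk://192.168.56.12:2181/mesos --resources='cpus:4;mem:4096' #Advertise exactly 4 CPUs and 4096 MB of memory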

  Framework refers to an external computing framework, such as Hadoop, Spark, etc. These computing frameworks can register with Mesos so that Mesos can manage them and allocate resources to them in a unified way. Mesos requires every framework that registers to have a scheduling module responsible for task scheduling inside the framework. When a framework wants to use Mesos, it needs to adapt its own scheduler so that it can register with Mesos and obtain the resources Mesos allocates to it; the framework's scheduler then assigns those resources to the tasks inside the framework. In other words, the whole Mesos system uses a two-level scheduling framework: at the first level, Mesos allocates resources to frameworks; at the second level, each framework's own scheduler allocates those resources to its internal tasks. Mesos currently supports schedulers written in three languages: C++, Java, and Python. To give these schedulers a unified way to access Mesos, Mesos implements a MesosSchedulerDriver in C++; the framework's scheduler calls the driver's interfaces to interact with the Mesos-Master and complete functions such as registration and resource allocation.

  The Executor is mainly used to start tasks inside a framework. Because different frameworks start tasks with different interfaces or methods, when a new framework wants to use Mesos, an Executor must be written to tell Mesos how to start that framework's tasks. To give frameworks a unified way to write executors, Mesos internally implements a MesosExecutorDriver in C++, and a framework can tell Mesos how to start its tasks through the driver's interfaces.

  The overall architecture is shown in Figure 1.1-1

  

 

  Figure 1.1-1 Overall Architecture

  Figure 1.1-1 shows the important components of Mesos. Mesos consists of a master process that manages the slave processes running on each cluster node, and of Mesos computing frameworks that run tasks on those slaves.

  The master enables fine-grained sharing of resources (CPU, memory, and so on) across computing frameworks by making them resource offers. Each resource offer contains a list (slave ID, resource1: amount1, resource2: amount2, ...). The master decides how many resources to offer each framework according to a given policy, such as fair sharing or priority-based sharing.

  To support many kinds of policies, the master uses a pluggable allocation module, which makes it simple and convenient to add new ways of allocating resources.

  A computing framework consists of two components: a Scheduler, which registers with the master in order to be offered resources, and an Executor process, which is launched on slave nodes to run the framework's tasks. The master decides how many resources to offer each framework, while the framework's scheduler chooses which of the offered resources to use. When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to run on them, and Mesos then launches those tasks on the corresponding slaves.

  A resource offer example is shown in Figure 1.1-2.

  

 

  Figure 1.1-2 Resource offer example

  Next, let's walk through the steps shown in Figure 1.1-2:

  1. Slave 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master then invokes the allocation policy module, which tells it that framework 1 should be offered all of the available resources.

  2. The master sends framework 1 a resource offer describing what is available on slave 1.

  3. The framework's scheduler replies to the master with descriptions of two tasks to run on the slave: task 1 uses 2 CPUs and 1 GB of memory, and task 2 uses 1 CPU and 2 GB of memory.

  4. Finally, the master sends the tasks to the slave, which assigns the appropriate resources to the framework's executor, and the executor launches the two tasks (the dotted lines in Figure 1.1-2). Because 1 CPU and 1 GB of memory are still unallocated, the allocation module may now offer the remaining resources to framework 2.

  In addition, this resource offer process repeats whenever tasks complete and resources become free again.

  1.2 Introduction to Zookeeper

  Zookeeper is a distributed, open source coordination service for distributed applications, an open source implementation of Google's Chubby, and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

  1.2.1 Zookeeper role

  Leader: Responsible for initiating and deciding votes (proposals) and for updating the system state.

  Follower: Receives client requests, returns results to the client, and takes part in voting during leader election.

  Observer: Can accept client connections and forwards write requests to the Leader, but does not take part in voting; it only synchronizes the Leader's state. The purpose of the Observer is to scale the system and improve read performance.

  Client: The originator of requests.

  1.2.2 The working principle of Zookeeper

  The core of Zookeeper is atomic broadcast, which keeps the servers in sync. The protocol that implements this mechanism is called the Zab protocol. The Zab protocol has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have finished synchronizing with the leader's state. State synchronization ensures that the Leader and the other servers hold the same system state.

  To guarantee the sequential consistency of transactions, Zookeeper uses an increasing transaction ID (zxid) to identify them, and every proposal carries a zxid when it is issued. The zxid is a 64-bit number: its upper 32 bits are the epoch, which identifies whether the leader has changed (each time a new leader is elected a new epoch is used), and its lower 32 bits are an incrementing counter (a small worked example follows the list of states below). During operation, each server is in one of three states:

  LOOKING: The current server does not know who the leader is and is searching for one.

  LEADING: The current server is the elected Leader.

  FOLLOWING: A leader has been elected, and the current server synchronizes with it.
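  Returning to the zxid format described above, the epoch and counter can be pulled out of a zxid with simple bit operations. The value below is purely hypothetical:

  [root@linux-node2 ~]# zxid=0x100000005 #Hypothetical zxid: epoch 1 in the upper 32 bits, counter 5 in the lower 32 bits

  [root@linux-node2 ~]# echo "epoch=$((zxid >> 32)) counter=$((zxid & 0xFFFFFFFF))"

  epoch=1 counter=5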

  1.2.3 Zookeeper election process

  When the leader crashes or the leader loses most of the followers, zk enters the recovery mode, and the recovery mode needs to re-elect a new leader to restore all servers to a correct state.

  There are two election algorithms for ZK:

  1. An implementation based on basic Paxos

  2. An implementation based on fast Paxos

  The default election algorithm is fast Paxos.

  1.2.4 Zookeeper synchronization process

  After selecting the leader, zk enters the state synchronization process.

  1) The Leader waits for the server to connect.

  2) The Follower connects to the Leader and sends the largest zxid to the Leader.

  3) The Leader determines the synchronization point according to the zxid of the Follower.

  4) After synchronization completes, the Leader notifies the Follower that it is now in the up-to-date state.

  5) After the Follower receives the up-to-date notification, it can once again accept client requests and serve them.

  1.2.5 Zookeeper Workflow

  Three functions of Leader:

  1) Recover data

  2) Maintain a heartbeat with the Learners, receive Learner requests, and determine the type of each Learner request message.

  3) Process Learner messages, which mainly include PING, REQUEST, ACK, and REVALIDATE messages, handling each one according to its type.

  The PING message refers to the learner's heartbeat information; the REQUEST message is the proposal information sent by the follower, including the write request and the synchronization request; the ACK message is the follower's reply to the proposal, and if more than half of the followers pass, the proposal is committed; the REVALIDATE message is used to extend the SESSION valid time.

  Follower has four main functions:

  1) Send a request (PING message, REQUEST message, ACK message, REVALIDATE message) to the Leader.

  2) Receive the Leader message and process it.

  3) Receive the client's request, and if it is a write request, send it to the Leader for voting.

  4) Return the Client result.

  The Follower's message loop processes the following messages from the Leader:

  1) PING message: heartbeat message.

  2) PROPOSAL message: A proposal initiated by the Leader, requiring Followers to vote.

  3) COMMIT message: information about the latest proposal on the server side.

  4) UPTODATE message: indicates that the synchronization is completed.

  5) REVALIDATE message: Based on the Leader's REVALIDATE result, either close the session awaiting revalidation or allow it to accept messages again.

  6) SYNC message: Returns the SYNC result to the client. This message is originally initiated by the client to force it to get the latest updates.

  1.3 Introduction to Marathon

  Marathon is a Mesos framework for long-running services such as web applications. It acts as a distributed init.d for the cluster and can run any Linux binary (such as Tomcat, Play, etc.) as-is. It is also a private PaaS: it provides a REST API for deployment, supports authentication, SSL, and placement constraints, and implements service discovery and load balancing through HAProxy.

  1.4 docker cluster practice

  1.4.1 Cluster Environment Preparation

  Host name          IP address (Host-Only)   Description
  linux-node1.com    eth0: 192.168.56.11      Mesos Master, Mesos Slave, Marathon
  linux-node2.com    eth0: 192.168.56.12      Zookeeper, Mesos Slave

  Note that Zookeeper here is deployed as a pseudo-cluster rather than a truly distributed one: three Zookeeper instances listening on different ports are started on a single virtual machine.

  Linux-node1 practice environment

  [root@linux-node1 ~]# cat /etc/redhat-release #View system version

  CentOS Linux release 7.1.1503 (Core)

  [root@linux-node1 ~]# uname -r #View kernel information

  3.10.0-229.el7.x86_64

  [root@linux-node1 ~]# getenforce #Check that SELinux is not enforcing

  Permissive

  [root@linux-node1 ~]# systemctl stop firewalld #Stop the firewalld firewall

  Linux-node2 practice environment

  [root@linux-node2 ~]# cat /etc/redhat-release #View system version

  CentOS Linux release 7.1.1503 (Core)

  [root@linux-node2 ~]# uname -r #View kernel information

  3.10.0-229.el7.x86_64

  [root@linux-node2 ~]# getenforce #Check that SELinux is not enforcing

  Permissive

  [root@linux-node2 ~]# systemctl stop firewalld #Stop the firewalld firewall

  1.4.2 Zookeeper pseudo-cluster installation and deployment

  Deploying Zookeeper requires Java: 1.7 is the mainstream release, 1.8 is the latest stable release, and 1.9 is still in development. Here Java is installed with yum to support Zookeeper.

  [root@linux-node2 ~]# yum install -y java #Install Java

  [root@linux-node2 ~]# cd /usr/local/src/ #Enter the source installation directory

  [root@linux-node2 src]# wget http://mirrors.cnnic.cn/apache/zookeeper/stable/zookeeper-3.4.8.tar.gz #Download Zookeeper stable release 3.4.8

  [root@linux-node2 src]# tar xf zookeeper-3.4.8.tar.gz #Extract Zookeeper

  [root@linux-node2 src]# mv zookeeper-3.4.8 /usr/local/ #Move Zookeeper to /usr/local/

  [root@linux-node2 src]# ln -s /usr/local/zookeeper-3.4.8/ /usr/local/zookeeper #Make a soft link for Zookeeper to facilitate future upgrades and other operations

  1.4.2.1 Detailed explanation of Zookeeper configuration file

  Because this is a pseudo-cluster, B (the IP address) is the same for every instance, so different Zookeeper instances cannot share communication port numbers and must each be assigned their own ports.

  [root@linux-node2 src]# cd /usr/local/zookeeper/conf/ #Enter the Zookeeper configuration file directory

  [root@linux-node2 conf]# mv zoo_sample.cfg zoo.cfg #Rename to zoo.cfg

  [root@linux-node2 conf]# cat zoo.cfg

  tickTime=2000 #The interval, in milliseconds, at which heartbeats are exchanged between Zookeeper servers or between a client and a server; one heartbeat is sent every tickTime.

  initLimit=10 #The maximum number of heartbeat intervals (tickTime) the Leader tolerates while a Follower makes its initial connection and synchronization. If no response arrives within 10 heartbeats, the connection is considered failed. Total time: 10*2000 ms = 20 seconds.

  syncLimit=5 #The maximum length of time a request and its response may take between the Leader and a Follower, measured in tickTime units. Total time: 5*2000 ms = 10 seconds.

  dataDir=/tmp/zookeeper #Data storage directory

  clientPort=2181 #Client connection port

  #maxClientCnxns=60 #For the limit of the number of connections for a client, the default is 60, but some teams deploy dozens of applications to one machine to facilitate testing, so this value will be exceeded.

  #autopurge.purgeInterval=1 #Specifies the cleaning frequency, the unit is hour, you need to fill in an integer of 1 or greater, the default is 0, which means that the self-cleaning function is not enabled.

  #autopurge.snapRetainCount=3 #This parameter is used in conjunction with the above parameters. This parameter specifies the number of files that need to be retained. The default is to keep 3.

  server.A=B:C:D:

  A: # represents a number, which indicates the server number.

  B: # represents the IP address of the server.

  C:# represents the port through which the server exchanges information with the Leader server in the cluster.

  D: #If the leader server in the cluster hangs, a port is needed to re-elect and elect a new leader, and this port is the port used to communicate with each other during the election.

  1.4.2.2 Zookeeper configuration file modification

  Because this is a pseudo-cluster, B (the IP address) is the same for every instance, so different Zookeeper instances cannot share communication port numbers and must each be assigned their own ports.

  [root@linux-node2 conf]# grep '^[a-z]' zoo.cfg #Filter out the modified configuration

  tickTime=2000

  initLimit=10

  syncLimit=5

  dataDir=/data/zk1

  clientPort=2181

  server.1=192.168.56.12:3181:4181

  server.2=192.168.56.12:3182:4182

  server.3=192.168.56.12:3183:4183

  1. Create three directories to store Zookeeper data

  [root@linux-node2 conf]# mkdir -p /data/{zk1,zk2,zk3}

  [root@linux-node2 conf]# echo "1" > /data/zk1/myid

  [root@linux-node2 conf]# echo "2" > /data/zk2/myid

  [root@linux-node2 conf]# echo "3" > /data/zk3/myid

  2. Generate three Zookeeper configuration files

  [root@linux-node2 conf]# pwd

  /usr/local/zookeeper/conf

  [root@linux-node2 conf]# cp zoo.cfg zk1.cfg

  [root@linux-node2 conf]# cp zoo.cfg zk2.cfg

  [root@linux-node2 conf]# cp zoo.cfg zk3.cfg

  3. Modify the data storage directory and port corresponding to zk2 and zk3

  [root@linux-node2 conf]# pwd

  /usr/local/zookeeper/conf

  [root@linux-node2 conf]# sed -i 's#zk1#zk2#g' zk2.cfg

  [root@linux-node2 conf]# sed -i 's#zk1#zk3#g' zk3.cfg

  [root@linux-node2 conf]# sed -i 's#2181#2182#g' zk2.cfg

  [root@linux-node2 conf]# sed -i 's#2181#2183#g' zk3.cfg
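  A quick check (nothing Zookeeper-specific, just grep) that the three configuration files now differ only in their data directory and client port:

  [root@linux-node2 conf]# grep -E '^(dataDir|clientPort)' zk1.cfg zk2.cfg zk3.cfg #Each file should show its own /data/zkN directory and 218N port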

  1.4.2.3 Zookeeper role view

  1. Start Zookeeper and view the role

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zk1.cfg #Start zk1

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk1.cfg

  Starting zookeeper … STARTED

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zk2.cfg #Start zk2

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk2.cfg

  Starting zookeeper … STARTED

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zk3.cfg #Start zk3

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk3.cfg

  Starting zookeeper … STARTED

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh status /usr/local/zookeeper/conf/zk1.cfg

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk1.cfg

  Mode: follower #zk1 current status Follower

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh status /usr/local/zookeeper/conf/zk2.cfg

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk2.cfg

  Mode: leader #zk2 current status Leader

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkServer.sh status /usr/local/zookeeper/conf/zk3.cfg

  ZooKeeper JMX enabled by default

  Using config: /usr/local/zookeeper/conf/zk3.cfg

  Mode: follower #zk3 Current status Follower

  2. Connect to Zookeeper

  [root@linux-node2 conf]# /usr/local/zookeeper/bin/zkCli.sh -server 192.168.56.12:2181
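  Once the client connects, a simple sanity check is to list the root znode. On a freshly started ensemble only the built-in /zookeeper node is expected; the /mesos path will appear later, after Mesos registers:

  [zk: 192.168.56.12:2181(CONNECTED) 0] ls /

  [zookeeper]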

  1.5 Mesos cluster deployment

  The Mesosphere repository needs to be installed on both the Mesos Master and the Mesos Slave nodes.

  1.5.1 Mesos_Master deployment

  [root@linux-node1 ~]# rpm -ivh http://repos.mesosphere.com/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm #Install the Mesosphere repository on node1

  [root@linux-node1 ~]# yum -y install mesos marathon #Install Mesos and Marathon

  [root@linux-node1 ~]# cat /etc/mesos/zk #Zookeeper configuration for Mesos; edit this file so it contains the line below

  zk://192.168.56.12:2181,192.168.56.12:2182,192.168.56.12:2183/mesos

  [root@linux-node1 ~]# systemctl enable mesos-master mesos-slave marathon #Enable mesos-master, mesos-slave, and marathon at boot

  [root@linux-node1 ~]# systemctl start mesos-master mesos-slave marathon #Start mesos-master, mesos-slave, and marathon
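  A quick way to confirm that the services came up is to check their listening ports (5050 for mesos-master, 8080 for Marathon); ss is provided by the iproute package on CentOS 7:

  [root@linux-node1 ~]# ss -tlnp | grep -E '5050|8080' #mesos-master listens on 5050, marathon on 8080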

  1.5.2 Mesos_Slave deployment

  [root@linux-node2 ~]# rpm -ivh http://repos.mesosphere.com/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm #Install the Mesosphere repository on node2

  [root@linux-node2 ~]# yum -y install mesos #Install Mesos
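  The mesos package also installs /etc/mesos/zk on the slave (pointing at localhost by default); it presumably needs to reference the same Zookeeper ensemble so that the slave can find the master. A hedged sketch, run before starting the slave:

  [root@linux-node2 ~]# echo 'zk://192.168.56.12:2181,192.168.56.12:2182,192.168.56.12:2183/mesos' > /etc/mesos/zk #Point the slave at the same Zookeeper ensemble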

  [root@linux-node2 ~]# systemctl start mesos-slave #Start mesos-slave

  1.5.3 Mesos Web interface

  Visit: http://192.168.56.11:5050 as shown in Figure 1.5-1

  

 

  Figure 1.5-1 The Tasks table does not have any entries

  Run a Mesos task; it can then be seen in the web interface, as shown in Figure 1.5-2.

  [root@linux-node1 ~]# MASTER=$(mesos-resolve `cat /etc/mesos/zk`)

  [root@linux-node1 ~]# mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 60"

  

 

  Figure 1.5-2 Running Mesos tasks

  1.5.4 Marathon calls Mesos to run Docker containers

  [root@linux-node1 ~]# yum -y install docker #Install Docker

  [root@linux-node1 ~]# systemctl enable docker #Enable Docker to start at boot

  [root@linux-node1 ~]# systemctl start docker #Start the Docker service

  [root@linux-node1 ~]# docker pull nginx #Pull an nginx image

  On linux-node1: add the containerizer configuration to the mesos-slave and restart it.

  [root@linux-node1 ~]# echo 'docker,mesos' | tee /etc/mesos-slave/containerizers

  [root@linux-node1 ~]# systemctl restart mesos-slave #Restart mesos-slave

  On linux-node2: add the containerizer configuration to the mesos-slave and restart it.

  [root@linux-node2 ~]# echo 'docker,mesos' | tee /etc/mesos-slave/containerizers

  [root@linux-node2 ~]# systemctl restart mesos-slave #Restart mesos-slave
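  Pulling a Docker image can take longer than the slave's default one-minute executor registration timeout, so many deployments also raise that timeout on every slave. The five-minute value below is only an assumption:

  [root@linux-node1 ~]# echo '5mins' | tee /etc/mesos-slave/executor_registration_timeout #Give Docker pulls up to 5 minutes; repeat on linux-node2 and restart mesos-slave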

  By default, Marathon listens on port 8080. Open it in a browser to create applications, as shown in Figure 1.5-3.

  

 

  Figure 1.5-3 Marathon interface

  Next, use Marathon to create a Docker container from the nginx image, scheduled through Mesos. When Marathon starts, it reads the /etc/mesos/zk configuration file and finds the Mesos Master through Zookeeper.

  Marathon has its own REST API, and we use that API to create an Nginx Docker container. First create the following configuration file, nginx.json:

  [root@linux-node1 ~]# cat nginx.json

  {
    "id": "nginx",
    "cpus": 0.2,
    "mem": 32.0,
    "instances": 1,
    "constraints": [["hostname", "UNIQUE", ""]],
    "container": {
      "type": "DOCKER",
      "docker": {
        "image": "nginx",
        "network": "BRIDGE",
        "portMappings": [
          { "containerPort": 80, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
        ]
      }
    }
  }

  Call the API using curl:

  [root@linux-node1 ~]# curl -X POST http://192.168.56.11:8080/v2/apps -d @/root/nginx.json -H "Content-type: application/json"
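  After the POST, the same REST API can be queried to see the application's status (the /v2/apps/nginx endpoint is part of Marathon's v2 API; python -m json.tool is only used here to pretty-print the response):

  [root@linux-node1 ~]# curl -s http://192.168.56.11:8080/v2/apps/nginx | python -m json.tool #tasksRunning should reach 1 once the container is up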

  [root@linux-node1 ~]# docker ps -a #View the container created through the API

  CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

  1231814cd679 nginx "nginx -g 'daemon off" 56 seconds ago Up 55 seconds 443/tcp, 0.0.0.0:31011->80/tcp mesos-16f943e5-be56-4254-858e-6347b89779de-S0.c47be185-eafc-4bd6-b0ca-e13e4536440b

  Access port 31011, the host port randomly assigned by Docker, as shown in Figure 1.5-4.

  

 

  Figure 1.5-4 Successfully access the Nginx interface
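  The same check can be done from the command line (31011 is simply the host port Docker happened to assign in this run):

  [root@linux-node1 ~]# curl -I http://192.168.56.11:31011 #An HTTP/1.1 200 OK response confirms Nginx is reachable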

  View it in the Marathon web interface, as shown in Figure 1.5-5.

  

 

  Figure 1.5-5 The Marathon web interface shows that Nginx is running

  View on the Mesos interface, as shown in Figure 1.5-6

  

 

  Figure 1.5-6 The Mesos interface showing the Nginx task

  You can also click Create in the upper left corner of Marathon to create a container.
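  The REST API can also be used to scale the application afterwards; a hedged sketch using Marathon's PUT endpoint to raise the instance count:

  [root@linux-node1 ~]# curl -X PUT http://192.168.56.11:8080/v2/apps/nginx -H "Content-type: application/json" -d '{"instances": 2}' #Ask Marathon to run two Nginx instances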

  Note: Nginx is used here only as an example. A Marathon + Mesos + Docker cluster like this one is better suited to services that do not need to expose ports to the outside world, such as crawlers: since no external ports are opened, all they have to do is take work from a queue, process it, and store the results.

  This covers only the release and management of jobs, not continuous integration (CI). Later I will publish a Jenkins + Docker + Mesos + Marathon + Git continuous integration solution that combines code release with the DCO framework.

 

 

