Microservice Registration and Discovery Cluster Setup: Registrator + Consul + Consul-template + Nginx

In Internet applications, the demand for services changes constantly, which places high demands on automatic service discovery and dynamic scaling.

A microservice system can easily contain tens of thousands of services, and they scale dynamically. Hard-coding IPs and ports in hand-written scripts cannot be automated at scale, and makes operations clumsy. Microservices must be assigned IPs and ports automatically, with as little manual intervention as possible. We need every service to get a dynamically created address, and callers must be able to notice when those addresses change.

This requires a service registration and discovery mechanism, and this article discusses how to implement one.

1. The service registration and discovery process

What we want to achieve, compared with the traditional approach:

Registration/discovery mode                         Traditional mode
Services are discovered automatically on startup   Manual registration
Load balancing changes dynamically                 A static configuration written by hand
Scaling up and down is automatic                   Operations staff adjust things manually, which takes time

1.1 "Self-registration" vs. "third-party registration"

Registration can be divided by who performs it:

1. Self-registration: when a service starts, it connects to the registry itself and writes in its own service information.

Benefits:

  • No third-party component is introduced: fewer processes and fewer dependencies.

Problems:

  • The registry address is hard-coded in the service code; if the registry changes, the service code must change with it;
  • The registry must keep in touch with every service for heartbeat detection; with many services this adds extra overhead to the registry;

2. Third-party registration (the approach used in this article): a companion process monitors the service process and writes the service information into the registry.

  • Benefit: the service is decoupled from the registry, and service registration is fully automated;
  • Problem: the companion process itself must be made highly available, otherwise it becomes a single point of failure;

1.2 Implementing self-registration

Self-registration is not the topic of this article; you can implement it with your own code (a minimal sketch follows below). What we discuss here is the implementation of third-party registration.
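For reference only, a hedged sketch of what self-registration against Consul's HTTP API could look like: on startup the service PUTs its own information to the local agent. The service name, address and port below are illustrative values, not something used later in this article.

$ curl -X PUT http://localhost:8500/v1/agent/service/register \
    -d '{"Name": "my-web-server", "Address": "10.111.152.136", "Port": 80}'

Deregistration on shutdown would be a similar PUT to /v1/agent/service/deregister/<service-id>.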

1.3 Implementing third-party registration

With the emergence of Docker and the rise of microservice architecture, many open source projects have started to look at how to build a truly dynamically scalable service architecture on top of Docker, on the premise of a loosely coupled design.

Here we use the open source components Registrator + Consul + Consul-template + Nginx to implement service registration and discovery with dynamic scaling. Naturally, they all run in Docker.

Let's look at the overall flow first:

services register

Service registry: the core of the whole architecture. It must support distribution and persistent storage, and notify consumers in real time when registration information changes.

Service provider: services are deployed as Docker containers (so service ports are generated dynamically) and managed with docker-compose, which lets Registrator detect the Docker process information and complete automatic service registration.

Service consumer: uses the services offered by the service providers; consumers and providers often switch roles dynamically.

  1. Service registration: the service provider registers itself with the registry;
  2. Service subscription: the service consumer subscribes to service information at the registry and listens for changes;
  3. Caching: a local cache of the service list reduces the network traffic exchanged with the registry;
  4. Service call: the consumer first looks in its local cache; if the service is not found there, it pulls the address from the registry and then sends the request;
  5. Change notification: when a service node changes (is added, removed, etc.), the registry notifies the listening nodes so they can update their service information (see the blocking-query sketch after this list).
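With Consul as the registry, subscription and change notification boil down to a blocking query against the health endpoint: the call hangs until the service list changes or the wait time expires. This is only an illustrative sketch (the service name and index value are examples); it is the same kind of request that consul-template issues internally, as the lb_1 logs in section 5.3 later show.

$ curl 'http://localhost:8500/v1/health/service/my-web-server?passing&index=40&wait=60s'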

2. Introduction to the tools

2.1 Registrator

Registrator: a tool written in Go that inspects Docker containers as they start or stop and registers or deregisters the corresponding services. This is also why every tool in our experiment runs in Docker: Registrator determines a service's status by checking the state of its Docker container, so our own code stays completely decoupled and the upper layers are transparently unaware of it. Its main characteristics:

  • Monitors container events directly through the Docker socket, and registers/deregisters services on container start/stop events
  • Registers each exposed port of each container as a separate service
  • Supports pluggable registry backends; Consul, etcd and SkyDNS are supported by default
  • Is itself packaged as a Docker image, so it can be started as a container (see the example after this list)
  • Lets users customize the configuration, such as the service TTL (time-to-live), service name, service tags, etc.
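As a rough standalone sketch (the compose-based setup used later in this article is different), Registrator can be started as a container and pointed at a Consul agent; the Consul address here is an assumption for illustration:

$ docker run -d --name=registrator --net=host \
    --volume=/var/run/docker.sock:/tmp/docker.sock \
    gliderlabs/registrator:latest \
    consul://localhost:8500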

2.2 Consul

The service registry in the diagram above is exactly this component. Consul is highly available, distributed software for service discovery and shared configuration, developed by HashiCorp in Go.

Here, Consul handles the registration of the Docker instances and configuration sharing.

Features:

  • Uses the Raft algorithm as its consistency protocol, which is easier to use than Paxos. It uses the gossip protocol to manage membership and broadcast messages, and supports ACL access control.
  • Supports multiple data centers to avoid single points of failure, and listens on different ports for internal and external networks; network latency and partitioning must be considered when deploying. zookeeper and etcd do not support multiple data centers.
  • Has built-in health checks, which etcd lacks.
  • Supports both HTTP and DNS interfaces. zookeeper integration is more complex, and etcd only supports HTTP.
  • Also provides a web management UI.

2.3 consul-template

When people first started building service discovery, the usual stack was zookeeper/etcd + confd, but that combination is complex and hard to use. consul-template takes roughly the place of confd, so nowadays you can choose either etcd + confd or consul + consul-template.

consul-template usage scenarios: consul-template can query Consul's service catalog, keys, key-value pairs and more. This powerful abstraction plus a template query language makes consul-template especially well suited to creating dynamic configuration files, for example apache/nginx proxy balancers, haproxy backends, varnish servers, and application configurations.

consul-template provides a convenient way to fetch values stored in Consul. The consul-template daemon queries the Consul service and updates any number of templates specified on the system; once an update completes, the template can optionally run an arbitrary command. Here, for example, we use it to update the nginx.conf configuration file and then run nginx -s reload to refresh the routing, achieving dynamic load-balancing adjustment.

nginx and consul-template must be installed on the same machine, because consul-template needs to modify nginx's configuration file dynamically.
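As a rough illustration of what the template and the daemon invocation might look like, here is a hedged sketch; the file names, paths and the -consul-addr flag are assumptions for illustration (flag names vary between consul-template versions), not what the liberalman/nginx-consul-template image actually uses:

$ cat > nginx.conf.ctmpl <<'EOF'
upstream my-web-server {
{{ range service "my-web-server" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}
EOF
$ consul-template -consul-addr 127.0.0.1:8500 \
    -template "nginx.conf.ctmpl:/etc/nginx/conf.d/default.conf:nginx -s reload"

Each time the my-web-server service list changes in Consul, the upstream block is re-rendered and nginx -s reload is executed.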

2.4 nginx

A familiar name that needs little introduction; here it does the load balancing and request forwarding. Hardware load balancers have the best performance, of course, but software is cheap and easy to maintain.

3. Single-host experiment

First, look at a simple, traditional load-balanced web service:

load balance web servers

This is easy to understand: the client accesses nginx, which forwards the request to one of the backend web servers; this is classic load balancing. If a backend web server is added or removed, operations staff manually edit nginx.conf and reload the configuration to adjust the load balancing.

Now look at automatic load balancing in our microservice registration and discovery setup:

Services register and find

The load-balancing pattern itself has not changed; there are just a few more peripheral components. These components are invisible to the client, which still only sees the nginx entry point; the way it accesses the service does not change.

Here, we use registrator to monitor the state of each web server. When a new web server starts, registrator registers it with the consul registry. Because consul_template has subscribed to service messages at that registry, consul pushes the new web server's information to consul_template, which then modifies the nginx.conf configuration file and has nginx reload its configuration, automatically adjusting the load balancing. Likewise, when a web server dies, registrator notices and notifies consul so it can respond.

The whole process requires no manual intervention from operations; it happens automatically. Next, let's find a machine and try this scheme out.

3.1 Environment

Item            Value
OS              ubuntu:16.04 x86_64, kernel 4.8.0-58-generic
Host IP         10.111.152.136
docker          Docker version 1.12.6, build 78d1802
docker-compose  docker-compose version 1.8.0, build unknown

First, install docker and docker-compose:

$ apt-get install docker docker-compose -y

Pick any directory and create the compose file docker-compose.yml:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-1
    MY_HOST: host-1
  ports:
  - "80"

#load balancer will automatically update the config using consul-template
lb:
  image: liberalman/nginx-consul-template:latest
  hostname: lb
  links:
  - consulserver:consul
  ports:
  - "80:80"

consulserver:
  image: progrium/consul:latest
  environment:
    SERVICE_TAGS: consul servers
  hostname: consulserver
  ports:
  - "8300"
  - "8400"
  - "8500:8500"
  - "53"
  command: -server -ui-dir /ui -data-dir /tmp/consul -bootstrap-expect 1

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator
  links:
  - consulserver:consul
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command: -internal consul://consul:8500

Note: I have already built the liberalman/helloworld and liberalman/nginx-consul-template images; you can pull them and use them directly. To see how they are written, visit https://github.com/liberalman

3.2 Startup

Enter the directory containing the compose file and run:

$ docker-compose up

If there are no problems, everything starts up and the images are pulled automatically. Visit http://localhost and you will see a web page:

Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.2</font>. I saw that you are 172.17.0.6:35612.

This content is actually the page returned by the backend helloworld web server. It tells us its own address is 172.17.0.2 (a Docker internal address), and that the frontend IP it sees is 172.17.0.6. That frontend is in fact our nginx load-balancing proxy doing the forwarding, so what it sees is nginx's address.

The host-1 here is a name I set for the physical machine myself; note it is not the operating system's hostname. It exists purely to make the page easier to read and to distinguish the physical machines in the multi-host experiment later, so I defined a temporary name. It corresponds to the MY_HOST environment variable in docker-compose.yml, which is passed into helloworld's runtime environment through the Docker container.

To stop the services, Ctrl+C is enough; if some of them do not stop, run:

$ docker-compose down

To run in the background:

$ docker-compose up -d

3.3 Load balancing

Back to the main topic: refreshing the browser several times shows that the backend address never changes, because there is only one backend web server.

To test nginx's load balancing, scale the backend up to 3 servers. Stop the services first, then:

$ docker-compose scale web=3
$ docker-compose up

Visit http://localhost again and refresh several times; the actual backend address on the page now changes, rotating through 3 IPs. The newly started backend web servers are registered automatically, and nginx has added the new routes:

Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.2</font>. I saw that you are 172.17.0.6:36710.
Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.3</font>. I saw that you are 172.17.0.6:35210.
Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.4</font>. I saw that you are 172.17.0.6:58678.

3.4 Viewing service status

To check node registration, go to http://localhost:8500 to see the consul web UI management console.

consul ui

Click the SERVICES button to list all registered services.

  • consul server: there are several entries because it listens on multiple ports, including UDP ports.
  • my-web-server is the backend web service. The name comes from the SERVICE_80_NAME variable set in the docker-compose template for port 80; see the registrator user guide for details:
    https://gliderlabs.com/registrator/latest/user/services/
  • nginx-consul-template is the combined nginx and consul-template service.

Click my-web-server to see the number of service nodes on its right. There is only one here; with more, they would be listed in turn.

host-1 my-web-server

4. Two physical machines

Everything so far was done on a single physical machine. Next we test the truly distributed setup across multiple physical machines.

host name   real ip          services
host-1      10.111.152.136   registrator, helloworld, consul-server, consul-template, nginx
host-2      10.111.152.135   registrator, helloworld

The docker-compose.yml for the first physical machine, host-1:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-1
    MY_HOST: host-1
  ports:
  - "80"

#load balancer will automatically update the config using consul-template
lb:
  image: liberalman/nginx-consul-template:latest
  hostname: lb
  links:
  - consulserver:consul
  ports:
  - "80:80"

consulserver:
  image: progrium/consul:latest
  environment:
    SERVICE_TAGS: consul servers
  hostname: consulserver-node1
  ports:
  - "8300"
  - "8400"
  - "8500:8500"
  - "53"
  command: -server -ui-dir /ui -data-dir /tmp/consul -bootstrap-expect 1

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator-1
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command: -ip=10.111.152.136 consul://10.111.152.136:8500

The yml file for our second machine, host-2:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-2
    MY_HOST: host-2
  ports:
  - "80"

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator-2
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command: -ip 10.111.152.135 consul://10.111.152.136:8500

Here we changed MY_HOST to host-2 so that it is easy to see on the page. The other important change is registrator's startup parameters: we removed -internal, which reports Docker's internal IP, and instead used -ip to report the machine's own external IP, 10.111.152.135. The consul server address parameter is set to host-1's IP, 10.111.152.136. Registrator's hostname must also differ from the one on the first machine; I changed it to registrator-2 so that registrations in consul do not overwrite each other. If the hostnames were the same, consul could not tell which machine is which, and the two registrators would overwrite each other.
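To double-check what each machine actually registered, you could also query the catalog directly; this is just an illustrative command, using jq to pretty-print as elsewhere in this article:

$ curl -s 10.111.152.136:8500/v1/catalog/service/my-web-server | jq '.[] | {ServiceAddress, ServicePort}'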

host-1 is started the same way as before. Now we start things on host-2 and see whether the new node gets added.

Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.2</font>. I saw that you are 172.17.0.5:41464.

Hello World! I'm <font color=blue>host-2</font> <font color=red>addr:172.17.0.2</font>. I saw that you are 10.111.152.136:41578.

Refresh a couple of times and you see host-1 one moment and host-2 the next, which shows that the service on the host-2 physical machine has been added and nginx is routing to it.

At the same time, the consul UI shows that the new node has indeed been added:

host-2 my-web-server

One problem, though: if you stop registrator on host-2 first and then stop the backend web on host-2, our consul server can sense it, but the consul UI page does not update and still shows two nodes.

5. Consul Cluster

The experiment above is actually a single-point consul service. Click the NODES button on the consul UI page and you see:

single node

There is only one consul server node, namely the consulserver node running on our host-1; the other physical machine runs no consul node. Once it dies, the whole service-registration function is gone. Since this is a distributed system, we should use clustering to eliminate the single point of failure. So we need to build a Consul cluster.

In the Consul scheme, an agent is deployed and run on every node that provides services, and the set of all nodes running a Consul agent forms the Consul cluster.

A Consul agent runs in one of two modes: Server or Client. Server and Client here are distinctions at the Consul cluster level and have nothing to do with the application services built on top of the cluster.

Consul agent nodes running in Server mode maintain the state of the Consul cluster. The official recommendation is that every Consul cluster run at least 3 agents in Server mode; the number of Client nodes is not limited.


The Consul cluster in each data center elects a leader from the agent nodes running in server mode. The election is guaranteed by Consul's implementation of the Raft protocol, and the Consul data on the server nodes is strongly consistent. Consul agent nodes in client mode are simpler: they are stateless and merely forward requests to the Server agent nodes.

Our architecture changes a little this time; a logical deployment diagram of the servers illustrates it:

Services register and find, consul cluster

This is a logical deployment diagram; we use 3 machines for the experiment. Each machine runs several web servers, one registrator and one consul client, which is the basic requirement. In addition we build a consul cluster to act as our registry. When a web server starts, registrator senses it and sends the registration information to the consul client, which contacts the leader node of the registry and reports the newly added service. The consul cluster then pushes the new service information to the consul-template that has subscribed to its service messages, and consul-template modifies the nginx on the same machine, achieving dynamic load-balancing adjustment.

Note: because resources are limited, we did not use dedicated machines for the consul cluster, so the consul client and consul server nodes in the diagram are actually the same node; after all, server mode also provides the client functionality. The consul cluster is in fact spread across the 3 hosts: we start one consul process on each host, and each acts as both server and client.

5.1 Configuration

host name   real ip          services                                                            note
host-1      10.111.152.136   registrator, helloworld*n, consul-server, consul-template, nginx   hosts the consul web UI and the nginx load balancer
host-2      10.111.152.135   registrator, helloworld*n, consul-server
host-3      10.111.152.168   registrator, helloworld*n, consul-server

host-1 is the machine that runs the load balancer, so consul-template and nginx are deployed there. A consul-server node is deployed on every machine, giving us 3 nodes; next we look at how these 3 nodes elect a leader.

The docker-compose.yml for host-1:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-1
    MY_HOST: host-1
  ports:
  - "80"

#load balancer will automatically update the config using consul-template
lb:
  image: liberalman/nginx-consul-template:latest
  hostname: lb
  links:
  - consulserver:consul
  ports:
  - "80:80"

consulserver:
  image: progrium/consul:latest
  environment:
    SERVICE_TAGS: consul servers
  hostname: consulserver-node1
  ports:
  - "8300:8300"
  - "8301:8301"
  - "8301:8301/udp"
  - "8302:8302"
  - "8302:8302/udp"
  - "8400:8400"
  - "8500:8500"
  - "53:53/udp"
  command: -server -ui-dir /ui -advertise 10.111.152.136 -bootstrap-expect 3

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator-1
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command: -ip 10.111.152.136 consul://10.111.152.136:8500

Let's explain the parameters:

  • hostname: from now on the consul nodes are identified by this, so the node name on each physical machine must be unique to avoid conflicts.
  • -bootstrap-expect 3: after the consulserver-node1 node starts, it waits for the other two nodes to join; only once all 3 nodes are present does leader election begin.
  • -advertise 10.111.152.136: needed if the node should be discoverable over the WAN, exposing the external IP. For LAN-only discovery it is unnecessary; the internal IP is bound by default.
  • -ui-dir /ui: makes the current node serve the consul UI web pages.

The docker-compose.yml for host-2:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-2
    MY_HOST: host-2
  ports:
  - "80"

consulserver:
  image: progrium/consul:latest
  environment:
    SERVICE_TAGS: consul servers
  hostname: consulserver-node2
  ports:
  - "8300:8300"
  - "8301:8301"
  - "8301:8301/udp"
  - "8302:8302"
  - "8302:8302/udp"
  - "8400:8400"
  - "8500:8500"
  - "53:53/udp"
  command: -server -advertise 10.111.152.135  -join 10.111.152.136

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator-2
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command:  -ip 10.111.152.135 consul://10.111.152.136:8500

The difference from host-1 is that host-2 uses the parameter -join 10.111.152.136, which means this node joins the node at IP 10.111.152.136, the address of consulserver-node1. As the previous host's configuration showed, consulserver-node1 waits for two more nodes to join after it starts; here we are joining it.

The docker-compose.yml for host-3:

#backend web application, scale this with docker-compose scale web=3
web:
  image: liberalman/helloworld:latest
  environment:
    SERVICE_80_NAME: my-web-server
    SERVICE_TAGS: backend-3
    MY_HOST: host-3
  ports:
  - "80"

consulserver:
  image: progrium/consul:latest
  environment:
    SERVICE_TAGS: consul servers
  hostname: consulserver-node3
  ports:
  - "8300:8300"
  - "8301:8301"
  - "8301:8301/udp"
  - "8302:8302"
  - "8302:8302/udp"
  - "8400:8400"
  - "8500:8500"
  - "53:53/udp"
  command: -server -advertise 10.111.152.168 -join 10.111.152.136

# listen on local docker sock to register the container with public ports to the consul service
registrator:
  image: gliderlabs/registrator:master
  hostname: registrator-3
  volumes:
  - "/var/run/docker.sock:/tmp/docker.sock"
  command: -ip 10.111.152.168 consul://10.111.152.136:8500

Note: at this point you may wonder: the 3 nodes above are all server nodes, so where are the client nodes? Without client nodes, how do we access the cluster? After all, we interact with the cluster through a client, which forwards our requests to the server nodes.

As mentioned earlier, every server node itself has the client functionality; it just additionally persists all the information locally and takes part in leader election, so information is preserved across failures.

So when we deploy registrator on each host, we configure it to access the nearest consul service, i.e. the consul node on the same machine, and simply treat it as a consul client. Of course, a separate client node could be deployed, but we need at least 3 server nodes to complete leader election; with one more machine I would consider adding a dedicated client node.

5.2 Startup

Start the 3 nodes on host-1, host-2 and host-3 in turn. Note: after running docker-compose up, do not close the terminal; leave it printing, because we will look at the logs there later. Do everything else in a newly opened terminal. Visit http://10.111.152.136:8500/ui/#/dc1/nodes and you can see that all the nodes have been added.


Besides the UI, you can also check which services are registered from the command line; in a new terminal run:

~# curl 10.111.152.136:8500/v1/catalog/services|jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   308  100   308    0     0  54892      0 --:--:-- --:--:-- --:--:-- 61600
{
  "consul": [],
  "consul-53": [
    "consul servers",
    "udp"
  ],
  "consul-8300": [
    "consul servers"
  ],
  "consul-8301": [
    "consul servers",
    "udp"
  ],
  "consul-8302": [
    "consul servers",
    "udp"
  ],
  "consul-8400": [
    "consul servers"
  ],
  "consul-8500": [
    "consul servers"
  ],
  "my-web-server": [
    "backend-1",
    "backend-2",
    "backend-3"
  ],
  "nginx-consul-template": []
}

Visit http://10.111.152.136 to check the nginx load balancing; refreshing repeatedly gives:

Hello World! I'm <font color=blue>host-1</font> <font color=red>addr:172.17.0.2</font>. I saw that you are 172.17.0.1:49728.
Hello World! I'm <font color=blue>host-2</font> <font color=red>addr:172.17.0.3</font>. I saw that you are 10.111.152.136:54640.
Hello World! I'm <font color=blue>host-3</font> <font color=red>addr:172.17.0.3</font>. I saw that you are 10.111.152.136:58660.

OK, everything looks normal. Now let's work out which node is the leader, and what happens when a node leaves.

Open a new terminal on host-1 and run:

docker ps -f name=consul

The container ID of the consul node turns out to be 4364cd41f2ba. Log into that container:

docker exec -it 4364cd41f2ba /bin/sh

We are now inside the container's OS environment; in that environment run:

/ # consul members
Node                Address              Status  Type    Build  Protocol  DC
consulserver-node3  10.111.152.168:8301  alive   server  0.5.2  2         dc1
consulserver-node1  10.111.152.136:8301  alive   server  0.5.2  2         dc1
consulserver-node2  10.111.152.135:8301  alive   server  0.5.2  2         dc1

Our 3 consul nodes are plain to see. Now check the current node's info:

/ # consul info
......
consul:
        bootstrap = false
        known_datacenters = 1
        leader = false
        server = true
raft:
        applied_index = 192
        commit_index = 192
        fsm_pending = 0
        last_contact = 15.960533ms
        last_log_index = 192
        last_log_term = 2
        last_snapshot_index = 0
        last_snapshot_term = 0
        num_peers = 2
        state = Follower
        term = 2
......

The output is long, so only the important parts are shown. server = true: this really is a server node. leader = false means this node is not the leader, and state = Follower confirms it is indeed a Follower. last_contact = 15.960533ms is the time since the last contact with the leader, and term = 2 means we are in the second term, i.e. two elections have already taken place.
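If you prefer not to exec into a container, an alternative (shown here only as an illustration) is to ask the HTTP API directly which node is the leader; the response below is an example, assuming node3 is still the leader as we find next:

$ curl 10.111.152.136:8500/v1/status/leader
"10.111.152.168:8300"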

While running these commands, since docker-compose up on host-1 was printing its logs straight to the screen, we can watch node1's output at the same time:

consulserver_1  |     2017/07/26 05:08:20 [INFO] agent.rpc: Accepted client: 127.0.0.1:47084
consulserver_1  |     2017/07/26 05:08:24 [INFO] agent.rpc: Accepted client: 127.0.0.1:47086
......

The commands we just ran were all sent by a client to the current consul server.

In the same way, on the consulserver-node3 node:

consul:
        bootstrap = false
        known_datacenters = 1
        leader = true
        server = true

So the leader is node 3.

5.3 Removing a node

Let's kill a node and see what happens.

5.3.1 Stopping a node

Open a new terminal on host-1 and run:

 docker stop 4364cd41f2ba

The host-1 log starts scrolling:

gocode_consulserver_1 exited with code 1
lb_1            | 2017/07/26 06:02:51.211894 [WARN] (view) health.service(my-web-server|passing): Get http://consul:8500/v1/health/service/my-web-server?index=40&passing=1&stale=&wait=60000ms: dial tcp 172.17.0.4:8500: i/o timeout (retry attempt 1 after "250ms")
ex=40&passing=1&stale=&wait=60000ms: dial tcp 172.17.0.4:8500: i/o timeout (retry attempt 1 after "250ms")
lb_1            | 2017/07/26 06:03:10.099572 [WARN] (view) health.service(my-web-server|passing): Get http://consul:8500/v1/health/service/my-web-server?index=40&passing=1&stale=&wait=60000ms: dial tcp 172.17.0.4:8500: getsockopt: no route to host (retry attempt 2 after "500ms")
......

lb_1 keeps printing retries of the health check against http://consul:8500.

However, visiting http://10.111.152.136 at this point shows that nginx is not broken; it still routes normally to the three backend nodes, and the backend web servers are still available. They are unaffected by a single consul server node going down.

Only the consul web UI, http://10.111.152.136:8500/ui/#/dc1/services, is no longer reachable, because this is exactly the node we stopped.

The logs on the other two nodes:

On the host-2 machine, the consulserver-node2 node, also a Follower:

consulserver_1  |     2017/07/26 06:02:24 [INFO] memberlist: Suspect consulserver-node1 has failed, no acks received
consulserver_1  |     2017/07/26 06:02:27 [INFO] memberlist: Suspect consulserver-node1 has failed, no acks received
consulserver_1  |     2017/07/26 06:02:27 [INFO] memberlist: Marking consulserver-node1 as failed, suspect timeout reached
consulserver_1  |     2017/07/26 06:02:27 [INFO] serf: EventMemberFailed: consulserver-node1 10.111.152.136
consulserver_1  |     2017/07/26 06:02:27 [INFO] consul: removing server consulserver-node1 (Addr: 10.111.152.136:8300) (DC: dc1)
consulserver_1  |     2017/07/26 06:03:19 [INFO] serf: attempting reconnect to consulserver-node1 10.111.152.136:8301
consulserver_1  |     2017/07/26 06:03:49 [INFO] serf: attempting reconnect to consulserver-node1 10.111.152.136:8301
consulserver_1  |     2017/07/26 06:05:19 [INFO] serf: attempting reconnect to consulserver-node1 10.111.152.136:8301
......

It attempts to reconnect to node1 every 30 s.

On the host-3 machine, the consulserver-node3 node, our leader:

consulserver_1  |     2017/07/26 06:02:21 [INFO] raft: aborting pipeline replication to peer 10.111.152.136:8300
consulserver_1  |     2017/07/26 06:02:21 [ERR] raft: Failed to AppendEntries to 10.111.152.136:8300: EOF
consulserver_1  |     2017/07/26 06:02:21 [ERR] raft: Failed to heartbeat to 10.111.152.136:8300: dial tcp 10.111.152.136:8300: connection refused
consulserver_1  |     2017/07/26 06:02:21 [ERR] raft: Failed to AppendEntries to 10.111.152.136:8300: dial tcp 10.111.152.136:8300: connection refused
consulserver_1  |     2017/07/26 06:02:21 [ERR] raft: Failed to heartbeat to 10.111.152.136:8300: dial tcp 10.111.152.136:8300: connection refused
consulserver_1  |     2017/07/26 06:02:21 [ERR] raft: Failed to AppendEntries to 10.111.152.136:8300: dial 
......

It also keeps trying to reconnect, and it retries every 2 s, a higher frequency. Since it never gets through, it eventually removes the node1 node:

consulserver_1  |     2017/07/26 06:02:27 [INFO] memberlist: Suspect consulserver-node1 has failed, no acks received
consulserver_1  |     2017/07/26 06:02:27 [INFO] memberlist: Marking consulserver-node1 as failed, suspect timeout reached
consulserver_1  |     2017/07/26 06:02:27 [INFO] serf: EventMemberFailed: consulserver-node1 10.111.152.136
consulserver_1  |     2017/07/26 06:02:27 [INFO] consul: removing server consulserver-node1 (Addr: 10.111.152.136:8300) (DC: dc1)

Although node1 was removed, the other nodes still do not give up trying to reconnect to it; the reconnection attempts keep going.

5.3.2 Recovering the node

Restart the container we just stopped on host-1:

docker start 4364cd41f2ba

Let's see what the 3 machines output.

On host-1:

consulserver_1  | ==> Starting raft data migration...
consulserver_1  | ==> Starting Consul agent...
consulserver_1  | ==> Starting Consul agent RPC...
consulserver_1  | ==> Consul agent running!
consulserver_1  |          Node name: 'consulserver-node1'
consulserver_1  |         Datacenter: 'dc1'
consulserver_1  |             Server: true (bootstrap: false)
consulserver_1  |        Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 53, RPC: 8400)
consulserver_1  |       Cluster Addr: 10.111.152.136 (LAN: 8301, WAN: 8302)
consulserver_1  |     Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
consulserver_1  |              Atlas: <disabled>

......

consulserver_1  |     2017/07/26 06:23:11 [INFO] consul: New leader elected: consulserver-node2
......

The consulserver-node1 node has come back up, and the whole cluster has elected a new leader: consulserver-node2.

On host-2:

consulserver_1  |     2017/07/26 06:23:05 [INFO] consul: adding server consulserver-node1 (Addr: 10.111.152.136:8300) (DC: dc1)

......

consulserver_1  |     2017/07/26 06:23:11 [INFO] consul: New leader elected: consulserver-node2
......

It sensed that consulserver-node1 came back to life and took part in the election; the new leader chosen is itself, ha ha.

On host-3:

consulserver_1  |     2017/07/26 06:23:05 [INFO] consul: adding server consulserver-node1 (Addr: 10.111.152.136:8300) (DC: dc1)

......

consulserver_1  |     2017/07/26 06:23:11 [INFO] consul: New leader elected: consulserver-node2
......

It likewise sensed that consulserver-node1 is back and also took part in electing the new leader.

At this point nginx is still unaffected and the web service works normally, and the consul web UI is accessible again. Everything is back to how it was. For the details of how these 3 nodes elect a leader and handle a node leaving and rejoining, see my other article analyzing Consul's Raft leader election.


Created 2017-07-23 in Beijing; updated 2017-07-27 in Beijing.


Appendix:

The architecture diagrams in this article were all drawn with graphviz; the source is attached below.

digraph G {
    size="6,6";
    label="services register"
    node [colorscheme=paired12, color=1, style=filled];
    register_center    [label="注册中心", color=5, shape="record"]
    consumer    [label="服务消费者", color=4, shape="record"]
    service    [label="服务提供者", color=2, shape="record"]

    consumer -> register_center [label="2.订阅"]
    register_center -> consumer [label="5.通知" style=dashed]
    
    consumer -> service [label="4.调用"]
    consumer -> consumer [label="3.缓存" style=dashed]
    service -> register_center [label="1.注册"]
}


digraph G {
    size="6,6";
    label="load balance web servers"
    node [colorscheme=paired12, color=1, style=filled];
    nginx    [label="nginx", color=3, shape="record"]
    my_web_server_1    [label="my_web_server_1", color=4, shape="record"]
    my_web_server_2    [label="my_web_server_2", color=4, shape="record"]
    my_web_server_3    [label="my_web_server_3", color=4, shape="record"]

    {Client1 Client2 Client3} -> nginx [label="访问"]
    
    nginx -> {my_web_server_1 my_web_server_2 my_web_server_3} [label="转发"]
}

digraph G {
    size="6,6";
    label="Services register and find"
    node [colorscheme=paired12, color=1, style=filled];
    consul     [label="consul", color=1]
    consul_template     [label="consul_template", color=2]
    nginx    [label="nginx", color=3, shape="record"]
    registrator    [label="registrator", color=5]
    my_web_server_1    [label="my_web_server_1", color=4, shape="record"]
    my_web_server_2    [label="my_web_server_2", color=4, shape="record"]
    my_web_server_3    [label="my_web_server_3", color=4, shape="record"]

    {Client1 Client2 Client3} -> nginx [label="访问"]
    nginx -> {my_web_server_1 my_web_server_2 my_web_server_3} [label="转发"]
    {my_web_server_1 my_web_server_2 my_web_server_3} -> registrator [color="red",style="dashed",label="监控"]
    registrator -> consul [color="red",style="dashed",label="注册"]
    consul -> consul_template [dir=both color=red style="dashed" label="订阅服务"]
    
    consul_template -> nginx [color=red,style="dashed",label="配置更新"]
}


digraph G {
    size="6,6";
    label="Services register and find, consul cluster"
    node [colorscheme=paired12, color=1, style=filled];
    consul_node1     [label="consul_node1(leader)", color=7]
    consul_node2     [label="consul_node2", color=7]
    consul_node3     [label="consul_node3", color=7]
    consul_client1     [label="consul_client1", color=7]
    consul_client2     [label="consul_client2", color=7]
    consul_client3     [label="consul_client3", color=7]
    consul_template     [label="consul_template", color=2]
    nginx    [label="nginx", color=3, shape="record"]
    registrator_1    [label="registrator_1", color=5]
    registrator_2    [label="registrator_2", color=5]
    registrator_3    [label="registrator_3", color=5]
    my_web_server_1    [label="my_web_server_1", color=4, shape="record"]
    my_web_server_2    [label="my_web_server_2", color=4, shape="record"]
    my_web_server_3    [label="my_web_server_3", color=4, shape="record"]
    my_web_server_4    [label="my_web_server_4", color=4, shape="record"]
    my_web_server_5    [label="my_web_server_5", color=4, shape="record"]
    my_web_server_6    [label="my_web_server_6", color=4, shape="record"]

    {Client1 Client2 Client3} -> nginx [label="访问"]
    nginx -> {my_web_server_1 my_web_server_2 my_web_server_3 my_web_server_4 my_web_server_5 my_web_server_6} [label="转发"]
    {my_web_server_1 my_web_server_2} -> registrator_1 [color="red",style="dashed",label="监控"]
    {my_web_server_3 my_web_server_4} -> registrator_2 [color="red",style="dashed",label="监控"]
    {my_web_server_5 my_web_server_6} -> registrator_3 [color="red",style="dashed",label="监控"]
    registrator_1 -> consul_client1 [color="red",style="dashed",label="注册"]
    registrator_2 -> consul_client2 [color="red",style="dashed",label="注册"]
    registrator_3 -> consul_client3 [color="red",style="dashed",label="注册"]
    {consul_client1 consul_client2 consul_client3} -> consul_node1 [color="red",style="dashed",label="注册"]
    consul_node1 -> consul_node2 -> consul_node3 [dir=both style=dashed color=blue]
    consul_node1 -> consul_template [dir=both color=red style="dashed" label="订阅服务"]
   
    consul_template -> nginx [color=red,style="dashed",label="配置更新"]
    
    subgraph cluster_host_1 {
        label="host_1"
        my_web_server_1
        my_web_server_2
        registrator_1
        consul_client1
    }
    subgraph cluster_host_2 {
        label="host_2"
        my_web_server_3
        my_web_server_4
        registrator_2
        consul_client2
    }
    subgraph cluster_host_3 {
        label="host_3"
        my_web_server_5
        my_web_server_6
        registrator_3
        consul_client3
    }
    subgraph cluster_clu {
        label="consul cluster"
        consul_node1
        consul_node2
        consul_node3
    }
}


Author: Liberalman
Link: https://www.jianshu.com/p/a4c04a3eeb57
Source: Jianshu
Copyright belongs to the author; for any form of reproduction, please contact the author for authorization and cite the source.
