(8): Basic usage of Docker Swarm

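Introduction

Swarm is Docker's official cluster management tool. Its main job is to abstract a number of Docker hosts into a single whole and to manage the Docker resources on those hosts through one entry point. Swarm is similar to Kubernetes but more lightweight, and it offers fewer features than Kubernetes.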

node

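Every Docker Engine in a swarm is a node. There are two types of node: manager and worker.
Deployment commands are run on a manager node, which breaks the deployment into tasks and assigns them to one or more worker nodes.
Manager nodes handle orchestration and cluster management and keep the swarm in its desired state. If a swarm has several manager nodes, they automatically negotiate and elect a leader to carry out orchestration.
Worker nodes receive and execute the tasks dispatched by manager nodes. By default a manager node is also a worker node, but it can be configured as a manager-only node dedicated to orchestration and cluster management.
Worker nodes periodically report their own state, and the state of the tasks they are running, to the manager nodes, so that the managers can maintain the state of the whole cluster.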
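
To see which manager currently holds the leader role, the MANAGER STATUS column of docker node ls can be filtered. The following is a minimal sketch rather than part of the original setup: the function name swarm_leader is made up for illustration, and it assumes a Docker version whose docker node ls supports --format.

```shell
# Print the hostname of the manager whose status is "Leader".
# Must be run on a manager node; swarm_leader is a hypothetical helper name.
swarm_leader() {
    docker node ls --filter role=manager \
        --format '{{.Hostname}} {{.ManagerStatus}}' |
        awk '$2 == "Leader" { print $1 }'
}
```

Calling swarm_leader on any manager prints one hostname, or nothing if no leader is currently elected.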

service

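A service defines the tasks to be executed on worker nodes. The swarm's main orchestration job is to keep each service in its desired state.
An example of a service: start an HTTP service in the swarm using the image httpd:latest with 3 replicas.
The manager node creates the service, works out that 3 httpd containers are needed, and assigns the container tasks according to the current state of the worker nodes, for example two containers on worker1 and one on worker2.
After running for a while, worker2 suddenly goes down. The manager detects the failure and immediately starts a new httpd container on worker3.
This keeps the service at its desired state of three replicas.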
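
The desired-state idea can be observed from the REPLICAS column of docker service ls, which reports running/desired counts such as 3/3. The sketch below is one possible way to check it from a script; the helper names service_replicas and replicas_converged are hypothetical, and it assumes a Docker version that supports --format on docker service ls.

```shell
# Print the "running/desired" replica string of exactly one service.
# Must be run on a manager node; service_replicas is a hypothetical helper.
service_replicas() {
    docker service ls --format '{{.Name}} {{.Replicas}}' |
        awk -v name="$1" '$1 == name { print $2 }'
}

# Succeed when a replica string such as "3/3" shows that every desired
# replica is running. Any annotation after a space is ignored.
replicas_converged() {
    counts=${1%% *}
    running=${counts%%/*}
    desired=${counts##*/}
    [ -n "$running" ] && [ "$running" = "$desired" ]
}
```

For a service named my_web, replicas_converged "$(service_replicas my_web)" succeeds only once the service is back at its desired replica count, for example after a failed node's tasks have been rescheduled.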

Initializing the Swarm

Command reference

[root@node191 docker]# docker swarm --help

Usage:  docker swarm COMMAND

Manage Swarm

Options:


Commands:
  ca          Display and rotate the root CA
  init        Initialize a swarm
  join        Join a swarm as a node and/or manager
  join-token  Manage join tokens
  leave       Leave the swarm
  unlock      Unlock swarm
  unlock-key  Manage the unlock key
  update      Update the swarm

Run 'docker swarm COMMAND --help' for more information on a command.
[root@node191 docker]# docker node --help

Usage:  docker node COMMAND

Manage Swarm nodes

Options:


Commands:
  demote      Demote one or more nodes from manager in the swarm
  inspect     Display detailed information on one or more nodes
  ls          List nodes in the swarm
  promote     Promote one or more nodes to manager in the swarm
  ps          List tasks running on one or more nodes, defaults to current node
  rm          Remove one or more nodes from the swarm
  update      Update a node

Run 'docker node COMMAND --help' for more information on a command.

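Initializing and joining nodes (manager | worker)

  • --advertise-addr specifies the address used to communicate with the other nodes. Open the required ports in the firewall: by default 2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay network traffic.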
[root@localhost ~]# docker swarm init --advertise-addr  172.16.1.146
Swarm initialized: current node (v2tjxinr9jxfg52evpswn4yb6) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
    172.16.1.146:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
  • If you did not save the full worker join command printed by docker swarm init, you can display it again with docker swarm join-token worker.
  • Likewise, the command for joining as a manager is shown by docker swarm join-token manager.
[root@localhost ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
    172.16.1.146:2377


[root@localhost ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
    172.16.1.146:2377
  
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n494afsdjzs74q5y5vb4xlgd4    node136   Ready   Active        
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader  
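
When the join command has to be generated from a script, the token alone can be obtained with the -q (--quiet) option of docker swarm join-token. The sketch below assembles a join command from it; the function name print_join_command is hypothetical, and the manager address has to be supplied by the caller.

```shell
# Print a ready-to-run join command for the given role.
# $1: "worker" or "manager"
# $2: the manager's advertise address and port, e.g. 172.16.1.146:2377
# Must be run on a manager node; print_join_command is a hypothetical helper.
print_join_command() {
    role=$1
    manager_addr=$2
    case $role in
        worker|manager) ;;
        *) echo "role must be worker or manager" >&2; return 1 ;;
    esac
    # -q prints only the token instead of the full instructions.
    token=$(docker swarm join-token -q "$role") || return 1
    printf 'docker swarm join --token %s %s\n' "$token" "$manager_addr"
}
```

For example, print_join_command worker 172.16.1.146:2377 prints a single line that can be pasted on the new node.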
    

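Removing a swarm node

  • To remove a node from the swarm, first drain its containers, then remove the node from the cluster.
  • Drain the node
  • Draining moves the node's tasks to other nodes: replacement containers are started elsewhere and the containers on the drained node are stopped, so the service is not affected.
## Drain node136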
[root@node146 ~]# docker node update --availability drain n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4
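  • Remove a specific node

docker node rm node136
docker node rm --force node136
  • Restore a node

## Bring a drained node back so it can accept tasks again
docker node update --availability active n494afsdjzs74q5y5vb4xlgd4
  • Leave the swarm (run on the node itself)

## To force a node out of the swarm: docker swarm leave --force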
[root@node136 ~]# docker swarm leave
Node left the swarm.

## node136 now shows as Down.
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n494afsdjzs74q5y5vb4xlgd4    node136   Down    Active        
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader

## On a manager, remove the stale node entry
[root@node146 ~]# docker node rm n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4

## Rejoin as a manager
[root@node136 ~]# docker swarm join \
>     --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
>     172.16.1.146:2377
This node joined a swarm as a manager.

[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Reachable
v2tjxinr9jxfg52evpswn4yb6 *  node146   Ready   Active        Leader
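
The drain step from this section can be scripted so that later steps only run once the node is actually empty. The following is a rough sketch under a few assumptions: the names drain_node, running_tasks_on and DRAIN_POLL_SECONDS are invented for illustration, the installed Docker supports -q on docker node ps, and a node counts as drained when no task on it still has the desired state running.

```shell
# Count the tasks on a node whose desired state is still "running".
running_tasks_on() {
    docker node ps --filter desired-state=running -q "$1" | grep -c . || true
}

# Drain a node and wait until it no longer has tasks that should be running.
# $1: node ID or hostname
# $2: maximum number of polls (default 30)
# DRAIN_POLL_SECONDS (hypothetical variable) sets the pause between polls.
drain_node() {
    node=$1
    tries=${2:-30}
    docker node update --availability drain "$node" >/dev/null || return 1
    while [ "$tries" -gt 0 ]; do
        if [ "$(running_tasks_on "$node")" -eq 0 ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${DRAIN_POLL_SECONDS:-2}"
    done
    return 1
}
```

A typical use would be drain_node node136 followed by docker swarm leave on node136 and docker node rm node136 on a manager, as in the transcript above.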


Demoting a node

Demote a node from manager to worker

docker node demote v2tjxinr9jxfg52evpswn4yb6

[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Leader
v2tjxinr9jxfg52evpswn4yb6    node146   Down    Active        Unreachable
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Reachable

[root@node146 ~]# docker node demote v2tjxinr9jxfg52evpswn4yb6
Manager v2tjxinr9jxfg52evpswn4yb6 demoted in the swarm.

[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Leader
v2tjxinr9jxfg52evpswn4yb6    node146   Down    Active        
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Reachable

Promoting a node

  • Promote a node from worker to manager
  • docker node promote c9kynm13tvcf1vfrt0m6y7pbi
[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Leader

[root@node146 ~]# docker node promote c9kynm13tvcf1vfrt0m6y7pbi
Node c9kynm13tvcf1vfrt0m6y7pbi promoted to a manager in the swarm.

[root@node146 ~]# docker node promote n8dgcax0vcqmsjtc0aosx9k2q
Node n8dgcax0vcqmsjtc0aosx9k2q promoted to a manager in the swarm.

[root@node146 ~]# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi    node135   Ready   Active        Reachable
n8dgcax0vcqmsjtc0aosx9k2q    node136   Ready   Active        Reachable
yvjirlxwpgvjohi3iagtzzkh2 *  node146   Ready   Active        Leader

Day-to-day swarm operations

Command reference

[root@node191 docker]# docker service --help

Usage:  docker service COMMAND

Manage services

Options:


Commands:
  create      Create a new service
  inspect     Display detailed information on one or more services
  logs        Fetch the logs of a service or task
  ls          List services
  ps          List the tasks of one or more services
  rm          Remove one or more services
  rollback    Revert changes to a service's configuration
  scale       Scale one or multiple replicated services
  update      Update a service

Run 'docker service COMMAND --help' for more information on a command.

Official documentation:

https://docs.docker.com/engine/reference/commandline/service/

Creating a service

docker service create --name nginx-service --replicas=3 --publish 8080:80 nginx:latest

If the image lives in a private registry, remember to add the --with-registry-auth flag; otherwise the other nodes cannot pull the image. For example:

docker login 172.16.1.146 -p ***** -u admin
docker service create --with-registry-auth --name tomcat-logs-test --replicas=2 --publish 10080:8080 172.16.1.146/wondertek/docker-test:1.0.0-2018091910

Viewing service information

docker service ps docker-test

Scaling a service

docker service scale  docker-test=3

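Defining labels

  • Constraints can match node attributes or Docker Engine labels, as follows:

Node attribute   Matches                      Example
node.id          Node ID                      node.id == 2ivku8v2gvtg4
node.hostname    Node hostname                node.hostname != node-2
node.role        Node role (manager/worker)   node.role == manager
node.labels      User-defined node labels     node.labels.security == high
engine.labels    Docker Engine labels         engine.labels.operatingsystem == ubuntu 14.04

  • engine.labels matches labels of the Docker Engine, such as the operating system or drivers. Cluster administrators add node.labels with the docker node update command so that nodes can be targeted more precisely.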

  • Add a label

docker node update --label-add type=manager node146

[root@node146 ~]# docker node inspect node146 --pretty
ID:                     v2tjxinr9jxfg52evpswn4yb6
Labels:
 - type = manager
Hostname:               node146
Joined at:              2018-07-16 06:26:49.516457267 +0000 utc
Status:
 State:                 Ready
 Availability:          Active
 Address:               127.0.0.1
Manager Status:
 Address:               172.16.1.146:2377
 Raft Status:           Reachable
 Leader:                Yes
Platform:
 Operating System:      linux
 Architecture:          x86_64
Resources:
 CPUs:                  8
 Memory:                9.765 GiB
Plugins:
  Network:              bridge, host, macvlan, null, overlay
  Volume:               local
Engine Version:         1.13.1
  • Remove a label
docker node update --label-rm type node146
  • Run a service on nodes with a given label
docker service rm my_web
docker node update --label-add env=test node135
docker node update --label-add env=prod node136

docker service create \
      --constraint node.labels.env==test \
      --replicas 3 \
      --name my_web2 \
      --publish 8080:80 \
      httpd


[root@node146 ~]# docker service ps my_web2
ID            NAME       IMAGE         NODE     DESIRED STATE  CURRENT STATE          ERROR  PORTS
lzle9hto7mk0  my_web2.1  httpd:latest  node135  Running        Running 4 seconds ago         
j9ujd6mcs2ex  my_web2.2  httpd:latest  node135  Running        Running 5 seconds ago         
lqc4apjhonen  my_web2.3  httpd:latest  node135  Running        Running 3 seconds ago


[root@node146 ~]# docker service inspect my_web2 --pretty

ID:             m7s5ura6bmjg1nd60lfwn8voa
Name:           my_web2
Service Mode:   Replicated
 Replicas:      3
Placement:Contraints:   [node.labels.env==test]
UpdateConfig:
 Parallelism:   1
 On failure:    pause
 Max failure ratio: 0
ContainerSpec:
 Image:         httpd:latest@sha256:2edbf09d0dbdf2a3e21e4cb52f3385ad916c01dc2528868bc3499111cc54e937
Resources:
Endpoint Mode:  vip
Ports:
 PublishedPort 8080
  Protocol = tcp
  TargetPort = 80 
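
Whether a placement constraint took effect can also be checked from a script by listing the distinct nodes that host the service's tasks. This is a small sketch, not part of the original walkthrough; service_nodes is a hypothetical helper name and it assumes docker service ps supports --format.

```shell
# List the distinct nodes that host tasks of a service whose desired
# state is "running". Must be run on a manager node.
service_nodes() {
    docker service ps --filter desired-state=running \
        --format '{{.Node}}' "$1" | sort -u
}
```

For my_web2 with the constraint node.labels.env==test, service_nodes my_web2 would be expected to print only node135, matching the docker service ps output above.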

Removing a service

docker service rm docker-test

Custom overlay network

[root@node135 ~]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
4888eb34115b        bridge              bridge              local
5dda44146214        docker_gwbridge     bridge              local
4dda8692018b        host                host                local
mumblsrh5oe4        ingress             overlay             swarm
1fcd0ef0748f        none                null                local



docker network create --driver overlay --subnet 10.22.1.0/24 swarm_net
  • Create services on the custom network
docker service create --name my_web --replicas=3 --network swarm_net httpd

docker service create --name util --network swarm_net busybox sleep 10000000
  • Test connectivity within the same overlay network
docker exec util.1.muu3o4906mihbp1v8r3ejh80p nslookup tasks.my_web

docker exec  util.1.muu3o4906mihbp1v8r3ejh80p ping -c 3 my_web
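
The container name used with docker exec above (util.1.muu3o4906mihbp1v8r3ejh80p) differs for every task, and the container only exists on the node where that task runs. One way to look it up is through the com.docker.swarm.service.name label that swarm sets on task containers. The helper name local_task_container below is hypothetical, and this is only a sketch.

```shell
# Print the name of one container on THIS node that belongs to the given
# service, or nothing if no task of the service runs here.
local_task_container() {
    docker ps --filter "label=com.docker.swarm.service.name=$1" \
        --format '{{.Names}}' | head -n 1
}

# Usage, on the node where a util task runs:
#   docker exec "$(local_task_container util)" nslookup tasks.my_web
#   docker exec "$(local_task_container util)" ping -c 3 my_web
```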

Updating a service

docker service update --image httpd:2.2.32 my_web
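  • Swarm lets you adjust the number of container replicas with --replicas, either when the service is created or while it is running. The internal scheduler then starts and stops containers on different nodes according to the cluster's current resource usage. This is the default replicated mode of a service.
  • In this mode the number of replicas per node varies. In general, nodes with more available resources run more replicas, and nodes with fewer resources run fewer.
  • Besides replicated mode, a service can also run in global mode, which forces exactly one replica, and never more than one, to run on every node.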
## global mode
docker service create \
       --mode global \
       --name logspout \
       --mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
       gliderlabs/logspout
  • Scale the service to six replicas, updating two replicas at a time with a delay of one and a half minutes between batches.
docker service update --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web

## Specify a new image as well
docker service update --image httpd:2.2.32 --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web
  • Watch the update progress
[root@node146 ~]# docker service ps my_web
ID            NAME          IMAGE         NODE     DESIRED STATE  CURRENT STATE                ERROR  PORTS
ku14zmzkpo9a  my_web.1      httpd:2.2.32  node135  Running        Running about a minute ago          
qh6pzjb6syt0   \_ my_web.1  httpd:latest  node135  Shutdown       Shutdown about a minute ago         
0muer26mxx1d  my_web.2      httpd:latest  node136  Running        Running 22 hours ago                
k8ybfbc6j20y  my_web.3      httpd:2.2.32  node146  Running        Running about a minute ago          
xr0adp42t7tm   \_ my_web.3  httpd:latest  node146  Shutdown       Shutdown about a minute ago         
acd06qrmmnrr  my_web.4      httpd:2.2.32  node135  Running        Running about a minute ago          
jae5i5lhlnb2  my_web.5      httpd:2.2.32  node146  Running        Running about a minute ago          
3zk4i1drb1nk  my_web.6      httpd:2.2.32  node136  Running        Running about a minute ago
  • Remove the old constraint and add a new one, node.labels.env==prod
docker service update --constraint-rm node.labels.env==test my_web2
docker service update --constraint-add node.labels.env==prod my_web2
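
The effect of --update-parallelism and --update-delay on the rolling update shown earlier can be estimated with a little arithmetic: tasks are updated in batches, and the delay is waited between batches. The sketch below computes only a lower bound on the waiting time and ignores how long containers take to start; the function name rollout_delay_floor is made up for illustration. In the docker service ps my_web output above, my_web.4 to my_web.6 have no earlier task history while my_web.2 still runs httpd:latest, which suggests that only the three pre-existing replicas were rolled, two at a time.

```shell
# Lower bound, in seconds, on the time spent waiting between update batches.
# $1: number of existing tasks that need to be updated
# $2: --update-parallelism (0 means all tasks are updated at once)
# $3: --update-delay in seconds
rollout_delay_floor() {
    tasks=$1
    parallelism=$2
    delay=$3
    if [ "$parallelism" -le 0 ] || [ "$tasks" -le 0 ]; then
        echo 0
        return 0
    fi
    # Round up: 3 tasks with parallelism 2 need 2 batches.
    batches=$(( (tasks + parallelism - 1) / parallelism ))
    echo $(( (batches - 1) * delay ))
}
```

With three existing replicas, a parallelism of 2 and a delay of 1m30s, rollout_delay_floor 3 2 90 gives 90, so the last replica is not expected to be updated until at least a minute and a half after the first batch.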

Rollback

  • Roll back to the service's previous configuration. Only one step of rollback is available.
docker service update --rollback my_web
  • Rolling back a second time simply reapplies the update that was just rolled back.

Health checks

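  • For applications that expose an HTTP endpoint, a common health check uses curl to check the HTTP status code, for example:
    curl --fail http://localhost:8080/ || exit 1
  • With --fail, curl exits with a non-zero code when the server returns an HTTP error status (400 or above). The trailing || exit 1 turns that into exit code 1, and the health check fails.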
docker service create --name my_web3 \
     --health-cmd "curl --fail http://localhost:8091 || exit 1" \
      httpd
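  • --health-cmd sets the health check command. There are several related options:

  • 1. --health-timeout: how long the command may run before it times out, default 30s.

  • 2. --health-interval: the interval between runs of the command, default 30s.

  • 3. --health-retries: the number of consecutive failures allowed, default 3. After 3 failures in a row the container is marked unhealthy, and swarm destroys and recreates unhealthy replicas.

  • Inspect the health check information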

docker inspect b671e3100133

 "Health": {
                "Status": "unhealthy",
                "FailingStreak": 3,
                "Log": [
                    {
                        "Start": "2018-07-18T14:40:18.941056152+08:00",
                        "End": "2018-07-18T14:40:19.027466281+08:00",
                        "ExitCode": 1,
                        "Output": "/bin/sh: 1: curl: not found\n"
                    },
                    {
                        "Start": "2018-07-18T14:40:49.027620925+08:00",
                        "End": "2018-07-18T14:40:49.076160261+08:00",
                        "ExitCode": 1,
                        "Output": "/bin/sh: 1: curl: not found\n"
                    },
                    {
                        "Start": "2018-07-18T14:41:19.076291897+08:00",
                        "End": "2018-07-18T14:41:19.124894642+08:00",
                        "ExitCode": 1,
                        "Output": "/bin/sh: 1: curl: not found\n"
                    }
                ]
            }
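
The health check options and the status lookup can be wrapped in two small helpers. This is a sketch under stated assumptions: the function names create_service_with_healthcheck and container_health are hypothetical, the option values simply repeat the defaults, and the image is assumed to contain curl. The log above reports curl: not found, so in that run the check fails because the httpd image has no curl, not because the web server is down.

```shell
# Create a service with an explicit health check.
# $1: service name
# $2: image (assumed to contain curl)
# $3: URL probed from inside the container
create_service_with_healthcheck() {
    docker service create --name "$1" \
        --health-cmd "curl --fail $3 || exit 1" \
        --health-interval 30s \
        --health-timeout 30s \
        --health-retries 3 \
        "$2"
}

# Print a container's health status (starting, healthy or unhealthy),
# or "none" if the container has no health check configured.
container_health() {
    docker inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
        "$1"
}
```

For the container inspected above, container_health b671e3100133 would print unhealthy.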


Reposted from blog.csdn.net/qq_30062125/article/details/82772167