Creating a Swarm Cluster Network with Docker

A Docker cluster network addresses how many requests can be served at the same time. It is not distributed computing: distributed computing splits one task into subtasks and dispatches those subtasks to different machines for execution.

Cluster network commands

(1) docker swarm — manage the swarm cluster

Initialize a cluster: docker swarm init
Join a cluster as a node (worker) or manager: docker swarm join
Manage join tokens: docker swarm join-token
Update the cluster: docker swarm update
Leave the cluster: docker swarm leave

(2) docker node — manage cluster nodes

Promote a node to manager: docker node promote
Demote a manager node to worker: docker node demote
Display node details: docker node inspect
Update node attributes: docker node update
List tasks running on one or more nodes: docker node ps
List all nodes in the cluster: docker node ls
Remove a node from the cluster: docker node rm
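
As a concrete example, the node commands above can take a node out of scheduling and bring it back. This is a sketch run on a manager node, reusing the node names from this walkthrough:

```shell
# Drain worker1: swarm reschedules its tasks onto the other nodes.
docker node update --availability drain worker1

# The AVAILABILITY column for worker1 now shows "Drain".
docker node ls

# Bring worker1 back into scheduling.
docker node update --availability active worker1
```

Draining is the usual way to prepare a node for maintenance without stopping services by hand.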

(3) docker service — manage services

Create a service: docker service create
Display service details: docker service inspect
List a service's tasks: docker service ps
List services: docker service ls
Remove a service: docker service rm
Adjust a service's replicas: docker service scale
Update a service: docker service update
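
Taken together, a typical service lifecycle built from these commands might look like the sketch below (the service name and image here are placeholders, not part of the original walkthrough):

```shell
# Create a service with 2 replicas, publishing container port 80 on port 8080.
docker service create --name web --replicas 2 -p 8080:80 nginx:alpine

docker service ls            # list services and their replica counts
docker service ps web        # list the tasks (instances) of the service
docker service scale web=4   # grow to 4 replicas
docker service update --image nginx:latest web   # rolling image update
docker service rm web        # remove the service
```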

Step 1: Install Docker Machine

/home/wong# sudo curl -L https://github.com/docker/machine/releases/download/v0.8.2/docker-machine-`uname -s`-`uname -m` > /usr/local/bin/docker-machine && chmod a+x /usr/local/bin/docker-machine

Step 2: Use docker-machine to create a virtual machine to serve as the manager node

~$ sudo docker-machine create --driver virtualbox manager1
Running pre-create checks...
(manager1) No default Boot2Docker ISO found locally, downloading the latest release...
(manager1) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(manager1) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(manager1) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(manager1) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(manager1) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(manager1) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(manager1) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(manager1) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(manager1) Copying /home/wong/.docker/machine/cache/boot2docker.iso to /home/wong/.docker/machine/machines/manager1/boot2docker.iso...
(manager1) Creating VirtualBox VM...
(manager1) Creating SSH key...
(manager1) Starting the VM...
(manager1) Check network to re-create if needed...
(manager1) Found a new host-only adapter: "vboxnet0"
(manager1) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env manager1

View the VM's environment variables and other details, including its IP address:

  ~$ sudo docker-machine env manager1
    [sudo] password for wong: 
    export DOCKER_TLS_VERIFY="1"
    export DOCKER_HOST="tcp://192.168.99.100:2376"
    export DOCKER_CERT_PATH="/home/wong/.docker/machine/machines/manager1"
    export DOCKER_MACHINE_NAME="manager1"
    # Run this command to configure your shell: 
    # eval $(docker-machine env manager1)

Step 3: Create a worker1 node

~$ sudo docker-machine create --driver virtualbox worker1
Running pre-create checks...
(worker1) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker1) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker1) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker1) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker1) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(worker1) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker1) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker1) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker1) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker1) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(worker1) Copying /home/wong/.docker/machine/cache/boot2docker.iso to /home/wong/.docker/machine/machines/worker1/boot2docker.iso...
(worker1) Creating VirtualBox VM...
(worker1) Creating SSH key...
(worker1) Starting the VM...
(worker1) Check network to re-create if needed...
(worker1) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env worker1

Check the worker1 VM:

~$ sudo docker-machine env worker1
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.101:2376"
export DOCKER_CERT_PATH="/home/wong/.docker/machine/machines/worker1"
export DOCKER_MACHINE_NAME="worker1"
# Run this command to configure your shell: 
# eval $(docker-machine env worker1)

There are now two VMs:

~$ sudo docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.100:2376           v18.09.0   
worker1    -        virtualbox   Running   tcp://192.168.99.101:2376           v18.09.0   

Step 4: Initialize the swarm on manager1
Because we created the virtual machines with Docker Machine, we can run commands on them with docker-machine ssh.

~$ sudo docker-machine ssh manager1 docker swarm init --listen-addr 192.168.99.100:2377 --advertise-addr 192.168.99.100
Swarm initialized: current node (yofejour79ap6z97craaufb0k) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Note: --listen-addr 192.168.99.100:2377 specifies the IP and port to listen on. The general form of the swarm command is:
docker swarm init --listen-addr <ip>:<port>

If the machine has two network interfaces when you create the cluster, you can specify which IP to use, e.g.:

~$ sudo docker-machine ssh manager1 docker swarm init --listen-addr $MANAGER1_IP:2377
Error response from daemon: could not choose an IP address to advertise since this system has multiple addresses on different interfaces (10.0.2.15 on eth0 and 192.168.99.100 on eth1) - specify one with --advertise-addr
exit status 1

This error occurs because the system has two NICs and therefore two IP addresses; swarm cannot decide which one to use, so you must specify it, e.g.:

~$sudo docker-machine ssh manager1 docker swarm init --advertise-addr 192.168.99.100 --listen-addr 192.168.99.100:2377  

Note: --advertise-addr is the address that worker nodes in the swarm use to contact the manager.

Step 5: Add worker1 to the swarm:

~$ sudo docker-machine ssh worker1 docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377
This node joined a swarm as a worker.

You can append --listen-addr $WORKER1_IP:2377 to the command above as a precaution, since a worker node may later need to be promoted to a manager, i.e.:

~$ sudo docker-machine ssh worker1 docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377  --listen-addr 192.168.99.101:2377

After these five steps, the cluster is initialized: we now have a two-node "cluster". Log into one of the manager nodes and use the docker node command to view node information:

~$ sudo docker-machine ssh manager1 docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
yofejour79ap6z97craaufb0k *   manager1            Ready               Active              Leader              18.09.0
b4fc8xt8dgemi44c7boy9gkgg     worker1             Ready               Active                                  18.09.0

Both nodes belong to the swarm and have Active availability; manager1 is the Leader and worker1 is a worker.

Step 6: Create manager2, worker2, and worker3:

~$ sudo docker-machine create --driver virtualbox manager2
Running pre-create checks...
(manager2) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(manager2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(manager2) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(manager2) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(manager2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(manager2) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(manager2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(manager2) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(manager2) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(manager2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(manager2) Copying /home/wong/.docker/machine/cache/boot2docker.iso to /home/wong/.docker/machine/machines/manager2/boot2docker.iso...
(manager2) Creating VirtualBox VM...
(manager2) Creating SSH key...
(manager2) Starting the VM...
(manager2) Check network to re-create if needed...
(manager2) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env manager2


~$ sudo docker-machine create --driver virtualbox worker2
Running pre-create checks...
(worker2) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker2) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker2) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(worker2) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker2) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker2) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(worker2) Copying /home/wong/.docker/machine/cache/boot2docker.iso to /home/wong/.docker/machine/machines/worker2/boot2docker.iso...
(worker2) Creating VirtualBox VM...
(worker2) Creating SSH key...
(worker2) Starting the VM...
(worker2) Check network to re-create if needed...
(worker2) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env worker2


~$ sudo docker-machine create --driver virtualbox worker3
Running pre-create checks...
(worker3) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker3) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker3) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker3) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker3) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(worker3) Unable to get the local Boot2Docker ISO version:  Did not find prefix "-v" in version string
(worker3) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(worker3) Latest release for github.com/boot2docker/boot2docker is v18.09.0
(worker3) Downloading /home/wong/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v18.09.0/boot2docker.iso...
(worker3) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(worker3) Copying /home/wong/.docker/machine/cache/boot2docker.iso to /home/wong/.docker/machine/machines/worker3/boot2docker.iso...
(worker3) Creating VirtualBox VM...
(worker3) Creating SSH key...
(worker3) Starting the VM...
(worker3) Check network to re-create if needed...
(worker3) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env worker3

List the current VMs:

~$ sudo docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.100:2376           v18.09.0   
manager2   -        virtualbox   Running   tcp://192.168.99.102:2376           v18.09.0   
worker1    -        virtualbox   Running   tcp://192.168.99.101:2376           v18.09.0   
worker2    -        virtualbox   Running   tcp://192.168.99.104:2376           v18.09.0   
worker3    -        virtualbox   Running   tcp://192.168.99.103:2376           v18.09.0 

Step 7: Get the worker join token from manager1

~$ sudo docker-machine ssh manager1 docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377 

Step 8: Add worker2 to the swarm.

~$ sudo docker-machine ssh worker2 docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377
This node joined a swarm as a worker.

Step 9: Add worker3 to the swarm.

~$ sudo docker-machine ssh worker3 docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-97trmmecjm1goy60cilmn6e8b 192.168.99.100:2377
This node joined a swarm as a worker.

Step 10: Get the manager join token from manager1:

~$ sudo docker-machine ssh manager1 docker swarm join-token manager
[sudo] password for wong: 
To add a manager to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-daisu4r8kzc611zvfsmrpfyca 192.168.99.100:2377

Step 11: Add manager2 to the swarm.

~$ sudo docker-machine ssh manager2 docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-daisu4r8kzc611zvfsmrpfyca 192.168.99.100:2377
This node joined a swarm as a manager.

Step 12: View the cluster's node information.

~$ sudo docker-machine ssh manager2 docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
yofejour79ap6z97craaufb0k     manager1            Ready               Active              Leader              18.09.0
qj4j1ig9q47ekd36qtdxpgjm7 *   manager2            Ready               Active              Reachable           18.09.0
b4fc8xt8dgemi44c7boy9gkgg     worker1             Ready               Active                                  18.09.0
fca0nly3kgad1qy9ja11i3jvg     worker2             Ready               Active                                  18.09.0
9va6se2x63m8pfpeimlq3ozgu     worker3             Ready               Active                                  18.09.0

Step 13: Set up a cross-host network. (I also tried to add the host machine itself to the swarm with the command below; in my experiment this did not work.)

docker swarm join --token SWMTKN-1-5xp96pa27nowk0ms1hxrmyf1xhmo6avtmveu994dd1gm243qku-daisu4r8kzc611zvfsmrpfyca 192.168.99.100:2377 

View the networks on a manager node, e.g. manager1:

~$ sudo docker-machine ssh manager1  docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
01fd514b0148        bridge              bridge              local
879e49badc74        docker_gwbridge     bridge              local
03a7caaa3c7a        host                host                local
uu8f6vano4p8        ingress             overlay             swarm     
699dfd1a2172        none                null                local

Note: swarm creates a default overlay network named ingress.
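
Besides the default ingress network, a user-defined overlay can also be made joinable by standalone containers with the --attachable flag (available since Docker 1.13). This sketch is not part of the original steps; the network and container names are hypothetical:

```shell
# Create an overlay network that plain `docker run` containers can also join.
docker network create --driver overlay --attachable swarm_attach_test

# A standalone container on any swarm node can now attach to it:
docker run -d --network swarm_attach_test --name probe nginx:alpine
```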

Step 14: Create a new overlay network on a manager node

~$sudo docker-machine ssh manager1 docker network create --driver overlay swarm_test

Step 15: View the networks on a manager node again, e.g. manager1:

~$ sudo docker-machine ssh manager1  docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
01fd514b0148        bridge              bridge              local
879e49badc74        docker_gwbridge     bridge              local
03a7caaa3c7a        host                host                local
uu8f6vano4p8        ingress             overlay             swarm
699dfd1a2172        none                null                local
bq4sgj6lx14z        swarm_test          overlay             swarm

Step 16: Deploy an application on the cross-host network. First pull the nginx image on all five nodes with docker pull:

~$ sudo docker-machine ssh manager1 docker pull nginx:alpine
~$ sudo docker-machine ssh manager2 docker pull nginx:alpine
~$ sudo docker-machine ssh worker1 docker pull nginx:alpine
~$ sudo docker-machine ssh worker2 docker pull nginx:alpine
~$ sudo docker-machine ssh worker3 docker pull nginx:alpine

Step 17: Deploy a set of Nginx services on the five nodes, attached to the swarm_test cross-host network.

~$ sudo docker-machine ssh manager1 docker service create --replicas 2 --name HelloWorld123 --network=swarm_test nginx:alpine

Note: --replicas 2 specifies how many instances (tasks) make up the service; here the service is replicated onto two nodes.
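
For comparison, swarm also offers --mode global, which runs exactly one task on every node instead of a fixed replica count. A sketch (the service name is hypothetical):

```shell
# One task per node: useful for agents, log shippers, monitoring, etc.
docker service create --mode global --name hello-global --network=swarm_test nginx:alpine

# On a five-node cluster, REPLICAS shows 5/5, one task per node.
docker service ls
```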

Step 18: Check the service status:

~$ sudo docker-machine ssh manager1 docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
0fr2qyvormwo        HelloWorld123       replicated          2/2                 nginx:alpine        
h4qhr4sk93rf        helloworld          replicated          2/2                 nginx:latest  

Step 19: View details of the HelloWorld123 service:

~$ sudo docker-machine ssh manager1 docker service ps HelloWorld123
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
wnocc7pnx4x2        HelloWorld123.1     nginx:alpine        worker2             Running             Running about a minute ago                       
k725ausmndqh        HelloWorld123.2     nginx:alpine        worker1             Running             Running about a minute ago       

Step 20: Go to the two nodes and check the container status:

~$ sudo docker-machine ssh worker1 docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
0c729a91449a        nginx:alpine        "nginx -g 'daemon of…"   2 minutes ago       Up 2 minutes        80/tcp              HelloWorld123.2.k725ausmndqhrrkrf532r96nq

~$ sudo docker-machine ssh worker2 docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
128d3acd55fc        nginx:alpine        "nginx -g 'daemon of…"   About a minute ago   Up About a minute   80/tcp              HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0

Step 21: Use docker-machine to enter the worker1 node, then use docker exec -i to go into the HelloWorld123.2.k725ausmndqhrrkrf532r96nq container and ping the HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0 container running on the worker2 node:

  ~$ sudo docker-machine ssh worker1 docker exec -i  HelloWorld123.2.k725ausmndqhrrkrf532r96nq ping HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0
    
    PING HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0 (10.0.0.8): 56 data bytes
    64 bytes from 10.0.0.8: seq=0 ttl=64 time=0.317 ms
    64 bytes from 10.0.0.8: seq=1 ttl=64 time=1.061 ms
    64 bytes from 10.0.0.8: seq=2 ttl=64 time=0.982 ms
    64 bytes from 10.0.0.8: seq=3 ttl=64 time=1.118 ms

Step 22: Use docker-machine to enter the worker2 node, then use docker exec -i to go into the HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0 container and ping the HelloWorld123.2.k725ausmndqhrrkrf532r96nq container running on the worker1 node:

~$ sudo docker-machine ssh worker2 docker exec -i HelloWorld123.1.wnocc7pnx4x27troemvi2i5a0 ping HelloWorld123.2.k725ausmndqhrrkrf532r96nq
PING HelloWorld123.2.k725ausmndqhrrkrf532r96nq (10.0.0.9): 56 data bytes
64 bytes from 10.0.0.9: seq=0 ttl=64 time=0.367 ms
64 bytes from 10.0.0.9: seq=1 ttl=64 time=0.799 ms
64 bytes from 10.0.0.9: seq=2 ttl=64 time=0.782 ms
64 bytes from 10.0.0.9: seq=3 ttl=64 time=1.152 ms
64 bytes from 10.0.0.9: seq=4 ttl=64 time=1.098 ms

Step 23: The cluster is in place; now let's look at swarm load balancing.
First remove the earlier service:

~$ sudo docker-machine ssh manager1 docker service rm  HelloWorld123

Then create a new service:

~$ sudo docker-machine ssh manager1 docker service create --replicas 2 --name helloworld -p 7080:80 --network=swarm_test nginx:alpine

Step 24: Check the service status:

~$ sudo docker-machine ssh manager1 docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
edj1isbp6p90        helloworld          replicated          2/2                 nginx:alpine        *:7080->80/tcp

Step 25: View details of the helloworld service:

~$ sudo docker-machine ssh manager1 docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
il4h1sfr3c7d        helloworld.1        nginx:alpine        manager1            Running             Running 4 minutes ago                       
tmvwemy8l6y9        helloworld.2        nginx:alpine        worker1             Running             Running 4 minutes ago      

Docker (since 1.12) ships with built-in service discovery. A container that is running but not yet reachable from outside is regarded by service discovery as being in the preparing state; once the port has been published, the task shows as Running.
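
Because of the swarm routing mesh, a port published with -p (here 7080) should answer on every node's IP, even on nodes that run no task of the service. A sketch using the IPs from this walkthrough (adjust them to your machines):

```shell
# Each node should return HTTP 200, regardless of where the tasks run.
for ip in 192.168.99.100 192.168.99.101 192.168.99.102 192.168.99.103 192.168.99.104; do
  curl -s -o /dev/null -w "%{http_code} from $ip\n" "http://$ip:7080/"
done
```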

An interesting swarm example

(1) First look at the two instances:

~$ sudo docker-machine ssh manager1 docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
d4c9824ff00f        nginx:alpine        "nginx -g 'daemon of…"   9 minutes ago       Up 9 minutes        80/tcp              helloworld.1.il4h1sfr3c7d8pee396iqlsxq

~$ sudo docker-machine ssh worker1 docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
a697c1d0385a        nginx:alpine        "nginx -g 'daemon of…"   22 minutes ago      Up 22 minutes       80/tcp              helloworld.2.tmvwemy8l6y98hqovnv2zxwh9

(2) Kill the container instance on worker1:

~$ sudo docker-machine ssh worker1 docker kill helloworld.2.tmvwemy8l6y98hqovnv2zxwh9

(3) Wait a few seconds, then check the service status again:

~$ sudo docker-machine ssh manager1 docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR                         PORTS
il4h1sfr3c7d        helloworld.1        nginx:alpine        manager1            Running             Running 26 minutes ago                                     
s5kmzvoa031k        helloworld.2        nginx:alpine        worker1             Running             Running about a minute ago                                 
tmvwemy8l6y9         \_ helloworld.2    nginx:alpine        worker1             Shutdown            Failed about a minute ago    "task: non-zero exit (137)" 

~$ sudo docker-machine ssh manager1 docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
edj1isbp6p90        helloworld          replicated          2/2                 nginx:alpine        *:7080->80/tcp

Even when one instance is killed, swarm quickly removes the stopped container and starts a new instance on a node to take its place.

To run more instances, use the scale command: log into a manager node and run docker service scale <SERVICE>=<REPLICAS> to scale the service to the desired number of instances:

~$ sudo docker-machine ssh manager1 docker service scale helloworld=3

After scaling, the service details show three running instances:

~$ sudo docker-machine ssh manager1 docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR                         PORTS
il4h1sfr3c7d        helloworld.1        nginx:alpine        manager1            Running             Running 31 minutes ago                                 
s5kmzvoa031k        helloworld.2        nginx:alpine        worker1             Running             Running 6 minutes ago                                  
tmvwemy8l6y9         \_ helloworld.2    nginx:alpine        worker1             Shutdown            Failed 7 minutes ago     "task: non-zero exit (137)"   
f29be0jidu8a        helloworld.3        nginx:alpine        manager2            Running             Running 14 seconds ago 

To reduce the number of instances, use scale as well:

~$ sudo docker-machine ssh manager1 docker service scale helloworld=2

Third-party cluster management
Kubernetes, Mesos, and Swarm are the big three of cluster management.

Troubleshooting

(1) The swarm was shut down uncleanly, and docker-machine ls now reports errors:

~$ sudo docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.100:2376           v18.09.0   
manager2   -        virtualbox   Running   tcp://192.168.99.101:2376           Unknown    Unable to query docker version: Get https://192.168.99.101:2376/v1.15/version: x509: certificate is valid for 192.168.99.102, not 192.168.99.101
worker1    -        virtualbox   Running   tcp://192.168.99.102:2376           Unknown    Unable to query docker version: Get https://192.168.99.102:2376/v1.15/version: x509: certificate is valid for 192.168.99.101, not 192.168.99.102
worker2    -        virtualbox   Running   tcp://192.168.99.103:2376           Unknown    Unable to query docker version: Get https://192.168.99.103:2376/v1.15/version: x509: certificate is valid for 192.168.99.104, not 192.168.99.103
worker3    -        virtualbox   Running   tcp://192.168.99.104:2376           Unknown    Unable to query docker version: Get https://192.168.99.104:2376/v1.15/version: x509: certificate is valid for 192.168.99.103, not 192.168.99.104

(2) The error messages point to a certificate configuration problem. Checking env reports the same error and suggests a fix:

~$ sudo docker-machine env manager2
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.101:2376": x509: certificate is valid for 192.168.99.102, not 192.168.99.101
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

Fix:

~$ sudo docker-machine regenerate-certs manager2
~$ sudo docker-machine regenerate-certs worker1
~$ sudo docker-machine regenerate-certs worker2
~$ sudo docker-machine regenerate-certs worker3

docker-machine ls now looks normal again:

~$ sudo docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
manager1   -        virtualbox   Running   tcp://192.168.99.100:2376           v18.09.0   
manager2   -        virtualbox   Running   tcp://192.168.99.101:2376           v18.09.0   
worker1    -        virtualbox   Running   tcp://192.168.99.102:2376           v18.09.0   
worker2    -        virtualbox   Running   tcp://192.168.99.103:2376           v18.09.0   
worker3    -        virtualbox   Running   tcp://192.168.99.104:2376           v18.09.0   

(3) Going through the manager node manager1 still fails, reporting that the cluster has no leader:

~$ sudo docker-machine ssh manager1 docker service ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
exit status 1
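
When a swarm genuinely loses manager quorum, Docker documents a recovery path that keeps existing services and data: re-initialize from a surviving manager with --force-new-cluster. A sketch (the other nodes must rejoin afterwards):

```shell
# Run on the surviving manager. It becomes a single-manager swarm with
# the previous swarm state intact; workers and managers then re-join.
docker-machine ssh manager1 docker swarm init --force-new-cluster --advertise-addr 192.168.99.100
```

This is usually preferable to docker swarm leave --force, which discards the swarm state (as the remaining steps in this section show).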

At first I planned to initialize a manager node again, but it reported that the node was already part of a swarm; following the hint led to the solution:

~$ sudo docker-machine ssh manager1 docker swarm init --listen-addr 192.168.99.100:2377 --advertise-addr 192.168.99.100
Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.
exit status 1

Fix: leave the swarm first, then rejoin it:

~$ sudo docker-machine ssh manager1 docker swarm leave --force
Node left the swarm.



~$ sudo docker-machine ssh manager1 docker swarm init --listen-addr 192.168.99.100:2377 --advertise-addr 192.168.99.100
Swarm initialized: current node (f2l9cuitvlidvwhn4ffsfiq5r) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-ezeumdtoi27isfe234al6snmi 192.168.99.100:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

(4) Listing the cluster nodes shows only the manager node manager1; the other nodes are all missing:

~$ sudo docker-machine ssh manager1 docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
f2l9cuitvlidvwhn4ffsfiq5r *   manager1            Ready               Active              Leader              18.09.0

Fix: have the other nodes leave first, then rejoin the cluster:

~$ sudo docker-machine ssh worker1 docker swarm leave --force
Node left the swarm.
~/Desktop$ sudo docker-machine ssh worker2 docker swarm leave --force
Node left the swarm.
~/Desktop$ sudo docker-machine ssh worker3 docker swarm leave --force
Node left the swarm.
~$ sudo docker-machine ssh manager2 docker swarm leave --force
Node left the swarm.

Rejoin the cluster:

~$ sudo docker-machine ssh worker1 docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-ezeumdtoi27isfe234al6snmi 192.168.99.100:2377
This node joined a swarm as a worker.
~/Desktop$ sudo docker-machine ssh worker2 docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-ezeumdtoi27isfe234al6snmi 192.168.99.100:2377
This node joined a swarm as a worker.
~/Desktop$ sudo docker-machine ssh worker3 docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-ezeumdtoi27isfe234al6snmi 192.168.99.100:2377
This node joined a swarm as a worker.

Before re-adding manager2, fetch the manager join token, because manager2 is a manager node:

~$ sudo docker-machine ssh manager1 docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-9024oqrjvi1wig34sss3oxj4h 192.168.99.100:2377

Now manager2 can be added to the cluster:

~$ sudo docker-machine ssh manager2 docker swarm join --token SWMTKN-1-64sqccoshqmwhoslb1qlfob4v1q9yxqvn7jpyud5eak7ll8k0e-9024oqrjvi1wig34sss3oxj4h 192.168.99.100:2377
This node joined a swarm as a manager.

(5) List the cluster nodes: they are all back.

~$ sudo docker-machine ssh manager1 docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
f2l9cuitvlidvwhn4ffsfiq5r *   manager1            Ready               Active              Leader              18.09.0
zbnaor33mz6v56pm2867vc7wq     manager2            Ready               Active              Reachable           18.09.0
9ty05ceu1s32bqak0u750v002     worker1             Ready               Active                                  18.09.0
6ibrcqh3gifufk51ktnprocgd     worker2             Ready               Active                                  18.09.0
w93ubyj72d3d25ss1jarg195x     worker3             Ready               Active                                  18.09.0

(6) Repeat step 16 and pull the image on each virtual machine.
(7) Create a new overlay network:

~$ sudo docker-machine ssh manager1 docker network create --driver overlay swarm_test
r1k8f4ccgozif13j8nqotzp5w
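Before redeploying onto the new network, it can be worth confirming that `swarm_test` really was created with the overlay driver. A dry-run sketch of the check (remove the `echo` to run it on the manager):

```shell
# Sketch: confirm that swarm_test uses the overlay driver, via the
# --format Go-template flag of "docker network inspect".
# The "echo" makes this a dry run; remove it to execute on a live swarm.
NET=swarm_test
echo sudo docker-machine ssh manager1 \
  docker network inspect "$NET" --format "{{.Driver}}"
```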

(8) Redeploy the application service:

~$ sudo docker-machine ssh manager1 docker service create -p 8000:80 --replicas 5 --name HelloNginx --network=swarm_test nginx
runx1bwn0d1mcksiwrb9a7zkt
overall progress: 0 out of 5 tasks
1/5:  
2/5:  
3/5:  
4/5:  
5/5:  
overall progress: 2 out of 5 tasks
overall progress: 3 out of 5 tasks
overall progress: 4 out of 5 tasks
overall progress: 5 out of 5 tasks
verify: Waiting 5 seconds to verify that tasks are stable...
verify: Service converged
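Once the service has converged, the replica count can later be adjusted with `docker service scale` (listed in the command summary at the top) without recreating the service. A dry-run sketch (remove the `echo` to execute on a live swarm):

```shell
# Sketch: scale the HelloNginx service from 5 to 8 replicas.
# The "echo" makes this a dry run; remove it to execute on a live swarm.
SERVICE=HelloNginx
REPLICAS=8
echo sudo docker-machine ssh manager1 docker service scale "$SERVICE=$REPLICAS"
```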

(9) Verify that the application was deployed successfully. Note that `ping` only tests ICMP reachability of the host and ignores the port suffix during name resolution (which is why the stray `:8080` below still "works"); it does not exercise the HTTP service itself:

~/Desktop$ sudo docker-machine ssh worker1
[sudo] password for wong: 
   ( '>')
  /) TC (\   Core is distributed with ABSOLUTELY NO WARRANTY.
 (/-_--_-\)           www.tinycorelinux.net

docker@worker1:~$ ping localhost:8080
PING localhost:8080 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.057 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.125 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.123 ms
^C
--- localhost:8080 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.057/0.101/0.125 ms

docker@worker1:~$ exit               
logout
wong@wong-HP-ProDesk-480-G2-MT:~/Desktop$ sudo docker-machine ssh worker2
   ( '>')
  /) TC (\   Core is distributed with ABSOLUTELY NO WARRANTY.
 (/-_--_-\)           www.tinycorelinux.net

docker@worker2:~$ ping localhost:8000
PING localhost:8000 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.076 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.135 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.115 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.128 ms
^C

--- localhost:8000 ping statistics ---
8 packets transmitted, 6 packets received, 25% packet loss
round-trip min/avg/max = 0.076/0.123/0.156 ms
docker@worker2:~$ ping localhost:80  
PING localhost:80 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.043 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.121 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.069 ms
^C
--- localhost:80 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.043/0.077/0.121 ms
docker@worker2:~$ exit
logout
~/Desktop$ sudo docker-machine ssh worker3
   ( '>')
  /) TC (\   Core is distributed with ABSOLUTELY NO WARRANTY.
 (/-_--_-\)           www.tinycorelinux.net

docker@worker3:~$ ping localhost:8000
PING localhost:8000 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.054 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.158 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.094 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.107 ms
^C
--- localhost:8000 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.054/0.108/0.158 ms
docker@worker3:~$ exit               
logout
wong@wong-HP-ProDesk-480-G2-MT:~/Desktop$ sudo docker-machine ssh manager1
   ( '>')
  /) TC (\   Core is distributed with ABSOLUTELY NO WARRANTY.
 (/-_--_-\)           www.tinycorelinux.net

docker@manager1:~$ ping localhost:8000
PING localhost:8000 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.084 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.115 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.090 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.088 ms
^C
--- localhost:8000 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.084/0.096/0.115 ms
docker@manager1:~$ exit               
logout
wong@wong-HP-ProDesk-480-G2-MT:~/Desktop$ sudo docker-machine ssh manager2
   ( '>')
  /) TC (\   Core is distributed with ABSOLUTELY NO WARRANTY.
 (/-_--_-\)           www.tinycorelinux.net

docker@manager2:~$ ping localhost:8000
PING localhost:8000 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.104 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.109 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.106 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.117 ms
^C
--- localhost:8000 ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 0.104/0.110/0.117 ms
docker@manager2:~$ exit
logout
exit status 127
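Because `ping` never touches TCP port 8000, a stronger verification is an actual HTTP request from inside any node. A minimal sketch, assuming the HelloNginx service above is still running (it prints a notice and exits cleanly when it is not):

```shell
# Sketch: verify the published nginx service over HTTP instead of ping.
# Run this inside any swarm node; the swarm routing mesh answers on every
# node's port 8000 regardless of where the replicas actually run.
URL="http://localhost:8000"
if wget -qO- "$URL" >/dev/null 2>&1; then
  echo "HTTP response received from $URL"
else
  echo "no HTTP response from $URL (is the service running?)"
fi
```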

Success!

Thanks for reading.

Reposted from blog.csdn.net/weixin_40763897/article/details/87970517