Docker-based Slurm job management system
Aliyun (Alibaba Cloud) server setup
Reference video: https://www.bilibili.com/video/BV177411K7bH
Step 1 - Sign up for an Alibaba Cloud server
You can sign up for a free Alibaba Cloud host for one month. I registered a 1-core, 2 GB cloud server for one month, with 4 Mbit bandwidth and a 40 GB system disk. The installed system is 64-bit CentOS 8.4.
Step 2 - Configure the instance
After entering the ECS cloud server console, click the running instance i-uf689okdsil887t0h11x and you will see the server's public IP address, which is used for ssh login. Then modify the instance hostname and reset the instance password. After the changes, restart the instance right away.
Step 3 - Open the security group and set up port mapping
A cloud server purchased on Alibaba Cloud needs its security group configured, otherwise it cannot be reached from outside. Click the configuration rule in the operations bar, enter the security group, and add the port numbers you need to open. The later example uses port 8888. Also make sure the default ports are open, including 22 (used for ssh below). The port numbers I added are as shown in the image above; if needed, you can come back and add more later.
Step 4 - Connect remotely with Xshell
Go to the official website to download and install Xshell 7. Create a new session, fill in your Alibaba Cloud public IP, then enter the username root and the password you just set on the server. If you see "Welcome to Alibaba Cloud Elastic Compute Service!", you are logged in to the server.
# Run the command suggested by the login prompt to enable Cockpit
[root@Iceland ~]# systemctl enable --now cockpit.socket
Created symlink /etc/systemd/system/sockets.target.wants/cockpit.socket → /usr/lib/systemd/system/cockpit.socket.
# Check the server's current environment
[root@Iceland ~]# pwd
/root
[root@Iceland ~]# cd /
[root@Iceland /]# ls
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
[root@Iceland ~]# uname -r # show the OS kernel version
4.18.0-305.3.1.el8.x86_64
[root@Iceland /]# cat /etc/os-release # show detailed OS information
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
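The checks above can be condensed into a small pre-install script. Docker's CentOS packages have historically required a 64-bit build and a 3.10+ kernel; the exact version floor used below is an assumption, not something stated in the official docs for CentOS 8.

```shell
# Sanity check before installing Docker: 64-bit build and a reasonably
# recent kernel (3.10+ has long been the floor on CentOS-family systems).
major=$(uname -r | cut -d. -f1)
arch=$(uname -m)
echo "kernel major version: $major, architecture: $arch"
if [ "$major" -ge 3 ] && [ "$arch" = "x86_64" ]; then
    echo "host looks OK for Docker"
fi
```

On the 4.18.0 kernel shown above, both checks pass.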
Installing Docker on the server
Official documentation: https://docs.docker.com/engine/install/centos/
Step 1 - Uninstall old Docker versions
[root@Iceland /]# sudo yum remove docker \
> docker-client \
> docker-client-latest \
> docker-common \
> docker-latest \
> docker-latest-logrotate \
> docker-logrotate \
> docker-engine
No match for argument: docker
No match for argument: docker-client
No match for argument: docker-client-latest
No match for argument: docker-common
No match for argument: docker-latest
No match for argument: docker-latest-logrotate
No match for argument: docker-logrotate
No match for argument: docker-engine
No packages marked for removal.
Dependencies resolved.
Nothing to do.
Complete! # since this is a new server, none of these old Docker versions were present
Step 2 - Set up the package repository
[root@Iceland /]# yum install -y yum-utils # install yum-utils
Last metadata expiration check: 2:09:26 ago on Sat 28 Aug 2021 06:38:17 PM CST.
Dependencies resolved.
=============================================================================================
Package Architecture Version Repository Size
=============================================================================================
Installing:
yum-utils noarch 4.0.18-4.el8 baseos 71 k
Transaction Summary
=============================================================================================
Install 1 Package
Total download size: 71 k
Installed size: 22 k
Downloading Packages:
yum-utils-4.0.18-4.el8.noarch.rpm 1.7 MB/s | 71 kB 00:00
---------------------------------------------------------------------------------------------
Total 1.6 MB/s | 71 kB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : yum-utils-4.0.18-4.el8.noarch 1/1
Running scriptlet: yum-utils-4.0.18-4.el8.noarch 1/1
Verifying : yum-utils-4.0.18-4.el8.noarch 1/1
Installed:
yum-utils-4.0.18-4.el8.noarch
Complete!
# Add the stable Docker repository
[root@Iceland /]# yum-config-manager \
> --add-repo \
> https://download.docker.com/linux/centos/docker-ce.repo
Adding repo from: https://download.docker.com/linux/centos/docker-ce.repo
# The overseas repository is slow; the Aliyun mirror will be used later
Step 3 - Install the Docker engine
[root@Iceland /]# yum install docker-ce docker-ce-cli containerd.io # install these 3 components
Docker CE Stable - x86_64 18 kB/s | 15 kB 00:00
Dependencies resolved.
=====================================================================================================
Package Arch Version Repository Size
=====================================================================================================
Installing:
containerd.io x86_64 1.4.9-3.1.el8 docker-ce-stable 30 M
docker-ce x86_64 3:20.10.8-3.el8 docker-ce-stable 22 M
docker-ce-cli x86_64 1:20.10.8-3.el8 docker-ce-stable 29 M
Installing dependencies:
container-selinux noarch 2:2.164.1-1.module_el8.4.0+886+c9a8d9ad appstream 52 k
docker-ce-rootless-extras x86_64 20.10.8-3.el8 docker-ce-stable 4.6 M
docker-scan-plugin x86_64 0.8.0-3.el8 docker-ce-stable 4.2 M
fuse-common x86_64 3.2.1-12.el8 baseos 21 k
fuse-overlayfs x86_64 1.6-1.module_el8.4.0+886+c9a8d9ad appstream 73 k
fuse3 x86_64 3.2.1-12.el8 baseos 50 k
fuse3-libs x86_64 3.2.1-12.el8 baseos 94 k
libcgroup x86_64 0.41-19.el8 baseos 70 k
libslirp x86_64 4.3.1-1.module_el8.4.0+575+63b40ad7 appstream 69 k
slirp4netns x86_64 1.1.8-1.module_el8.4.0+641+6116a774 appstream 51 k
Enabling module streams:
container-tools rhel8
Transaction Summary
=====================================================================================================
Install 13 Packages
Total download size: 90 M
Installed size: 377 M
Is this ok [y/N]: y # just type y when prompted
Downloading Packages:
(1/13): container-selinux-2.164.1-1.module_el8.4.0+886+c9a8d9ad.noar 1.4 MB/s | 52 kB 00:00
(2/13): fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64.rpm 1.9 MB/s | 73 kB 00:00
(3/13): libslirp-4.3.1-1.module_el8.4.0+575+63b40ad7.x86_64.rpm 1.3 MB/s | 69 kB 00:00
(4/13): fuse-common-3.2.1-12.el8.x86_64.rpm 1.4 MB/s | 21 kB 00:00
(5/13): slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64.rpm 2.7 MB/s | 51 kB 00:00
(6/13): fuse3-3.2.1-12.el8.x86_64.rpm 4.0 MB/s | 50 kB 00:00
(7/13): libcgroup-0.41-19.el8.x86_64.rpm 4.6 MB/s | 70 kB 00:00
(8/13): fuse3-libs-3.2.1-12.el8.x86_64.rpm 4.7 MB/s | 94 kB 00:00
(9/13): docker-ce-20.10.8-3.el8.x86_64.rpm 5.5 MB/s | 22 MB 00:03
(10/13): docker-ce-rootless-extras-20.10.8-3.el8.x86_64.rpm 3.5 MB/s | 4.6 MB 00:01
(11/13): containerd.io-1.4.9-3.1.el8.x86_64.rpm 4.7 MB/s | 30 MB 00:06
(12/13): docker-scan-plugin-0.8.0-3.el8.x86_64.rpm 3.5 MB/s | 4.2 MB 00:01
(13/13): docker-ce-cli-20.10.8-3.el8.x86_64.rpm 3.6 MB/s | 29 MB 00:08
-----------------------------------------------------------------------------------------------------
Total 11 MB/s | 90 MB 00:08
warning: /var/cache/dnf/docker-ce-stable-fa9dc42ab4cec2f4/packages/containerd.io-1.4.9-3.1.el8.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 621e9f35: NOKEY
Docker CE Stable - x86_64 3.1 kB/s | 1.6 kB 00:00
Importing GPG key 0x621E9F35:
Userid : "Docker Release (CE rpm) <[email protected]>"
Fingerprint: 060A 61C5 1B55 8A7F 742B 77AA C52F EB6B 621E 9F35
From : https://download.docker.com/linux/centos/gpg
Is this ok [y/N]: y # just type y when prompted
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : docker-scan-plugin-0.8.0-3.el8.x86_64 1/13
Running scriptlet: docker-scan-plugin-0.8.0-3.el8.x86_64 1/13
Installing : docker-ce-cli-1:20.10.8-3.el8.x86_64 2/13
Running scriptlet: docker-ce-cli-1:20.10.8-3.el8.x86_64 2/13
Running scriptlet: container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch 3/13
Installing : container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch 3/13
Running scriptlet: container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch 3/13
Installing : containerd.io-1.4.9-3.1.el8.x86_64 4/13
Running scriptlet: containerd.io-1.4.9-3.1.el8.x86_64 4/13
Running scriptlet: libcgroup-0.41-19.el8.x86_64 5/13
Installing : libcgroup-0.41-19.el8.x86_64 5/13
Running scriptlet: libcgroup-0.41-19.el8.x86_64 5/13
Installing : fuse3-libs-3.2.1-12.el8.x86_64 6/13
Running scriptlet: fuse3-libs-3.2.1-12.el8.x86_64 6/13
Installing : fuse-common-3.2.1-12.el8.x86_64 7/13
Installing : fuse3-3.2.1-12.el8.x86_64 8/13
Installing : fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64 9/13
Running scriptlet: fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64 9/13
Installing : libslirp-4.3.1-1.module_el8.4.0+575+63b40ad7.x86_64 10/13
Installing : slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64 11/13
Installing : docker-ce-rootless-extras-20.10.8-3.el8.x86_64 12/13
Running scriptlet: docker-ce-rootless-extras-20.10.8-3.el8.x86_64 12/13
Installing : docker-ce-3:20.10.8-3.el8.x86_64 13/13
Running scriptlet: docker-ce-3:20.10.8-3.el8.x86_64 13/13
Running scriptlet: container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch 13/13
Running scriptlet: docker-ce-3:20.10.8-3.el8.x86_64 13/13
Verifying : container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch 1/13
Verifying : fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64 2/13
Verifying : libslirp-4.3.1-1.module_el8.4.0+575+63b40ad7.x86_64 3/13
Verifying : slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64 4/13
Verifying : fuse-common-3.2.1-12.el8.x86_64 5/13
Verifying : fuse3-3.2.1-12.el8.x86_64 6/13
Verifying : fuse3-libs-3.2.1-12.el8.x86_64 7/13
Verifying : libcgroup-0.41-19.el8.x86_64 8/13
Verifying : containerd.io-1.4.9-3.1.el8.x86_64 9/13
Verifying : docker-ce-3:20.10.8-3.el8.x86_64 10/13
Verifying : docker-ce-cli-1:20.10.8-3.el8.x86_64 11/13
Verifying : docker-ce-rootless-extras-20.10.8-3.el8.x86_64 12/13
Verifying : docker-scan-plugin-0.8.0-3.el8.x86_64 13/13
Installed:
container-selinux-2:2.164.1-1.module_el8.4.0+886+c9a8d9ad.noarch
containerd.io-1.4.9-3.1.el8.x86_64
docker-ce-3:20.10.8-3.el8.x86_64
docker-ce-cli-1:20.10.8-3.el8.x86_64
docker-ce-rootless-extras-20.10.8-3.el8.x86_64
docker-scan-plugin-0.8.0-3.el8.x86_64
fuse-common-3.2.1-12.el8.x86_64
fuse-overlayfs-1.6-1.module_el8.4.0+886+c9a8d9ad.x86_64
fuse3-3.2.1-12.el8.x86_64
fuse3-libs-3.2.1-12.el8.x86_64
libcgroup-0.41-19.el8.x86_64
libslirp-4.3.1-1.module_el8.4.0+575+63b40ad7.x86_64
slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64
Complete!
Although this command has installed Docker, it has not been started yet (like any service, it must be started before it can run anything).
Step 4 - Start Docker and verify
[root@Iceland /]# systemctl start docker
[root@Iceland /]# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
b8dfde127a29: Pull complete
Digest: sha256:7d91b69e04a9029b99f3585aaaccae2baa80bcf318f4a5d2165a9898cd2dc0a1
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon. # the client contacts the daemon
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.(amd64) # pull the image
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading. # create and run a container from the image
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal. # the daemon streams the output to the terminal
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
The key information above explains the 4 steps of how Docker operates. At this point, Docker is fully installed.
[root@Iceland /]# docker version
Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
Go version: go1.16.6
Git commit: 3967b7d
Built: Fri Jul 30 19:53:39 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:52:00 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Tip: Alibaba Cloud mirror accelerator
Log in to Alibaba Cloud --> Container Registry Service --> Mirror Tools --> Mirror Accelerator, then copy and run the 4 commands given for CentOS.
[root@Iceland /]# sudo mkdir -p /etc/docker # create the directory
[root@Iceland /]# sudo tee /etc/docker/daemon.json <<-'EOF' # write the registry mirror configuration
> {
> "registry-mirrors": ["https://lisay8ar.mirror.aliyuncs.com"]
> }
> EOF
{
"registry-mirrors": ["https://lisay8ar.mirror.aliyuncs.com"]
}
[root@Iceland /]# sudo systemctl daemon-reload # reload the systemd configuration
[root@Iceland /]# sudo systemctl restart docker # restart docker
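A malformed daemon.json will keep dockerd from starting, so it can be worth drafting the file in a scratch location and validating the JSON before copying it into place. A minimal sketch, reusing the mirror address from above (that URL is account-specific; substitute your own accelerator address):

```shell
# Draft the registry-mirror config in a scratch directory and validate the
# JSON before copying it to /etc/docker/daemon.json (the mirror URL below is
# account-specific; substitute your own accelerator address).
mkdir -p /tmp/docker-draft
tee /tmp/docker-draft/daemon.json <<-'EOF' > /dev/null
{
  "registry-mirrors": ["https://lisay8ar.mirror.aliyuncs.com"]
}
EOF
python3 -m json.tool /tmp/docker-draft/daemon.json > /dev/null \
    && echo "daemon.json is valid JSON"
```

Only after the validation passes would the file be copied to /etc/docker/daemon.json and docker restarted.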
Docker network configuration
Understanding the docker0 bridge
[root@Iceland /]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo # local loopback address
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:16:3e:29:ef:40 brd ff:ff:ff:ff:ff:ff
inet 172.30.31.209/20 brd 172.30.31.255 scope global dynamic noprefixroute eth0 # Aliyun private network address
valid_lft 315352421sec preferred_lft 315352421sec
inet6 fe80::216:3eff:fe29:ef40/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:5d:e9:e1:b7 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0 # docker0 address
valid_lft forever preferred_lft forever
inet6 fe80::42:5dff:fee9:e1b7/64 scope link
valid_lft forever preferred_lft forever
Each container in Docker communicates within a network segment by bridging to docker0 (similar to a router). Notably, each container creates a pair of virtual interfaces (a veth pair) with docker0; the interfaces appear in pairs and disappear in pairs. This is what allows containers to remain independent of each other while interconnecting and communicating efficiently, as well as reaching the external network.
Test
[root@Iceland ~]# docker run -d -P --name tomcat01 tomcat # -P maps the container port to a random host port; create and run the container
Unable to find image 'tomcat:latest' locally
latest: Pulling from library/tomcat
1cfaf5c6f756: Pull complete
c4099a935a96: Pull complete
f6e2960d8365: Pull complete
dffd4e638592: Pull complete
a60431b16af7: Pull complete
4869c4e8de8d: Pull complete
9815a275e5d0: Pull complete
c36aa3d16702: Pull complete
cc2e74b6c3db: Pull complete
1827dd5c8bb0: Pull complete
Digest: sha256:1af502b6fd35c1d4ab6f24dc9bd36b58678a068ff1206c25acc129fb90b2a76a
Status: Downloaded newer image for tomcat:latest
b530e79cc32b45ed6222496013b66ab663eaef74c83dc62610b252b18d1a3310
[root@Iceland ~]# docker exec -it tomcat01 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo # local loopback address
valid_lft forever preferred_lft forever
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0 # bridged eth address; interfaces 6 and 7 form a pair
valid_lft forever preferred_lft forever
[root@Iceland ~]# ping 172.17.0.2 # the container can be pinged by address directly from the Linux command line
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.101 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.064 ms
^C
--- 172.17.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2049ms
rtt min/avg/max/mdev = 0.064/0.078/0.101/0.016 ms
[root@Iceland ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:16:3e:29:ef:40 brd ff:ff:ff:ff:ff:ff
inet 172.30.31.209/20 brd 172.30.31.255 scope global dynamic noprefixroute eth0
valid_lft 315312613sec preferred_lft 315312613sec
inet6 fe80::216:3eff:fe29:ef40/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:5d:e9:e1:b7 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:5dff:fee9:e1b7/64 scope link
valid_lft forever preferred_lft forever
7: veth0a09b40@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default # the extra interface compared with before is bridge interface 7, paired with the container just created
link/ether e6:88:6f:4a:e9:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::e488:6fff:fe4a:e94c/64 scope link
valid_lft forever preferred_lft forever
Docker assigns each container a pair of interfaces for communication between the container and the bridge. With this technique, containers are isolated from each other yet can communicate efficiently, laying the groundwork for the inter-node communication of the Slurm cluster deployed later.
Containers communicate using the link technique
Since a container's IP can change, we would like --link to let a container reach another by container name instead of by IP.
[root@Iceland ~]# docker run -d -P --name tomcat02 tomcat
07758a3a228c004fbf6cc8092b714d1249f921c4ba9360846206fc7915083f97
[root@Iceland ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
07758a3a228c tomcat "catalina.sh run" 5 seconds ago Up 4 seconds 0.0.0.0:49154->8080/tcp, :::49154->8080/tcp tomcat02
b530e79cc32b tomcat "catalina.sh run" 50 minutes ago Up 50 minutes 0.0.0.0:49153->8080/tcp, :::49153->8080/tcp tomcat01
[root@Iceland ~]# docker exec -it tomcat02 ping tomcat01
ping: tomcat01: Name or service not known # one container cannot reach another directly by container name
# adding the --link option at run time solves this
[root@Iceland ~]# docker run -d -P --name tomcat03 --link tomcat02 tomcat
6e185946062f3af377eb58c34408471685cca20d8ca0b2873b24514856eda7d8
[root@Iceland ~]# docker exec -it tomcat03 ping tomcat02 # with 03 linked to 02, they can reach each other
PING tomcat02 (172.17.0.3) 56(84) bytes of data.
64 bytes from tomcat02 (172.17.0.3): icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from tomcat02 (172.17.0.3): icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from tomcat02 (172.17.0.3): icmp_seq=3 ttl=64 time=0.076 ms
^C
--- tomcat02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 53ms
rtt min/avg/max/mdev = 0.076/0.099/0.131/0.024 ms
# but in the reverse direction 02 cannot ping 03, because the link must be configured both ways
Inspecting tomcat03's bridge information shows that the --link option simply adds a one-way mapping line for 02 to the container's hosts configuration.
[root@Iceland ~]# docker exec -it tomcat03 cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 tomcat02 07758a3a228c # here 02 is bound by name
172.17.0.4 6e185946062f
[root@Iceland ~]# docker exec -it tomcat02 cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 07758a3a228c
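What --link does under the hood is nothing Docker-specific: it is a static hosts entry, which is why the mapping is fixed and one-way. The same mechanism can be sketched against a scratch copy of /etc/hosts (the IP and names are taken from the transcript above; the real file lives inside the container):

```shell
# What --link effectively does: append a static name -> IP line to the
# linking container's /etc/hosts. Demonstrated on a scratch copy here,
# since the real file lives inside the container.
cp /etc/hosts /tmp/hosts.sketch
echo "172.17.0.3      tomcat02 07758a3a228c" >> /tmp/hosts.sketch
grep tomcat02 /tmp/hosts.sketch
# The reverse direction gets no such line, which is why tomcat02
# cannot resolve tomcat03 by name.
```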
Here we can also see that docker0 is inconvenient: --link is an official mechanism with limitations and cannot be customized, and it is impractical to link containers in every direction. Moreover, docker0 does not support access by container name.
Advanced: building a custom network
Network modes
- bridge: docker0 bridge mode (default)
- none: no network configured
- host: share the network with the host
- container: share another container's network (very limited)
[root@Iceland ~]# docker network --help
Usage: docker network COMMAND
Manage networks
Commands:
connect Connect a container to a network
create Create a network # use create to build a custom bridge network
disconnect Disconnect a container from a network
inspect Display detailed information on one or more networks
ls List networks
prune Remove all unused networks
rm Remove one or more networks
Run 'docker network COMMAND --help' for more information on a command.
# container names cannot be resolved on the default docker0 network
[root@Iceland ~]# docker rm -f $(docker ps -aq) # first remove the previous containers and their network configuration
6e185946062f
07758a3a228c
b530e79cc32b
[root@Iceland ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@Iceland ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
tomcat latest 266d1269bb29 10 days ago 668MB
[root@Iceland ~]# ip addr # only the original 3 network interfaces remain
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:16:3e:29:ef:40 brd ff:ff:ff:ff:ff:ff
inet 172.30.31.209/20 brd 172.30.31.255 scope global dynamic noprefixroute eth0
valid_lft 315310384sec preferred_lft 315310384sec
inet6 fe80::216:3eff:fe29:ef40/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:5d:e9:e1:b7 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:5dff:fee9:e1b7/64 scope link
valid_lft forever preferred_lft forever
# create a new bridge network: --driver [network type] --subnet [subnet range] --gateway [gateway address]
[root@Iceland ~]# docker network create --driver bridge --subnet 192.168.0.0/16 --gateway 192.168.0.1 mynet
57c914464f0a0e9423483cf16dd5c71dc02c65d02218149e14a3fc169a45ad5e
[root@Iceland ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
9223b334e60a bridge bridge local
8d96801ccaf3 host host local
57c914464f0a mynet bridge local
a5ff794b6d74 none null local
[root@Iceland ~]# docker network inspect mynet
[
{
"Name": "mynet",
"Id": "57c914464f0a0e9423483cf16dd5c71dc02c65d02218149e14a3fc169a45ad5e",
"Created": "2021-08-29T09:07:38.248210817+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {
},
"Config": [
{
"Subnet": "192.168.0.0/16", # the network has been set up as defined
"Gateway": "192.168.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
},
"Options": {
},
"Labels": {
}
}
]
Testing the custom network
[root@Iceland ~]# docker run -d -P --name tomcat-net-01 --net mynet tomcat
c2e8c4d6ec1af68bea8dcad213a9c693151859667f26336c596aedf4189aa898
[root@Iceland ~]# docker run -d -P --name tomcat-net-02 --net mynet tomcat
91ce2929f0083f0bba803fa12ccf11b1b0cff36b3c807ada42e5fbe1aadef1cb
[root@Iceland ~]# docker network inspect mynet
[
{
"Name": "mynet",
"Id": "57c914464f0a0e9423483cf16dd5c71dc02c65d02218149e14a3fc169a45ad5e",
"Created": "2021-08-29T09:07:38.248210817+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {
},
"Config": [
{
"Subnet": "192.168.0.0/16",
"Gateway": "192.168.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"91ce2929f0083f0bba803fa12ccf11b1b0cff36b3c807ada42e5fbe1aadef1cb": {
"Name": "tomcat-net-02",
"EndpointID": "4df2dc1c5314bb02ae69ef7b47e32e658cb3aaaf7c65074bfddfe38629ba65be",
"MacAddress": "02:42:c0:a8:00:03",
"IPv4Address": "192.168.0.3/16", # the IP here is 192.168.0.3, from our defined subnet
"IPv6Address": ""
},
"c2e8c4d6ec1af68bea8dcad213a9c693151859667f26336c596aedf4189aa898": {
"Name": "tomcat-net-01",
"EndpointID": "7d92fc552cb88f410b207075e473afde36f63020dc63f0de7923fd7137e19b1f",
"MacAddress": "02:42:c0:a8:00:02",
"IPv4Address": "192.168.0.2/16", # the IP here is 192.168.0.2, from our defined subnet
"IPv6Address": ""
}
},
"Options": {
},
"Labels": {
}
}
]
The advantage of a custom bridge network is that different networks (different subnets) are isolated from each other, while interconnection within a network is complete: the two containers can ping each other in both directions, which fixes the --link problem.
[root@Iceland ~]# docker exec -it tomcat-net-01 ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=0.119 ms
64 bytes from 192.168.0.3: icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from 192.168.0.3: icmp_seq=3 ttl=64 time=0.080 ms
^C
--- 192.168.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 64ms
rtt min/avg/max/mdev = 0.080/0.097/0.119/0.016 ms
[root@Iceland ~]# docker exec -it tomcat-net-02 ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.116 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.101 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.102 ms
^C
--- 192.168.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 27ms
rtt min/avg/max/mdev = 0.101/0.106/0.116/0.010 ms
[root@Iceland ~]# docker exec -it tomcat-net-02 ping tomcat-net-01 # pinging directly by container name also works
PING tomcat-net-01 (192.168.0.2) 56(84) bytes of data.
64 bytes from tomcat-net-01.mynet (192.168.0.2): icmp_seq=1 ttl=64 time=0.098 ms
64 bytes from tomcat-net-01.mynet (192.168.0.2): icmp_seq=2 ttl=64 time=0.098 ms
64 bytes from tomcat-net-01.mynet (192.168.0.2): icmp_seq=3 ttl=64 time=0.086 ms
^C
--- tomcat-net-01 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 51ms
rtt min/avg/max/mdev = 0.086/0.094/0.098/0.005 ms
Docker Compose - batch container orchestration
Official documentation: https://docs.docker.com/compose/
The original single-container workflow, Dockerfile --> docker build --> docker run, requires operating each container by hand, which does not scale to a large cluster. Docker Compose automates the operation of multiple containers through a configuration file.
Official introduction
Using Compose is basically a three-step process:
- Define your app's environment with a Dockerfile so it can be reproduced anywhere.
- Define the services that make up your app in docker-compose.yml so they can be run together in an isolated environment.
- Run docker compose up and the Docker compose command starts and runs your entire app. You can alternatively run docker-compose up using the docker-compose binary.
Two important Compose concepts:
- services: the containers/applications
- project: a set of associated containers
Step 1 - Install Compose
# First download the compose binary from GitHub
[root@Iceland ~]# sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
# make the file executable
[root@Iceland ~]# sudo chmod +x /usr/local/bin/docker-compose
[root@Iceland ~]# docker-compose version # confirms the installation succeeded
docker-compose version 1.29.2, build 5becea4c
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
Step 2 - Official website example
The official example is a Python application whose counter uses Redis. Since the server is very slow and downloads kept failing, it is not demonstrated here; the rough steps are as follows:
- Step 1: write the application app.py
- Step 2: a Dockerfile packages the application into an image (the standalone application, not yet wired up)
- Step 3: the docker-compose yaml file (defines the whole service and the environment needed to bring it online) - the key file
- Step 4: start the compose project (docker-compose up runs the whole set of services)
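For reference, the compose file for that official counter example is only a few lines. A sketch along the lines of the official docs (the service names and the 8000:5000 port mapping follow the docs; details may differ by Compose version):

```yaml
version: "3"
services:
  web:
    build: .            # build the image from the Dockerfile next to app.py
    ports:
      - "8000:5000"     # host port 8000 -> Flask's port 5000 in the container
  redis:
    image: "redis:alpine"   # the counter's backing store, pulled as-is
```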
Slurm cluster build experiment
Blog post: https://medium.com/analytics-vidhya/slurm-cluster-with-docker-9f242deee601
Step 1 - Slurm architecture description
We will create a Slurm cluster using docker-compose, which lets us build the environment from Docker images (prepared by the blog's author). Docker-compose creates the containers and the network they use to communicate in isolated environments. Each container is one component of the cluster.
- slurmmaster is a container running slurmctld (Slurm's central management daemon).
- slurmnode[1-3] are containers running slurmd (Slurm's compute node daemon).
- slurmjupyter is a container running JupyterLab, which serves as the cluster client. As end users, we will use JupyterLab in the browser to interact with Slurm.
- cluster_default network: docker-compose creates a network that joins and holds all the containers; containers inside the network can see each other.
The following diagram shows how all the components interact.
Step 2 - Write the yaml file
Since prebuilt images are used, the whole project needs only one yaml file, which defines how the images are pulled; a single docker-compose up -d on the command line then runs everything.
# Create a cluster folder to hold the files
[root@Iceland ~]# mkdir cluster
[root@Iceland ~]# ls
cluster composetest
[root@Iceland ~]# cd cluster
[root@Iceland cluster]# vim docker-compose.yml
The docker-compose.yml file is as follows:
services:
slurmjupyter: # define the slurmjupyter container
image: rancavil/slurm-jupyter:19.05.5-1 # rancavil in the image repository is the author's name, short for Rodrigo Ancavil
hostname: slurmjupyter
user: admin
volumes:
- shared-vol:/home/admin
ports:
- 8888:8888
slurmmaster:
image: rancavil/slurm-master:19.05.5-1
hostname: slurmmaster
user: admin
volumes:
- shared-vol:/home/admin
ports:
- 6817:6817
- 6818:6818
- 6819:6819
slurmnode1: # define the parameters of node container 1
image: rancavil/slurm-node:19.05.5-1
hostname: slurmnode1
user: admin
volumes:
- shared-vol:/home/admin
environment:
- SLURM_NODENAME=slurmnode1
links:
- slurmmaster # similar to the custom network earlier: node1 is declared able to connect to the master (same below)
slurmnode2:
image: rancavil/slurm-node:19.05.5-1
hostname: slurmnode2
user: admin
volumes:
- shared-vol:/home/admin
environment:
- SLURM_NODENAME=slurmnode2
links:
- slurmmaster
slurmnode3:
image: rancavil/slurm-node:19.05.5-1
hostname: slurmnode3
user: admin
volumes:
- shared-vol:/home/admin
environment:
- SLURM_NODENAME=slurmnode3
links:
- slurmmaster
volumes:
shared-vol:
Step 3 - Run docker-compose up
[root@Iceland cluster]# docker-compose up -d # start the deployment; the pull and creation steps follow
Creating network "cluster_default" with the default driver # docker-compose automatically creates the custom network defined by the yaml
Creating volume "cluster_shared-vol" with default driver
Pulling slurmjupyter (rancavil/slurm-jupyter:19.05.5-1)...
19.05.5-1: Pulling from rancavil/slurm-jupyter
83ee3a23efb7: Pull complete
db98fc6f11f0: Pull complete
f611acd52c6c: Pull complete
87f6e2c4791b: Pull complete
1301353d4fa3: Pull complete
3347f4fbce33: Pull complete
0cf1a37339f3: Pull complete
e78d0881f8c1: Pull complete
37049fe9d876: Pull complete
a8fa566a7a57: Pull complete
24af49ba4a2f: Pull complete
97b9029f86ee: Pull complete
Digest: sha256:17a72e8e4c5d687359c2923af7166e84f9bd3b63146145421bbac006ce141d45
Status: Downloaded newer image for rancavil/slurm-jupyter:19.05.5-1
Pulling slurmmaster (rancavil/slurm-master:19.05.5-1)...
19.05.5-1: Pulling from rancavil/slurm-master
83ee3a23efb7: Already exists
db98fc6f11f0: Already exists
f611acd52c6c: Already exists
87f6e2c4791b: Already exists
e216e1a311d3: Pull complete
ab998a26ee04: Pull complete
499f3426618c: Pull complete
b5b815649fa6: Pull complete
2f04debb872c: Pull complete
4050a9c6f8d3: Pull complete
Digest: sha256:1979f86166b58213380604dcd7c1fcdb2438a40c44add2ff356be47160a97ab3
Status: Downloaded newer image for rancavil/slurm-master:19.05.5-1
Pulling slurmnode1 (rancavil/slurm-node:19.05.5-1)...
19.05.5-1: Pulling from rancavil/slurm-node
83ee3a23efb7: Already exists
db98fc6f11f0: Already exists
f611acd52c6c: Already exists
87f6e2c4791b: Already exists
d82ef016a552: Pull complete
5865a097296e: Pull complete
0602a8c59a76: Pull complete
6f2545f38103: Pull complete
608c665d03da: Pull complete
c80540692f3b: Pull complete
Digest: sha256:ae650d12fbdaddd29208d7638aa0498c655bfe5a33f4fd07d57e51eb211f18c2
Status: Downloaded newer image for rancavil/slurm-node:19.05.5-1
Creating cluster_slurmmaster_1 ... done
Creating cluster_slurmjupyter_1 ... done
Creating cluster_slurmnode1_1 ... done
Creating cluster_slurmnode2_1 ... done
Creating cluster_slurmnode3_1 ... done
[root@Iceland cluster]# docker-compose ps # all 5 containers are up and running
Name Command State Ports
-------------------------------------------------------------------------------------------------------------
cluster_slurmjupyter_1 /etc/slurm-llnl/docker-ent ... Up 0.0.0.0:8888->8888/tcp,:::8888->8888/tcp
cluster_slurmmaster_1 /etc/slurm-llnl/docker-ent ... Up 3306/tcp,
0.0.0.0:6817->6817/tcp,:::6817->6817/tcp,
0.0.0.0:6818->6818/tcp,:::6818->6818/tcp,
0.0.0.0:6819->6819/tcp,:::6819->6819/tcp
cluster_slurmnode1_1 /etc/slurm-llnl/docker-ent ... Up 6817/tcp, 6818/tcp, 6819/tcp
cluster_slurmnode2_1 /etc/slurm-llnl/docker-ent ... Up 6817/tcp, 6818/tcp, 6819/tcp
cluster_slurmnode3_1 /etc/slurm-llnl/docker-ent ... Up 6817/tcp, 6818/tcp, 6819/tcp
Now enter the server's IP address followed by :8888 in your browser to see the JupyterLab interface we are running.
This is the Slurm Queue extension that comes preinstalled.
Click this button to enter the Slurm Queue management interface.
Click the terminal button on the previous page to open a shell inside the container.
admin@slurmjupyter:~$ scontrol show node # list node info; all 3 nodes are present
NodeName=slurmnode1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 CPULoad=0.31
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=slurmnode1 NodeHostName=slurmnode1 Version=19.05.5
OS=Linux 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021
RealMemory=1 AllocMem=0 FreeMem=141 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=slurmpar
BootTime=2021-08-28T11:15:59 SlurmdStartTime=2021-08-29T06:38:14
CfgTRES=cpu=1,mem=1M,billing=1
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=slurmnode2 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 CPULoad=0.31
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=slurmnode2 NodeHostName=slurmnode2 Version=19.05.5
OS=Linux 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021
RealMemory=1 AllocMem=0 FreeMem=141 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=slurmpar
BootTime=2021-08-28T11:16:00 SlurmdStartTime=2021-08-29T06:38:15
CfgTRES=cpu=1,mem=1M,billing=1
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=slurmnode3 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 CPULoad=0.31
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=slurmnode3 NodeHostName=slurmnode3 Version=19.05.5
OS=Linux 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021
RealMemory=1 AllocMem=0 FreeMem=141 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=slurmpar
BootTime=2021-08-28T11:16:00 SlurmdStartTime=2021-08-29T06:38:15
CfgTRES=cpu=1,mem=1M,billing=1
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
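If you later want to check node states from a script rather than reading this wall of output, the `Key=Value` layout of `scontrol show node` is easy to parse. A minimal sketch (it assumes the exact format shown above; the embedded sample is abridged from that output):

```python
import re

def parse_scontrol(text):
    """Parse `scontrol show node` output into {node_name: {key: value}}."""
    nodes = {}
    current = None
    for key, value in re.findall(r"(\w+)=(\S+)", text):
        if key == "NodeName":        # each node record starts with NodeName=
            current = value
            nodes[current] = {}
        elif current is not None:
            nodes[current][key] = value
    return nodes

# Abridged sample of the output above.
sample = """NodeName=slurmnode1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 State=IDLE
NodeName=slurmnode2 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUTot=1 State=IDLE"""

print(parse_scontrol(sample))
```

For real deployments, `sinfo` or `scontrol -o show node` (one line per node) are friendlier starting points for scripting.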
Step 4 - Run a Slurm example
First create a new file in JupyterLab, rename it test.py, and enter the following code, which simply makes the program sleep for 15 s:
#!/usr/bin/env python3
import time
import os
import socket
from datetime import datetime as dt

if __name__ == '__main__':
    print('Process started {}'.format(dt.now()))
    print('NODE : {}'.format(socket.gethostname()))
    print('PID : {}'.format(os.getpid()))
    print('Executing for 15 secs')
    time.sleep(15)
    print('Process finished {}\n'.format(dt.now()))
Next create a job script, job.sh, that assigns work across slurmnode[1-3]: it broadcasts test.py to the nodes with sbcast, runs 3 tasks, and writes the output to result.out.
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=result.out
#
#SBATCH --ntasks=3
#
sbcast -f test.py /tmp/test.py
srun python3 /tmp/test.py
Then go to the Slurm Queue management interface and click Submit Job to submit the work for the cluster. Here you just submit the job.sh file: select the file type, set the path to /home/admin/job.sh, and click Submit Job.
Remember to click Reload so the job is loaded into the system and starts running on the cluster. After about 15 seconds an extra output file, result.out, appears in the sidebar; double-click it to see the result computed in parallel by the 3 compute nodes.
With that, the Slurm example is complete. Since the free server is only 1 core / 2 GB, more complex jobs such as matrix multiplication cannot be submitted.
Finally, remember to shut down the service:
[root@Iceland cluster]# docker-compose stop
Stopping cluster_slurmnode1_1 ... done
Stopping cluster_slurmnode2_1 ... done
Stopping cluster_slurmnode3_1 ... done
Stopping cluster_slurmjupyter_1 ... done
Stopping cluster_slurmmaster_1 ... done
[root@Iceland cluster]# docker-compose ps
Name Command State Ports
--------------------------------------------------------------------------
cluster_slurmjupyter_1 /etc/slurm-llnl/docker-ent ... Exit 137
cluster_slurmmaster_1 /etc/slurm-llnl/docker-ent ... Exit 137
cluster_slurmnode1_1 /etc/slurm-llnl/docker-ent ... Exit 137
cluster_slurmnode2_1 /etc/slurm-llnl/docker-ent ... Exit 137
cluster_slurmnode3_1 /etc/slurm-llnl/docker-ent ... Exit 137
Special thanks to Kuangshen's Docker video series on Bilibili ("As long as studying doesn't kill you, study yourself to death").