In the previous article, we gave a brief overview of the Slurm job scheduling system. Here, we will focus on how to install and deploy a Slurm cluster on CentOS.
| Operating system | IP | Configuration | Role |
| --- | --- | --- | --- |
| CentOS 7.6 | 192.168.1.1 | CPU: 2 GHz x 2, Memory: 4 GB, Disk: 17 GB | management node |
| CentOS 7.6 | 192.168.1.2 | CPU: 2 GHz x 2, Memory: 4 GB, Disk: 17 GB | compute node |
1. Basic environment preparation
1.1 Configure passwordless SSH login
Configure host A to log in to host B without a password (Method 1)
(1) Change to the SSH directory under your home directory: cd ~/.ssh
(2) Run ssh-keygen and press Enter four times to accept the defaults; this generates two files, id_rsa (private key) and id_rsa.pub (public key)
(3) Copy the public key to the machine you want to log in to without a password: ssh-copy-id -i ~/.ssh/id_rsa.pub root@<target-host-IP>
(Method 2)
(1) Generate a key pair on host A: ssh-keygen -t rsa; the key files are generated in the ~/.ssh directory
(2) Copy host A's public key to host B: scp /root/.ssh/id_rsa.pub B:/root/.ssh/
(3) Append host A's public key to host B's authorization list ~/.ssh/authorized_keys (create the file manually if it does not exist): cat id_rsa.pub >> authorized_keys
(4) The authorization list authorized_keys must have permission 600: chmod 600 authorized_keys
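The steps of Method 2 can be sketched as a short script run on host A; here "hostB" is a placeholder for the target node's hostname or IP, not a name from the original text:

```shell
# Passwordless SSH from host A to host B (Method 2), run on host A.
# "hostB" is a placeholder for the target node's hostname or IP address.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # non-interactive: empty passphrase, default key path
scp ~/.ssh/id_rsa.pub hostB:/root/.ssh/
ssh hostB 'cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys'
ssh hostB hostname   # should now connect without prompting for a password
```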
1.2 Configure NTP time synchronization
(1) Install ntp on all nodes in the cluster:
yum install ntp
(2) Set the time zone on all nodes; here we use China's time zone:
timedatectl set-timezone Asia/Shanghai
(3) Start the ntp service on the server node:
systemctl start ntpd
systemctl enable ntpd
(4) Set the current accurate time on the server node:
timedatectl set-time HH:MM:SS
(5) On the server node, set its NTP server to itself and allow clients to connect. Modify /etc/ntp.conf and add two lines: restrict 127.0.0.1 and server 127.127.1.0 (the local clock driver).
(6) Restart the ntpd service:
systemctl restart ntpd
(7) On each client node, set the NTP server to the server node. Modify /etc/ntp.conf and add one line: server <server IP>
(8) Synchronize with the server on each client node: ntpdate <server IP>
(9) Start the ntpd service on the client nodes:
systemctl start ntpd
systemctl enable ntpd
(10) Enable time synchronization on all nodes:
timedatectl set-ntp yes
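Putting steps (5) and (7) together, the lines added to /etc/ntp.conf look like the fragment below, using the management-node IP 192.168.1.1 from the table above:

```
# /etc/ntp.conf on the server (management) node -- serve time from the local clock
restrict 127.0.0.1
server 127.127.1.0

# /etc/ntp.conf on each client (compute) node -- sync from the management node
server 192.168.1.1
```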
1.3 Turn off the firewall
To prevent the MySQL and Slurm ports from being blocked by the firewall, either add firewall exceptions for those ports (Slurm's defaults are 6817 for slurmctld, 6818 for slurmd, and 6819 for slurmdbd; MySQL uses 3306) or disable the firewall entirely. The relevant commands are as follows:
Start the firewall:
systemctl start firewalld.service
Check the firewall status:
systemctl status firewalld.service
Stop the firewall:
systemctl stop firewalld.service
Enable the firewall at boot:
systemctl enable firewalld.service
Disable the firewall at boot:
systemctl disable firewalld.service
1.4 Update system
yum update
2. Install and configure Munge
2.1 Install Munge
MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and verifying credentials. It allows a process to verify the UID and GID of another local or remote process in a set of hosts with a common user and group.
On CentOS 7 the Munge packages are provided by the EPEL repository, so enable it first:
yum install epel-release
yum install munge munge-libs munge-devel
2.2 Configure Munge
Configure the Munge key: Generate a Munge key and set permissions using the following commands:
sudo /usr/sbin/create-munge-key
sudo chown munge: /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
2.3 Start the Munge service
Use the following commands to start the Munge service and set it to start automatically on boot:
sudo systemctl start munge
sudo systemctl enable munge
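Every node in the cluster must hold an identical copy of munge.key, so after generating it on the management node, copy it to each compute node and verify that credentials decode across nodes. The commands below are a sketch using the compute-node IP from the table above:

```shell
# Distribute the key generated on the management node to the compute node (192.168.1.2).
scp /etc/munge/munge.key 192.168.1.2:/etc/munge/munge.key
ssh 192.168.1.2 'chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key && systemctl restart munge'
# Verify: encode a credential locally and decode it on the remote node.
munge -n | ssh 192.168.1.2 unmunge
```

If the last command prints STATUS: Success, both nodes share a working key.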
3. Install Slurm
3.1 Download and install
Download the Slurm package from the official Slurm website (Slurm Workload Manager - Download Slurm, schedmd.com), upload it to the target node, then build and install it:
tar --bzip -x -f slurm*.tar.bz2
cd slurm-*
./configure --prefix=/opt/slurm
make && make install
Here --prefix=/opt/slurm makes /opt/slurm the installation directory used in the rest of this guide.
3.2 Install pmi and pmi2
Enter contribs/pmi in the Slurm source directory:
cd contribs/pmi
make && make install
Then go to the sibling contribs/pmi2 directory:
cd ../pmi2
make && make install
3.3 Prepare working directories
mkdir -p /opt/slurm/etc
The configuration file directory; by default Slurm looks for its configuration in the etc folder under the installation directory.
mkdir -p /opt/slurm/spool
Slurm saves its state in this directory so it can recover from system failures. The default path is /var/spool/; a new directory is created here because it is specified in slurm.conf.
mkdir -p /opt/slurm/log
The log directory; the default is /var/log. It is created here because it is specified in slurm.conf.
4. Configure Slurm
In the etc directory under the installation path, /opt/slurm/etc, create the configuration files slurm.conf and slurmdbd.conf.
For Slurm configuration details, please refer to:
slurmdbd.conf, the cluster configuration file in Slurm (Programmer Sought)
slurm_node.conf, the cluster configuration file in Slurm (Programmer Sought)
Slurm Workload Manager - Quick Start Administrator Guide (schedmd.com)
Alternatively, modify slurm.conf.example and slurmdbd.conf.example in the etc directory of the source tree.
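As a starting point, a minimal slurm.conf consistent with the nodes and directories above might look like the sketch below. The cluster and host names (mycluster, master, node01) are assumptions; replace them with your own:

```
# /opt/slurm/etc/slurm.conf -- minimal sketch, adjust names to your cluster
ClusterName=mycluster
SlurmctldHost=master(192.168.1.1)       # management node from the table above
SlurmUser=root
StateSaveLocation=/opt/slurm/spool
SlurmctldLogFile=/opt/slurm/log/slurmctld.log
SlurmdLogFile=/opt/slurm/log/slurmd.log
NodeName=node01 NodeAddr=192.168.1.2 CPUs=2 RealMemory=4000 State=UNKNOWN
PartitionName=normal Nodes=node01 Default=YES MaxTime=INFINITE State=UP
```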
Next, copy the service scripts from the etc directory of the source tree to the systemd unit directory /usr/lib/systemd/system, and run systemctl daemon-reload so systemd registers them.
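For example, from the extracted source tree (the unit file names below are the ones generated in its etc directory by configure):

```shell
cd slurm-*/etc
cp slurmdbd.service slurmctld.service slurmd.service /usr/lib/systemd/system/
systemctl daemon-reload   # make systemd pick up the new unit files
```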
Start the services
Start the slurmdbd accounting service:
systemctl restart slurmdbd.service
Start the Slurm scheduling services (slurmctld on the management node, slurmd on the compute nodes):
systemctl restart slurmctld.service
systemctl restart slurmd.service
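Once the daemons are up, a quick sanity check can be done with the standard Slurm client commands:

```shell
sinfo                # list partitions and node states; healthy nodes show "idle"
scontrol show node   # detailed per-node information
srun -N1 hostname    # run a trivial one-node job through the scheduler
```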