How to install and deploy a Slurm cluster on CentOS

In a previous post on the Slurm job scheduling system (Boy Li's blog, CSDN), we gave a brief overview of the Slurm scheduling system. Here, we focus on how to install and deploy a Slurm cluster on CentOS.

Operating system | IP | Configuration | Role
centos7.6 | 192.168.1.1 | CPU: 2GHz*2, Memory: 4GB, Disk: 17GB | management node
centos7.6 | 192.168.1.2 | CPU: 2GHz*2, Memory: 4GB, Disk: 17GB | compute node

1. Basic environment preparation

1.1 Configure passwordless SSH login

Configure host A to log in to host B without a password (method one)

(1) Enter the .ssh directory under your home directory: cd ~/.ssh

(2) Run ssh-keygen (press Enter four times to accept the defaults); this generates two files: id_rsa (the private key) and id_rsa.pub (the public key)

(3) Copy the public key to the target machine to enable passwordless login: ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.2

(Method two)

(1) Generate a key pair on host A: ssh-keygen -t rsa; the key files will be generated in the .ssh directory

(2) Copy the public key of host A to host B: scp /root/.ssh/id_rsa.pub B:/root/.ssh/    

(3) On host B, append host A's public key to ~/.ssh/authorized_keys (create the file manually if it does not exist): cat id_rsa.pub >> authorized_keys

(4) The authorized_keys file must have permission 600: chmod 600 authorized_keys
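
To verify, host A should now be able to run a command on host B without a password prompt (the compute node IP from the table above is assumed here):

ssh root@192.168.1.2 hostname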

1.2 Configure NTP time synchronization 

(1) Install ntp on all nodes in the cluster:

yum install ntp    

(2) Set the time zone on all nodes; here we use China's time zone:

timedatectl set-timezone Asia/Shanghai  

(3) Start the ntp service on the server node   

systemctl start ntpd
systemctl enable ntpd

(4) Set the current accurate time on the server node   

 timedatectl set-time HH:MM:SS    

(5) On the server node, set the ntp server to itself and allow clients to connect. Modify /etc/ntp.conf and add two lines: restrict 127.0.0.1 and server 127.127.1.0, as sketched below.
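
The added lines in /etc/ntp.conf on the server node would look like the sketch below; the fudge line is a common companion setting for the local clock, not part of the original steps:

restrict 127.0.0.1
server 127.127.1.0             # ntpd's pseudo-address for the local system clock
fudge 127.127.1.0 stratum 10   # common companion line (assumption, not in the steps above)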

(6) Restart the ntpd service     

systemctl restart ntpd    

(7) On the client nodes, set the ntp server to the server node. Modify /etc/ntp.conf and add one line: server <server IP>, as sketched below.
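
For the cluster in the table above, the added line would be (assuming 192.168.1.1 is the server node):

server 192.168.1.1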

(8) On the client nodes, synchronize the time from the server once: ntpdate <server IP>

(9) The client node starts the ntpd service  

systemctl start ntpd
systemctl enable ntpd

(10) Enable time synchronization on all nodes:

timedatectl set-ntp yes

1.3 Turn off the firewall

To prevent the MySQL and Slurm ports from being blocked by the firewall, you can either add firewall exceptions for those ports (see the sketch below) or disable the firewall entirely. The relevant firewall commands are as follows.
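
A sketch of adding port exceptions instead of disabling the firewall; the ports shown are Slurm's defaults (slurmctld 6817, slurmd 6818, slurmdbd 6819) and MySQL's default 3306, so adjust them if your slurm.conf overrides the defaults:

firewall-cmd --permanent --add-port=6817/tcp   # slurmctld (default SlurmctldPort)
firewall-cmd --permanent --add-port=6818/tcp   # slurmd (default SlurmdPort)
firewall-cmd --permanent --add-port=6819/tcp   # slurmdbd (default DbdPort)
firewall-cmd --permanent --add-port=3306/tcp   # MySQL
firewall-cmd --reload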

Start the firewall:

systemctl start firewalld.service

Check firewall status:

systemctl status firewalld.service

Stop the firewall:

systemctl stop firewalld.service

Enable the firewall at boot:

systemctl enable firewalld.service

Disable the firewall at boot:

systemctl disable firewalld.service

1.4 Update the system

yum update

2. Install and configure Munge

2.1 Install Munge

MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and validating credentials. It allows a process to authenticate the UID and GID of another local or remote process within a group of hosts that share common users and groups.

yum install munge munge-libs munge-devel
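
On CentOS 7 the munge packages come from the EPEL repository; if yum cannot find them, enable EPEL first:

yum install epel-release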

2.2 Configure Munge

Generate a Munge key and set its ownership and permissions using the following commands:

sudo /usr/sbin/create-munge-key
sudo chown munge: /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
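
Every node in the cluster must share the same munge.key. Copy it from the management node to each compute node (the IP from the table above is assumed) and keep the same ownership and permissions:

scp /etc/munge/munge.key root@192.168.1.2:/etc/munge/
ssh root@192.168.1.2 "chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key"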

2.3 Start the Munge service

Use the following commands to start the Munge service and set it to start automatically on boot:

sudo systemctl start munge
sudo systemctl enable munge
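
To verify that Munge works, create and decode a credential locally and, once the key has been copied, across nodes:

munge -n | unmunge                   # local check
munge -n | ssh 192.168.1.2 unmunge   # cross-node check (compute node IP from the table above)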

3. Install Slurm

3.1 Download and install

Download the Slurm package from the official Slurm website (Slurm Workload Manager - Download Slurm, schedmd.com) and upload it to the target node. Then unpack, configure, and build it; the --prefix flag makes the installation land in /opt/slurm, matching the directories used in the rest of this guide.

tar --bzip -x -f slurm*tar.bz2
cd slurm-*
./configure --prefix=/opt/slurm
make && make install

3.2 Install pmi and pmi2

From the top of the Slurm source directory, enter contribs/pmi and build it:

cd contribs/pmi
make && make install

Then do the same for contribs/pmi2:

cd ../pmi2
make && make install

3.3 Prepare working directory

 mkdir  /opt/slurm/etc  

 The configuration file storage directory, the default is the etc folder under the installation directory 

 mkdir  /opt/slurm/spool  

The Slurm state will be saved in this directory to recover from system failures. The default path is "/var/spool/", which is newly created here because this directory is specified in the slurm.conf file

 mkdir  /opt/slurm/log  

The log storage directory, the default directory is /var/log, which is created here because the directory is specified in the slurm.conf file

4. Configure Slurm

In the installation directory /opt/slurm/etc, create the configuration files slurm.conf and slurmdbd.conf.

For configuration details related to Slurm, please refer to:

slurm.conf of the cluster configuration files in Slurm - Boy Li's Blog (CSDN)

slurmdbd.conf of the cluster configuration file in Slurm - Programmer Sought

slurm_node.conf of the cluster configuration file in Slurm - Programmer Sought

Slurm Workload Manager - Quick Start Administrator Guide (schedmd.com)

Alternatively, modify slurm.conf.example and slurmdbd.conf.example from the etc directory of the source tree. A minimal sketch for the two-node cluster in this guide follows.
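
The sketch below is a starting point only, consistent with the /opt/slurm directories created above; the hostnames manager and compute01, the MySQL credentials, and the accounting settings are assumptions to adapt to your site (see the references above for the full set of options).

slurm.conf (identical on all nodes):

ClusterName=mycluster
SlurmctldHost=manager                      # hostname of 192.168.1.1 (assumed)
AuthType=auth/munge
SlurmUser=root
StateSaveLocation=/opt/slurm/spool
SlurmdSpoolDir=/opt/slurm/spool/slurmd
SlurmctldLogFile=/opt/slurm/log/slurmctld.log
SlurmdLogFile=/opt/slurm/log/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
NodeName=compute01 CPUs=2 State=UNKNOWN    # hostname of 192.168.1.2 (assumed)
PartitionName=normal Nodes=compute01 Default=YES MaxTime=INFINITE State=UP

slurmdbd.conf (only on the node running slurmdbd; the file must be owned by SlurmUser and have permission 600):

AuthType=auth/munge
DbdHost=localhost
SlurmUser=root
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm                          # MySQL account (assumed)
StoragePass=password                       # placeholder, change it
StorageLoc=slurm_acct_db
LogFile=/opt/slurm/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid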

Next, copy the systemd service scripts from the etc directory of the build tree to the unit directory /usr/lib/systemd/system and make sure they are readable, as sketched below.
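
A sketch of that step, run from the top of the Slurm build directory (the .service files are generated there by ./configure); which files you copy depends on the node's role:

cp etc/slurmctld.service etc/slurmdbd.service /usr/lib/systemd/system/   # management node
cp etc/slurmd.service /usr/lib/systemd/system/                           # compute nodes
chmod 644 /usr/lib/systemd/system/slurm*.service                         # unit files only need to be readable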

Start the services

Start the slurmdbd service (on the management node)

systemctl restart slurmdbd.service

Start the Slurm scheduling services: slurmctld on the management node and slurmd on the compute nodes

systemctl restart slurmctld.service    # management node
systemctl restart slurmd.service       # compute nodes
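
If everything started cleanly, the cluster can be inspected from the management node; the compute node should be listed in an idle state:

sinfo
scontrol show nodes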


Origin: blog.csdn.net/lovebaby1689/article/details/130237067