slurm_node.conf: the node configuration file in a Slurm cluster

1. Introduction to slurm_node.conf

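slurm_node.conf is the node configuration file of a Slurm cluster. It holds the definitions and settings of the cluster's nodes. Slurm itself does not define a file with this name. It is a site-level file that is usually pulled into slurm.conf with an Include directive (for example "Include /etc/slurm/slurm_node.conf"). Its location is therefore whatever path that directive names, commonly the same directory as slurm.conf (such as /etc/slurm). The NodeName parameter does not set a file location; it defines a node. Because the file becomes part of slurm.conf, it should be kept identical on the controller and on every compute node.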
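slurm.conf supports an Include directive: a line of the form "Include <file>" inserts that file's contents inline. The Python sketch below expands such lines to show the effective configuration that results when node definitions live in a separate file. It is an illustration only: the paths and file contents are hypothetical, files are modelled as an in-memory dict, and only the plain "Include <path>" form is handled.

```python
from typing import Dict, List


def expand_includes(path: str, files: Dict[str, str], depth: int = 0) -> List[str]:
    """Return the lines of `path`, with every 'Include <file>' line replaced
    by the (recursively expanded) lines of that file."""
    if depth > 10:  # crude guard against include cycles in this sketch
        raise RecursionError(f"include nesting too deep at {path}")
    lines: List[str] = []
    for raw in files[path].splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "Include":
            # Splice the included file in place of the Include line
            lines.extend(expand_includes(parts[1].strip(), files, depth + 1))
        else:
            lines.append(raw)
    return lines


if __name__ == "__main__":
    # Hypothetical in-memory "filesystem": path -> file contents
    files = {
        "/etc/slurm/slurm.conf": (
            "ClusterName=demo\n"
            "Include /etc/slurm/slurm_node.conf\n"
        ),
        "/etc/slurm/slurm_node.conf": (
            "NodeName=node[01-04] CPUs=32 RealMemory=64000 State=UNKNOWN\n"
        ),
    }
    print("\n".join(expand_includes("/etc/slurm/slurm.conf", files)))
```

Keeping node lines in their own file this way lets the hardware inventory change without touching the rest of slurm.conf.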

The slurm_node.conf file describes each node: its name, network address (NodeAddr, e.g. an IP address), CPU layout (sockets, cores and threads), memory, GPUs and other generic resources, and so on. The Slurm controller daemon (slurmctld) uses this information to allocate resources to jobs and to launch tasks on the nodes. The definitions must therefore match the real hardware: if a node registers with fewer CPUs or less memory than configured, slurmctld typically drains it (for example with the reason "Low RealMemory"), and no jobs are scheduled on it.
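A node definition is a single line of space-separated Key=Value pairs starting with NodeName=, with "#" starting a comment. The Python sketch below splits such a line into a dictionary, roughly the view slurmctld works from. The node name, address and values are hypothetical, and the parser assumes values contain no spaces or quotes.

```python
from typing import Dict


def parse_node_line(line: str) -> Dict[str, str]:
    """Split a 'NodeName=... Key=Value ...' definition into a dict.

    Sketch only: assumes values contain no spaces or quotes.
    """
    body = line.split("#", 1)[0].strip()  # drop a trailing '#' comment
    params: Dict[str, str] = {}
    for token in body.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected Key=Value, got {token!r}")
        params[key] = value
    return params


if __name__ == "__main__":
    # Hypothetical node definition for illustration
    line = ("NodeName=node01 NodeAddr=192.168.1.11 Sockets=2 CoresPerSocket=8 "
            "ThreadsPerCore=2 RealMemory=64000 Gres=gpu:2  # rack A")
    for key, value in parse_node_line(line).items():
        print(f"{key:15} {value}")
```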

The file can also hold other settings that shape how jobs run on the nodes. For example, a partition is defined on its own PartitionName line, which lists its member nodes with Nodes=, and per-node parameters such as Weight and Features steer where jobs are placed. Limits on the number of jobs (MaxJobs) are normally association or QOS limits set with sacctmgr, not node properties. Settings like these are tuned to make better use of the cluster's resources.
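Since membership is declared on the partition side, and a node may appear in several partitions, it can help to invert the partition lines to see which partitions each node belongs to. The Python sketch below does that for a few hypothetical PartitionName lines. Its hostlist expansion is a simplification that only handles one numeric range per name (e.g. node[01-03]), not the full syntax Slurm accepts.

```python
import re
from typing import Dict, List


def split_top_level(expr: str) -> List[str]:
    """Split on commas that are not inside [...] brackets."""
    parts: List[str] = []
    depth, current = 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(expr: str) -> List[str]:
    """Expand names like 'node[01-03]'; only one numeric range per name."""
    hosts: List[str] = []
    for item in split_top_level(expr):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", item)
        if m is None:
            hosts.append(item)  # plain host name
            continue
        prefix, lo, hi, suffix = m.groups()
        width = len(lo)  # keep zero padding, e.g. 01, 02, ...
        for i in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


def partitions_by_node(partition_lines: List[str]) -> Dict[str, List[str]]:
    """Invert PartitionName lines into node -> list of partitions."""
    membership: Dict[str, List[str]] = {}
    for line in partition_lines:
        params = dict(tok.split("=", 1) for tok in line.split())
        for host in expand_hostlist(params["Nodes"]):
            membership.setdefault(host, []).append(params["PartitionName"])
    return membership


if __name__ == "__main__":
    lines = [  # hypothetical partitions
        "PartitionName=cpu Nodes=node[01-03] Default=YES State=UP",
        "PartitionName=gpu Nodes=node03,gpu01 State=UP",
    ]
    for node, parts in partitions_by_node(lines).items():
        print(node, parts)
```

Here node03 ends up in both the cpu and gpu partitions, which Slurm permits.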

2. slurm_node.conf configuration items

slurm_node.conf holds the definition of each node, written as a line that starts with NodeName= followed by space-separated Key=Value pairs. The following are some common configuration items:

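NodeName : The name of the node. It must be unique and is typically the node's short hostname (as returned by hostname -s); an arbitrary name is allowed if NodeHostname is also set. A range expression such as node[01-04] defines several nodes with identical settings on one line.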

Sockets : The number of CPU sockets on the node.

CoresPerSocket : The number of cores on each CPU socket.

ThreadsPerCore : The number of hardware threads on each core.

RealMemory : The actual amount of memory on the node, in MB.

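State : The initial state of the node. UNKNOWN (the default) is the usual choice; other accepted values include DOWN, DRAIN, FUTURE and CLOUD. Run-time states such as IDLE, MIXED and ALLOCATED are reported by sinfo and scontrol rather than set in the file.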

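Weight : The scheduling weight of the node. When several nodes satisfy a job's requirements, those with the lowest weight are allocated first, so giving larger or more specialized nodes a higher weight keeps them free for the jobs that need them.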

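PartitionName : The name of a partition. Strictly, this is not a node parameter: a partition is defined on its own PartitionName line whose Nodes= list names the member nodes, and a node may belong to several partitions.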

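Features : A comma-separated list of labels on the node, such as CPU model, GPU type or network speed. Users can request nodes with given features through the --constraint (-C) option of sbatch and srun.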

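IdleProcs : The number of idle processors on the node. This is not a parameter documented in the slurm.conf man page; idle CPU counts are normally reported at run time (for example by sinfo -o %C) rather than configured.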

MaxTasksPerNode : The maximum number of tasks a job step may launch on a single node. In Slurm this is a cluster-wide slurm.conf parameter rather than a per-node one.

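Gres : Generic resources on the node, such as GPUs or FPGAs, written as name[:type]:count (for example Gres=gpu:2). The resource name must also be listed in GresTypes in slurm.conf, and the node normally needs a matching gres.conf that describes the devices.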
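The CPU-related items above are linked: the number of logical CPUs implied by the topology is Sockets × CoresPerSocket × ThreadsPerCore, so 2 sockets of 8 cores with 2 threads each give 32 CPUs. The Python sketch below collects the common items into a hypothetical NodeSpec helper (not part of Slurm) and renders one NodeName line from it, deriving CPUs from the topology so the values cannot disagree. The default Weight of 1 and State of UNKNOWN mirror Slurm's defaults.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeSpec:
    """Hypothetical helper holding the node parameters listed above."""
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    real_memory_mb: int          # RealMemory is given in megabytes
    weight: int = 1
    features: List[str] = field(default_factory=list)
    gres: Optional[str] = None   # e.g. "gpu:2"
    state: str = "UNKNOWN"

    @property
    def cpus(self) -> int:
        """Logical CPUs implied by the socket/core/thread topology."""
        return self.sockets * self.cores_per_socket * self.threads_per_core

    def to_conf_line(self) -> str:
        """Render the spec as one slurm.conf-style NodeName line."""
        parts = [
            f"NodeName={self.name}",
            f"CPUs={self.cpus}",
            f"Sockets={self.sockets}",
            f"CoresPerSocket={self.cores_per_socket}",
            f"ThreadsPerCore={self.threads_per_core}",
            f"RealMemory={self.real_memory_mb}",
            f"Weight={self.weight}",
        ]
        if self.features:
            parts.append("Features=" + ",".join(self.features))
        if self.gres:
            parts.append(f"Gres={self.gres}")
        parts.append(f"State={self.state}")
        return " ".join(parts)


if __name__ == "__main__":
    node = NodeSpec("node01", sockets=2, cores_per_socket=8, threads_per_core=2,
                    real_memory_mb=64000, weight=10,
                    features=["skylake", "ib"], gres="gpu:2")
    print(node.to_conf_line())
```

For the example node this prints: NodeName=node01 CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000 Weight=10 Features=skylake,ib Gres=gpu:2 State=UNKNOWN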

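The items above are common examples; the exact set depends on the cluster and its hardware. Edits to slurm_node.conf take effect only after Slurm rereads its configuration: copy the updated file to every node, then run scontrol reconfigure or restart slurmctld and the slurmd daemons. Depending on the Slurm version, adding or removing nodes may require a full restart of the daemons rather than a reconfigure.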

The figure below shows an example of a configured slurm_node.conf file.


Original post: blog.csdn.net/lovebaby1689/article/details/129882234