slurm.conf: the cluster configuration file in Slurm

1. Introduction to slurm.conf

slurm.conf is an ASCII file that describes general Slurm configuration information, the nodes to be managed, how those nodes are grouped into partitions, and the various scheduling parameters associated with those partitions. This file should be consistent across all nodes in the cluster. The file location can be overridden at execution time by setting the SLURM_CONF environment variable, and the Slurm daemons also allow you to override both the built-in and the environment-provided location with the "-f" command-line option.
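For example, a non-default location (here the /opt/slurm18 prefix used later in this article) can be selected like this; the daemon invocations are shown as comments because they require an actual cluster:

```shell
# Point Slurm commands at a non-default slurm.conf via the environment:
export SLURM_CONF=/opt/slurm18/etc/slurm.conf
echo "$SLURM_CONF"

# The daemons can also be given the location explicitly with -f,
# which overrides both the built-in default and SLURM_CONF:
#   slurmctld -f /opt/slurm18/etc/slurm.conf
#   slurmd    -f /opt/slurm18/etc/slurm.conf
```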

The contents of the file are case-insensitive, except for node names and partition names. Any text following a "#" in the configuration file is treated as a comment through the end of that line. Unless otherwise noted, changes to the configuration file take effect only after restarting the Slurm daemons, sending them a SIGHUP signal, or executing the command "scontrol reconfig".

If a line begins with the word "Include" followed by a space and a file name, that file will be included inline at that point in the current file. For large or complex systems, multiple configuration files may be easier to manage and allow certain files to be reused.
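A sketch of how a split configuration might look; the included file names here are hypothetical:

```
# slurm.conf
Include /opt/slurm18/etc/nodes.conf       # NodeName=... lines
Include /opt/slurm18/etc/partitions.conf  # PartitionName=... lines
```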

2. Notes on file permissions:

The slurm.conf file must be readable by all users of Slurm, because it is used by many Slurm commands. Other files defined in slurm.conf, such as log files and job accounting files, may need to be created and owned by the "SlurmUser" user in order to be accessed successfully. Use the "chown" and "chmod" commands to set the appropriate ownership and permissions.
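A minimal sketch of the permission pattern, run here against throwaway paths; on a real system the targets would be the installed slurm.conf and the spool/log directories, and the chown would use your SlurmUser account:

```shell
# Stand-ins for the real files (e.g. /opt/slurm18/etc/slurm.conf):
conf=$(mktemp)
spool=$(mktemp -d)

chmod 644 "$conf"    # slurm.conf: readable by all users
chmod 755 "$spool"   # spool/log dirs: accessible, writable by SlurmUser
# On a real system, additionally (assuming SlurmUser is "slurm"):
#   chown -R slurm:slurm /opt/slurm18/spool /opt/slurm18/log

stat -c '%a' "$conf" "$spool"
rm -rf "$conf" "$spool"
```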

3. The location of the file

slurm.conf is located in the etc folder of the installation directory.

To configure slurm.conf, you can also copy the file etc/slurm.conf.example from the installation package directory into the installation directory's etc folder, rename it to slurm.conf, and then modify its parameters.

4. Detailed configuration parameters

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
ControlMachine=  # Primary node name
ControlAddr=  # Primary node IP
#BackupController=  # Backup node name
#BackupAddr=  # Backup node IP
AuthType=auth/munge #Internal Authentication
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=/opt/slurm18/etc/slurm.epilog
#Prolog=/opt/slurm18/etc/slurm.prolog
#EpilogSlurmctld=/opt/slurm18/etc/slurmctld.epilog
#PrologSlurmctld=/opt/slurm18/etc/slurmctld.prolog
#SrunEpilog=
#SrunProlog=
#TaskEpilog=/opt/slurm18/etc/slurm.epilog
#TaskProlog=/opt/slurm18/etc/slurm.prolog
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/opt/slurm18/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua #Submit parameter filtering
#KillOnBadExit=0 #Exception job cleaning
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
MaxJobCount=3000000 #The maximum number of jobs is 3 million
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#MpiParams=ports=12000-12999
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/linuxproc #Process tracking plugin
PrologFlags=Alloc
#RebootProgram=
ReturnToService=2 #Automatically return DOWN nodes to service when they register with a valid configuration
#SallocDefaultCommand=
#SlurmctldPidFile=/opt/slurm18/run/slurmctld.pid
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817 #Controller (slurmctld) port
#SlurmdPidFile=/opt/slurm18/run/slurmd.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818 #Compute node daemon (slurmd) port
SlurmdSpoolDir=/opt/slurm18/spool/slurmd #slurmd state/spool directory
SlurmUser=root #User that slurmctld runs as
#SlurmdUser=root #User that slurmd runs as
StateSaveLocation=/opt/slurm18/spool #slurmctld state save directory
SwitchType=switch/none
TaskPlugin=task/affinity # Resource control method cpuset
#TaskPlugin=task/cgroup # Resource control method cgroups
#TaskPlugin=task/none # No special resource control method
#TaskPluginParam=
TaskPluginParam=sched
#TopologyPlugin=topology/tree #Topology-aware scheduling (tree)
#TopologyPlugin=topology/3d_torus #Topology-aware scheduling (3D torus)
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=60 #Check Interval
#HealthCheckProgram=/usr/sbin/nhc #Check Tool
InactiveLimit=0
KillWait=30
MessageTimeout=30
#ResvOverRun=0
MinJobAge=300 #Seconds to retain completed job records
#OverTimeLimit=0
SlurmctldTimeout=30 #Seconds before the backup controller takes over
SlurmdTimeout=300 #Seconds before an unresponsive slurmd is marked DOWN
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
# SCHEDULING
#DefMemPerNode=100
#MaxMemPerNode=200
#DefMemPerCPU=30
#MaxMemPerCPU=70
FastSchedule=1 #Base scheduling decisions on the configured node values
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill #Enable backfill scheduling
#SchedulerType=sched/builtin #Enable FIFO scheduling
SchedulerPort=7321 #Scheduler port
SelectType=select/cons_res #Resource Selection Algorithm
SelectTypeParameters=CR_Core #Based on Core Scheduling
#SelectTypeParameters=CR_Core_Memory #Based on Core and Memory Scheduling
#SchedulerParameters=defer,default_queue_depth=50,bf_max_job_test=50
SchedulerParameters=batch_sched_delay=3,defer,sched_min_interval=10,sched_interval=30,default_queue_depth=100,bf_max_job_test=100,bf_interval=30
# Job priority
#PriorityFlags=
#PriorityType=priority/multifactor #Priority policy
#PriorityDecayHalfLife=30 #Half-life duration
#PriorityCalcPeriod=5 #FS Statistics Interval
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=1000 #FS Weight
#PriorityWeightJobSize= #JobSize Weight
#PriorityWeightPartition=1000 #Partition weight
#PriorityWeightQOS= #QOS weight
# only permitted in slurmdbd.conf
#AccountingStorageType=accounting_storage/none
#AccountingStorageType=accounting_storage/filetxt
#AccountingStorageType=accounting_storage/mysql
AccountingStorageType=accounting_storage/slurmdbd #Enable slurmdbd
AccountingStorageUser=root #Accounting storage user
AccountingStoreJobComment=YES #Record job comment
ClusterName=cluster_gv171 #cluster name
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageEnforce=associations,limits #Enforce account associations and resource limits
AccountingStorageHost= #Primary accounting service host
#AccountingStorageBackupHost= #Backup accounting service host
#AccountingStorageLoc=/opt/slurm18/accounting/accounting
#AccountingStorageLoc=gv_slurm_db
#AccountingStoragePass=111111
AccountingStoragePort=7031 #Accounting service port
#DebugFlags=NO_CONF_HASH #Debug flag
#JobCompHost=localhost
#JobCompLoc=/opt/slurm18/job_completions/job_completions
#JobCompLoc=gv_slurm_db
#JobCompPass=111111
#JobCompPort=3309
JobCompType=jobcomp/none #Disable job completion logging
#JobCompType=jobcomp/mysql
#JobCompType=jobcomp/filetxt
#JobCompType=jobcomp/slurmdbd
JobCompUser=root
#JobContainerType=job_container/none
JobAcctGatherFrequency=300 #Job accounting sampling interval (seconds)
#JobAcctGatherType=jobacct_gather/none
JobAcctGatherType=jobacct_gather/linux #Enable the Linux gather plugin
JobRequeue=1 # Allow re-queuing
SlurmctldDebug=3 #slurmctld log level
SlurmctldLogFile=/opt/slurm18/log/slurmctld.log #Management node log
SlurmdDebug=3 #slurmd log level
SlurmdLogFile=/opt/slurm18/log/slurmd_%h.log #Compute node log (%h expands to the hostname)
#SlurmdLogFile=/opt/slurm18/log/slurmd.log
PreemptMode=requeue,gang #Preemption strategy
PreemptType=preempt/partition_prio #Preempt based on partition priority
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
SuspendTime=1800
PrivateData=accounts,events,jobs,reservations,usage,users #Restrict which data users can view
GresTypes=gpu,mic
DebugFlags=NO_CONF_HASH
# COMPUTE NODES
NodeName=gv245 CPUs=2 State=IDLE
PartitionName=debug Default=YES PriorityTier=6000 State=UP MaxTime=INFINITE Nodes=ALL #Node hardware values can be obtained with the command "slurmd -C"
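As the comment above notes, the values for a NodeName line can be generated on each compute node with slurmd -C, which prints the node's actual hardware configuration; the output below is illustrative only, not taken from a real node:

```
$ slurmd -C
NodeName=gv245 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=7822
```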


Origin blog.csdn.net/lovebaby1689/article/details/128683850