Submitting Jobs on Athene using Slurm (salloc, srun, sbatch, sinfo, squeue)
Summary
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. See the Slurm documentation at schedmd.com to learn more.
Body
- Upload your code and data to your Athene home directory using Ondemand, Globus, or SCP.
- Choose the type of job you would like to run: salloc, srun, or sbatch.
- salloc requests and holds an allocation on the cluster so you can run interactive jobs using srun, mpiexec, and other applications.
For more information, execute “man salloc” in an athene-login.hpc.fau.edu terminal.
- Example:
salloc -N 1 --exclusive
srun hostname
- srun requests an allocation on the cluster if one has not already been granted by salloc, then executes the specified command.
For more information, execute “man srun” in an athene-login.hpc.fau.edu terminal.
- Example:
srun -N 1 --exclusive hostname # print the hostname of the allocated node
- sbatch submits a job that runs in the background, detached from your current terminal. It allocates resources similarly to salloc and logs the results to a file. If your computer loses its connection to the cluster, sbatch jobs will continue to run, making this a very powerful command. An example sbatch job is provided below.
Create a script named {JOBNAME}.sh to start your job containing the following:
#!/bin/sh
#SBATCH --partition=shortq7
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --mem-per-cpu=16000
# Load modules if needed, run staging tasks, etc.
# Execute the task
srun hostname
- Run the command: chmod +x {JOBNAME}.sh to make the job executable.
- Submit the job using the sbatch command.
sbatch {JOBNAME}.sh
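Once submitted, sbatch prints a job ID, and by default Slurm writes the job's output to a file named slurm-<jobid>.out in the directory you submitted from. A quick way to check on the job (the job ID 123456 below is a placeholder, not a real job):

```shell
# Submit the script; sbatch prints something like "Submitted batch job 123456"
sbatch {JOBNAME}.sh

# List only your own jobs to see whether the job is pending (PD) or running (R)
squeue -u $USER

# After the job finishes, inspect its output log (123456 is a placeholder job ID)
cat slurm-123456.out
```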
- Please adjust the partition (queue), application to execute, memory, tasks, and heap sizes as needed in these examples to create your own job scripts. If you need help, please let us know by submitting a ticket to the Help Desk.
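As one illustration of adjusting those settings, a customized job script might look like the following sketch. The partition name, time limit, memory, and task counts here are assumptions to adapt to your own workload, and ./my_application is a hypothetical program:

```shell
#!/bin/sh
#SBATCH --job-name=myjob          # name shown in squeue output
#SBATCH --partition=longq7        # queue, as listed by sinfo
#SBATCH --time=2-00:00:00         # wall-clock limit: 2 days
#SBATCH -N 1                      # one node
#SBATCH --ntasks=4                # four tasks (e.g., MPI ranks)
#SBATCH --mem-per-cpu=8000        # memory per CPU, in MB
#SBATCH --output=myjob-%j.out     # %j expands to the job ID

# Load modules, if needed, then run the application
srun ./my_application
```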
- You can print a list of queues (partitions) with the sinfo command.
[user@rocky-login011 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
shortq7* up 2:00:00 2 down* node[001,056]
shortq7* up 2:00:00 3 mix node[009,030-031]
shortq7* up 2:00:00 37 alloc node[002-006,008,011-014,019-020,051,054,057-058,062-065,067-081,083-084]
shortq7* up 2:00:00 30 idle gpu-exxact[1-5],gpu-k80,node[007,010,027-029,032,052-053,059-061,082,087-098]
longq7 up 7-12:00:00 2 down* node[001,056]
longq7 up 7-12:00:00 1 mix node009
longq7 up 7-12:00:00 37 alloc node[002-006,008,011-014,019-020,051,054,057-058,062-065,067-081,083-084]
longq7 up 7-12:00:00 6 idle node[007,010,059-061,082]
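If the full listing is too long, sinfo accepts filters; for example, to show only one partition or only idle nodes (the partition name below is taken from the output above):

```shell
# Show only the shortq7 partition
sinfo -p shortq7

# Show only idle nodes, across all partitions
sinfo --states=idle
```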
- You can see the status of queued and running jobs using the squeue command.
[user@rocky-login011 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2552838 longq7 MS1b_m14 usera PD 0:00 1 (Dependency)
2553360 longq7 TS_71-72 userb PD 0:00 2 (Resources)
2553425 longq7 TS_69-70 userb PD 0:00 2 (Resources)
2552836 longq7 MS1b_m14 userc R 4-12:08:47 1 node002
2552837 longq7 MS1b_m14 userc R 5-17:38:36 1 node063
2553116 longq7 Homoseri userd R 7-03:41:54 1 node078
2553117 longq7 Homoseri userd R 7-02:29:54 1 node071
2553157 longq7 HGE_V1_1 usere R 6-07:50:41 1 node003
2553288 longq7 Cys_Ket_ userf R 4-21:02:05 1 node004
...
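squeue can likewise be narrowed down; for example, to watch only your own jobs, or only the pending jobs in a given partition (the partition name below is taken from the output above):

```shell
# Show only your own jobs
squeue -u $USER

# Show only pending jobs in longq7, with their wait reasons
squeue -p longq7 -t PENDING
```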
For more information regarding Slurm, see the man pages and the Slurm Quick Start User Guide.
Details
Article ID:
141472
Created
Mon 8/29/22 3:21 PM
Modified
Fri 7/18/25 9:43 AM