Version 2 (modified by fuji, 3 years ago)

Work with SLURM on Cypress

If you haven't done so yet, download the sample files with:

svn co file:///home/fuji/repos/workshop ./workshop

To check out the sample files onto your local machine (Linux shell):

svn co svn+ssh:// ./workshop

Introduction to Managed Cluster Computing

On your desktop you would open a terminal, compile the code using your favorite C compiler, and execute the code. You can do this without worry because you are the only person using your computer and you know what demands are being made on your CPU and memory at the time you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It's the job of the resource manager to choreograph this sharing of resources by accepting a description of your program and the resources it requires, searching the available hardware for resources that meet your requirements, and making sure that no one else is given those resources while you are using them.

Occasionally the manager will be unable to find the resources you need because other users are using them. In those instances your job will be "queued"; that is, the manager will wait until the needed resources become available before running your job. This will also occur if the total resources you request for all your jobs exceed the limits set by the cluster administrator. This ensures that all users have equal access to the cluster.

Serial Job Submission

In the 'workshop' directory,

[fuji@cypress1 ~]$ cd workshop
[fuji@cypress1 workshop]$ ls
BlasLapack  Eigen3        HeatMass    JobArray1  JobDependencies  MPI     PETSc  precision  Python  ScaLapack  SimpleExample  TestCodes  uBLAS
CUDA        FlowInCavity  hybridTest  JobArray2  Matlab           OpenMP  PI     PSE        R       SerialJob  SLU40          TextFiles  VTK

In the 'SerialJob' directory,

[fuji@cypress1 workshop]$ cd SerialJob
[fuji@cypress1 SerialJob]$ ls
slurmscript1  slurmscript2

When your code runs on a single core only, your job script should request a single core. The Python code '' runs on a single core:

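import datetime
import socket

# Record the current time when the script starts
now = datetime.datetime.now()
print('Hello, world!')
print(now.isoformat())       # current time in ISO 8601 format
print(socket.gethostname())  # name of the machine running this code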

Since this runs for a short time, you can try running it on the login node.

[fuji@cypress1 SerialJob]$ python ./
Hello, world!

This code prints a message, the current time, and the host name on the screen.

Look at 'slurmscript1'

[fuji@cypress1 SerialJob]$ more slurmscript1
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda

Notice that the SLURM script begins with #!/bin/bash. This line tells Linux which shell interpreter should run the script. In this example we use Bash (Bourne Again Shell). The choice of interpreter (and the syntax that follows from it) is up to the user, but every SLURM script should begin with an interpreter line like this one.

For Bash and Shell Script, see

In a Bash script, # and everything after it on the line is a comment. All the #SBATCH lines in the script above are therefore comments to Bash, but the SLURM job scheduler reads them as directives.
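To make the "comment for Bash, directive for SLURM" idea concrete, here is a minimal Python 3 sketch that pulls the #SBATCH options out of a script's text. It is not how SLURM itself parses scripts; the function name parse_sbatch_directives and the shortened sample script are made up for illustration, and only the --key=value form used in slurmscript1 is handled.

```python
def parse_sbatch_directives(script_text):
    """Collect '#SBATCH --key=value' options from a job script into a dict."""
    options = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue  # shebang, ordinary comment, or shell command
        # Drop the '#SBATCH' prefix and any trailing '# remark' on the line
        body = line[len("#SBATCH"):].split("#", 1)[0]
        for token in body.split():
            if token.startswith("--") and "=" in token:
                key, value = token[2:].split("=", 1)
                options[key] = value
    return options


# Shortened copy of slurmscript1, used only as sample input
SCRIPT = """#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
module load anaconda
"""

print(parse_sbatch_directives(SCRIPT))
```

Bash skips every line collected here, which is why the same file can serve both as a shell script and as a job description.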

qos, partition

Those two lines determine the quality of service and the partition.

#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition

The default partition is defq. In defq, you can choose either normal or long for qos.

QOS limits

QOS name   Maximum job size (node-hours)   Maximum walltime per job   Maximum nodes per user
normal     N/A                             24 hours                   18
long       168                             168 hours                  8

The differences between normal and long are the number of nodes you can request and the length of time your code can run. The details will be explained in Parallel Jobs below.
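As a rough illustration of how the QOS limits interact, the Python 3 sketch below checks a request against the numbers in the table. Two assumptions are made here and are not confirmed by this page: job size in node-hours is taken to be nodes times walltime in hours, and the per-user node limit is applied to a single request. The names QOS_LIMITS and check_request are hypothetical.

```python
# Limits copied from the QOS table; None means no limit is listed (N/A).
QOS_LIMITS = {
    "normal": {"max_node_hours": None, "max_walltime_hours": 24, "max_nodes": 18},
    "long": {"max_node_hours": 168, "max_walltime_hours": 168, "max_nodes": 8},
}


def check_request(qos, nodes, walltime_hours):
    """Return a list of limit violations (empty if the request fits)."""
    limits = QOS_LIMITS[qos]
    problems = []
    if nodes > limits["max_nodes"]:
        problems.append("too many nodes")
    if walltime_hours > limits["max_walltime_hours"]:
        problems.append("walltime too long")
    # Assumption: job size (node-hours) = nodes * walltime in hours
    node_hours = nodes * walltime_hours
    if limits["max_node_hours"] is not None and node_hours > limits["max_node_hours"]:
        problems.append("job size exceeds node-hour limit")
    return problems


print(check_request("normal", 2, 12))  # fits within the normal limits
print(check_request("long", 8, 48))    # 8 * 48 = 384 node-hours, above 168
```

Under these assumptions, a long job that uses all 8 nodes could run for at most 21 hours (168 / 8), while a single-node job could use the full 168 hours.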

If you are using a workshop account, you can use only the workshop qos and partition.


#SBATCH --job-name=python       # Job Name

This sets the job name, which you can choose freely.


#SBATCH --time=00:01:00         # WallTime

The maximum walltime is specified by #SBATCH --time=T, where T has the format h:m:s. Normally, a job is expected to finish before the specified maximum walltime. Once the walltime reaches the maximum, the job is terminated regardless of whether its processes are still running.
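For reference, the h:m:s value can be converted into a duration as in the Python 3 sketch below. The helper name parse_walltime is made up for this example, and it handles only the three-field form described here.

```python
from datetime import timedelta


def parse_walltime(text):
    """Convert an 'h:m:s' walltime string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


limit = parse_walltime("00:01:00")  # the value used in slurmscript1
print(limit.total_seconds())        # 60.0
```

So slurmscript1 gives the job one minute; a job still running after that would be terminated.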

Resource Request

#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

The resource request #SBATCH --nodes=N determines how many compute nodes the scheduler allocates to a job; only 1 node is allocated for this job.

#SBATCH --ntasks-per-node=n determines the number of tasks for MPI jobs. The details will be explained in Parallel Jobs below.

#SBATCH --cpus-per-task=c determines the number of cores/threads for a task. The details will be explained in Parallel Jobs below.

This script requests one core on one node.

There are 124 nodes on the Cypress system. Each node has 20 cores.
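The three resource options multiply together. The Python 3 sketch below works out the total number of cores a request asks for and rejects a request that would need more than the 20 cores of one node. The function name cores_requested is hypothetical, and the per-node check is an illustration, not SLURM's actual validation.

```python
CORES_PER_NODE = 20  # from the text: each Cypress node has 20 cores


def cores_requested(nodes, ntasks_per_node, cpus_per_task):
    """Total cores implied by --nodes, --ntasks-per-node and --cpus-per-task."""
    per_node = ntasks_per_node * cpus_per_task
    if per_node > CORES_PER_NODE:
        raise ValueError(
            "request needs %d cores per node, but a node has only %d"
            % (per_node, CORES_PER_NODE)
        )
    return nodes * per_node


print(cores_requested(1, 1, 1))   # slurmscript1: 1 core in total
print(cores_requested(2, 20, 1))  # 2 full nodes: 40 cores in total
```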

Submit a job

Let's run our program on the cluster. To submit our script to SLURM, we invoke the sbatch command.

[fuji@cypress1 SerialJob]$ sbatch slurmscript1
Submitted batch job 773944

Our job was successfully submitted and was assigned the job number 773944. This Python code prints a message, the time, and the host name. This time, however, it ran on one of the compute nodes, and your terminal screen is not connected to that node.
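If you submit jobs from your own scripts, the job number can be extracted from the line that sbatch prints. The Python 3 sketch below assumes the message has exactly the form shown above; the helper name parse_job_id is made up for this example.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the job number from a 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_output)
    return int(match.group(1))


print(parse_job_id("Submitted batch job 773944"))  # 773944
```

On the cluster, the input string could come from running sbatch through Python's subprocess module and capturing its standard output; that step is left out here so the sketch runs anywhere.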

After the job completes, you will see a new file, slurm-???????.out

[fuji@cypress1 SerialJob]$ ls
slurm-773944.out  slurmscript1  slurmscript2

that contains

[fuji@cypress1 SerialJob]$ cat slurm-773944.out
Hello, world!

The text that would otherwise be printed on the screen went to the file slurm-???????.out. This is the default file name. You can change it by setting,

#SBATCH --output=Hi.out       ### File in which to store job output
#SBATCH --error=Hi.err        ### File in which to store job error messages
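The default name follows the sbatch filename pattern slurm-%j.out, where %j stands for the job number. The Python 3 sketch below expands only that one placeholder to show which file to look for; the function name output_filename is hypothetical, and other sbatch placeholders are deliberately ignored.

```python
DEFAULT_PATTERN = "slurm-%j.out"  # sbatch default when --output is not set


def output_filename(job_id, pattern=DEFAULT_PATTERN):
    """Expand the %j (job number) placeholder in an output file pattern."""
    return pattern.replace("%j", str(job_id))


print(output_filename(773944))            # slurm-773944.out
print(output_filename(773944, "Hi.out"))  # Hi.out (no placeholder to expand)
```

A pattern such as Hi-%j.out in the --output line would keep the output of separate runs in separate files.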