

Work with SLURM on Cypress

If you haven't done so yet, download the samples by:

git clone https://hidekiCCS:@bitbucket.org/hidekiCCS/hpc-workshop.git

Introduction to Managed Cluster Computing

On your desktop you would open a terminal, compile the code using your favorite C compiler, and execute it. You can do this without worry, since you are the only person using your computer and you know what demands are being made on your CPU and memory when you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It is the job of the resource manager to choreograph this sharing: it accepts a description of your program and the resources it requires, searches the available hardware for resources that meet those requirements, and makes sure that no one else is given those resources while you are using them.

https://docs.google.com/drawings/d/e/2PACX-1vSlffILDUxxzh_QpD4M7P5-bY_tCkYNjA9xIYWuUUqz_HBBczQ18o5AWA9OZ5_w5Q0bwQJbdgmUCuMJ/pub

Occasionally the manager will be unable to find the resources you need due to usage by other users. In those instances your job will be "queued"; that is, the manager will wait until the needed resources become available before running your job. This will also happen if the total resources you request across all your jobs exceed the limits set by the cluster administrator. This ensures that all users have fair access to the cluster.

https://docs.google.com/drawings/d/e/2PACX-1vQL7pibkwB5EK2z6d2I9wIu28baQt8Mu3U4FCfwOttWncEwurGa8r-sP2wQxNA1no0j_ik3bVV5s0X8/pub

Serial Job Submission

Under 'workshop' directory,

[fuji@cypress1 ~]$ cd workshop
[fuji@cypress1 workshop]$ ls
BlasLapack  Eigen3        HeatMass    JobArray1  JobDependencies  MPI     PETSc  precision  Python  ScaLapack  SimpleExample  TestCodes  uBLAS
CUDA        FlowInCavity  hybridTest  JobArray2  Matlab           OpenMP  PI     PSE        R       SerialJob  SLU40          TextFiles  VTK

Under 'SerialJob' directory,

[fuji@cypress1 workshop]$ cd SerialJob
[fuji@cypress1 SerialJob]$ ls
hello.py  slurmscript1  slurmscript2

When your code runs on a single core only, your job script should request a single core. The Python code 'hello.py' runs on a single core:

# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print('Hello, world!')
print(now.isoformat())
print(socket.gethostname())

Since this runs for a short time, you can try running it on the login node.

[fuji@cypress1 SerialJob]$ python ./hello.py
Hello, world!
2018-08-22T11:46:05.394952
cypress1

This code prints a message, the time, and the host name on the screen.

Look at 'slurmscript1'

[fuji@cypress1 SerialJob]$ more slurmscript1
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python hello.py

Notice that the SLURM script begins with #!/bin/bash. This line tells Linux which shell interpreter to use to run the script; in this example we use Bash (the Bourne Again SHell). The choice of interpreter (and the resulting syntax) is up to the user, but every SLURM script should begin with such a line.

For Bash and Shell Script, see https://en.wikibooks.org/wiki/Bash_Shell_Scripting

In a Bash script, # and everything after it on a line is a comment. So all the #SBATCH lines in the script above are comments as far as Bash is concerned, but they are directives to the SLURM job scheduler.
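A quick way to convince yourself of this: run a script like the following directly with bash (no SLURM involved) and note that only the echo line produces output, while the #SBATCH line is silently skipped.

```shell
#!/bin/bash
#SBATCH --job-name=demo    # Bash skips this line; only SLURM would read it
echo "Bash ignored the #SBATCH line above"
```

Running this file with bash prints just the echo message; submitting the same file with sbatch would additionally let SLURM parse the #SBATCH directive.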

qos, partition

Those two lines determine the quality of service and the partition.

#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition

The default partition is defq. In defq, you can choose either normal or long for the qos.

QOS limits:

  QOS name   maximum job size (node-hours)   maximum walltime per job   maximum nodes per user
  normal     N/A                             24 hours                   18
  long       168                             168 hours                  8

The differences between normal and long are the number of nodes you can request and how long your job can run. The details will be explained in Parallel Jobs below.
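For example, a simulation that needs three days of runtime exceeds the 24-hour limit of normal but fits under long. A sketch of the corresponding directives (the values here are illustrative; adjust them for your own job):

```shell
#SBATCH --qos=long            # long QOS: up to 168 hours of walltime
#SBATCH --partition=defq      # default partition
#SBATCH --time=3-00:00:00     # 3 days, in days-hh:mm:ss format
```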

If you are using a workshop account, you can use only workshop qos and partition.

job-name

#SBATCH --job-name=python       # Job Name

This is the job name that you can specify as you like.

time

#SBATCH --time=00:01:00         # WallTime

The maximum walltime is specified by #SBATCH --time=T, where T has the format hh:mm:ss. A job is expected to finish before the specified maximum walltime; once the walltime reaches the maximum, the job is terminated regardless of whether its processes are still running.
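For reference, SLURM accepts walltime in several formats, including minutes:seconds, hh:mm:ss, and days-hh:mm:ss. A few illustrative requests:

```shell
#SBATCH --time=30:00          # 30 minutes (minutes:seconds)
#SBATCH --time=01:30:00       # 1 hour 30 minutes (hh:mm:ss)
#SBATCH --time=2-00:00:00     # 2 days (days-hh:mm:ss)
```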

Resource Request

#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

The resource request #SBATCH --nodes=N determines how many compute nodes the scheduler allocates to a job; only one node is allocated for this job.

#SBATCH --ntasks-per-node=n determines the number of tasks (MPI processes) per node. The details will be explained in Parallel Jobs below.

#SBATCH --cpus-per-task=c determines the number of cores/threads per task. The details will be explained in Parallel Jobs below.

This script requests one core on one node.
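For contrast, here is a sketch of a whole-node request: a hypothetical MPI job that uses all 20 cores on each of two nodes would ask for 20 tasks per node. The exact values depend on your program and are shown only as an illustration:

```shell
#SBATCH --nodes=2               # two compute nodes
#SBATCH --ntasks-per-node=20    # 20 MPI processes per node (all 20 cores)
#SBATCH --cpus-per-task=1       # one core per MPI process
```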

https://docs.google.com/drawings/d/e/2PACX-1vSlffILDUxxzh_QpD4M7P5-bY_tCkYNjA9xIYWuUUqz_HBBczQ18o5AWA9OZ5_w5Q0bwQJbdgmUCuMJ/pub

There are 124 nodes in the Cypress system. Each node has 20 cores.
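A quick bit of shell arithmetic confirms the total capacity:

```shell
# Cypress capacity: 124 nodes x 20 cores per node
echo $(( 124 * 20 ))    # prints 2480, the total number of cores
```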

https://docs.google.com/drawings/d/e/2PACX-1vQR7ztCNSIQhIjyW28FyYaQn92XC4Zq_vZzoPwALkywmXoyRl8qC2MEpT1t68zMopZv2yHNt2unMf-i/pub

Submit a job

Let's run our program on the cluster. To submit our script to SLURM, we invoke the sbatch command.

[fuji@cypress1 SerialJob]$ sbatch slurmscript1
Submitted batch job 773944

Our job was successfully submitted and was assigned the job number 773944. The Python code hello.py prints a message, the time, and the host name. But this time hello.py ran on one of the compute nodes, so nothing appeared in your terminal.

After the job has completed, you will see a new file, slurm-???????.out

[fuji@cypress1 SerialJob]$ ls
hello.py  slurm-773944.out  slurmscript1  slurmscript2

that contains

[fuji@cypress1 SerialJob]$ cat slurm-773944.out
Hello, world!
2018-08-22T12:51:34.436170
cypress01-117

The output that would normally appear on the screen went to the file slurm-???????.out. This is the default file name; you can change it by setting:

#SBATCH --output=Hi.out       ### File in which to store job output
#SBATCH --error=Hi.err        ### File in which to store job error messages
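SLURM also expands replacement patterns in these file names; for example, %j becomes the job ID, so each run writes to its own log rather than overwriting a fixed file. A sketch (the file name itself is arbitrary):

```shell
#SBATCH --output=myjob-%j.out   # %j expands to the job ID, e.g. myjob-773944.out
#SBATCH --error=myjob-%j.err
```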

Cancel Jobs

Look at slurmscript2,

[fuji@cypress1 SerialJob]$ cat slurmscript2
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=pythonLong       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python hello.py

sleep 3600

The major difference from slurmscript1 is the last line, sleep 3600, which makes Bash wait for 3600 seconds (one hour) at that point.

Submit the job,

[fuji@cypress1 SerialJob]$ sbatch slurmscript2
Submitted batch job 773951

The squeue command gives you a list of jobs running or queued on Cypress, and also tells you which node each job is running on. To single out your own jobs, use the "-u" option flag to specify your user name.

[fuji@cypress1 SerialJob]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773951 worksh         pythonLong     fuji  R       0:07  1 cypress01-117

To stop the job, use the scancel command:

[fuji@cypress1 SerialJob]$ scancel 773951
[fuji@cypress1 SerialJob]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)

You will see a new file again.

[fuji@cypress1 SerialJob]$ ls
hello.py  slurm-773944.out  slurm-773951.out  slurmscript1  slurmscript2

The new file contains

[fuji@cypress1 SerialJob]$ cat slurm-773951.out
Hello, world!
2018-08-22T13:17:00.965433
cypress01-117
slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***