Work with SLURM on Cypress
If you have not done so already, download the sample files with:
git clone https://hidekiCCS:@bitbucket.org/hidekiCCS/hpc-workshop.git
Introduction to Managed Cluster Computing
On your desktop you would open a terminal, compile the code using your favorite C compiler, and execute the code. You can do this without worry, as you are the only person using your computer and you know what demands are being made on your CPU and memory at the time you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It is the job of the resource manager to choreograph this sharing of resources by accepting a description of your program and the resources it requires, searching the available hardware for resources that meet your requirements, and making sure that no one else is given those resources while you are using them.
Occasionally the manager will be unable to find the resources you need because other users are using them. In those instances your job will be "queued": the manager will wait until the needed resources become available before running your job. This will also occur if the total resources you request for all your jobs exceed the limits set by the cluster administrator. This ensures that all users have equal access to the cluster.
Serial Job Submission
In the 'workshop' directory,
[fuji@cypress1 ~]$ cd workshop
[fuji@cypress1 workshop]$ ls
BlasLapack Eigen3 HeatMass JobArray1 JobDependencies MPI PETSc precision Python ScaLapack SimpleExample TestCodes uBLAS
CUDA FlowInCavity hybridTest JobArray2 Matlab OpenMP PI PSE R SerialJob SLU40 TextFiles VTK
In the 'SerialJob' directory,
[fuji@cypress1 workshop]$ cd SerialJob
[fuji@cypress1 SerialJob]$ ls
hello.py slurmscript1 slurmscript2
When your code runs on a single core only, your job script should request a single core. The Python code 'hello.py', which runs on a single core, is:
# HELLO PYTHON
import datetime
import socket
now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
Since this runs for a short time, you can try running it on the login node.
[fuji@cypress1 SerialJob]$ python ./hello.py
Hello, world!
2018-08-22T11:46:05.394952
cypress1
This code prints a message, the current time, and the host name to the screen.
Look at 'slurmscript1':
[fuji@cypress1 SerialJob]$ more slurmscript1
#!/bin/bash
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # partition
#SBATCH --job-name=python # Job Name
#SBATCH --time=00:01:00 # WallTime
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads)
module load anaconda
python hello.py
Notice that the SLURM script begins with #!/bin/bash. This line tells Linux which shell interpreter to use to run the script. In this example we use Bash (Bourne Again Shell). The choice of interpreter (and the corresponding syntax) is up to the user, but every SLURM script should begin with such a line.
For Bash and Shell Script, see https://en.wikibooks.org/wiki/Bash_Shell_Scripting
In a Bash shell script, # and everything after it on the same line is a comment. All the #SBATCH lines in the script above are therefore comments as far as Bash is concerned, but the SLURM job scheduler reads them as directives.
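To illustrate how the same line can be a comment for Bash and a directive for the scheduler, here is a minimal Python 3 sketch that collects the #SBATCH options from a script's text. The function name sbatch_directives is hypothetical, and the parsing is deliberately simplified (it only handles the --key=value form used in this tutorial); it is not how SLURM itself reads a script.

```python
def sbatch_directives(script_text):
    """Collect '#SBATCH --key=value' options from a job script into a dict.

    Simplified illustration only: handles just the '--key=value' form.
    """
    options = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue  # an ordinary command or an ordinary comment
        # Drop the '#SBATCH' prefix and any trailing '# comment'.
        body = line[len("#SBATCH"):].split("#", 1)[0].strip()
        if body.startswith("--") and "=" in body:
            key, value = body[2:].split("=", 1)
            options[key] = value.strip()
    return options


script = """#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # partition
#SBATCH --time=00:01:00         # WallTime
module load anaconda
python hello.py
"""

print(sbatch_directives(script))
```

Bash skips every one of these lines when it executes the script, while the sketch above, like the scheduler, treats them as settings.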
The following two lines determine the quality of service and the partition.
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # partition
The default partition is defq. In defq, you can choose either normal or long for qos.
|QOS name||maximum job size (node-hours)||maximum walltime per job||maximum nodes per user|
The differences between normal and long are the number of nodes you can request and the duration you can run your code. The details will be explained in Parallel Jobs below.
If you are using a workshop account, you can use only the workshop qos and partition.
#SBATCH --job-name=python # Job Name
This sets the job name, which you can choose freely.
#SBATCH --time=00:01:00 # WallTime
The maximum walltime is specified by #SBATCH --time=T, where T has the format h:m:s. Normally, a job is expected to finish before the specified maximum walltime. Once the walltime reaches the maximum, the job is terminated regardless of whether its processes are still running.
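As a small worked example of the h:m:s format, the Python 3 sketch below converts a walltime string to seconds. The helper name walltime_seconds is hypothetical, and it assumes the three-field h:m:s form described above; any other form is not handled.

```python
def walltime_seconds(walltime):
    """Convert an 'h:m:s' walltime string to a number of seconds.

    Assumes exactly three colon-separated integer fields.
    """
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(walltime_seconds("00:01:00"))  # 60
print(walltime_seconds("24:00:00"))  # 86400
```

With --time=00:01:00 the limit is 60 seconds, so a job that needs longer than one minute would be terminated before it finishes.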
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads)
The resource request #SBATCH --nodes=N determines how many compute nodes the scheduler allocates to a job; only 1 node is allocated for this job.
#SBATCH --ntasks-per-node=n determines the number of tasks per node for MPI jobs. The details will be explained in Parallel Jobs below.
#SBATCH --cpus-per-task=c determines the number of cores/threads for a task. The details will be explained in Parallel Jobs below.
This script requests one core on one node.
There are 124 nodes on the Cypress system. Each node has 20 cores.
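The arithmetic behind these three options can be sketched in Python 3 as follows. The function name cores_requested is hypothetical, and the per-node check simply assumes that tasks times threads on one node should not exceed the 20 cores stated above; the scheduler's actual rules may be more detailed.

```python
CORES_PER_NODE = 20   # from the text: each Cypress node has 20 cores
TOTAL_NODES = 124     # from the text: Cypress has 124 nodes


def cores_requested(nodes, ntasks_per_node, cpus_per_task):
    """Return the total number of cores implied by the three options."""
    per_node = ntasks_per_node * cpus_per_task
    if per_node > CORES_PER_NODE:
        raise ValueError(
            "%d cores per node requested, but a node has only %d"
            % (per_node, CORES_PER_NODE)
        )
    return nodes * per_node


print(cores_requested(1, 1, 1))      # 1: the serial job in slurmscript1
print(TOTAL_NODES * CORES_PER_NODE)  # 2480 cores on the whole system
```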
Submit a job
Let's run our program on the cluster. To submit our script to SLURM, we invoke the sbatch command.
[fuji@cypress1 SerialJob]$ sbatch slurmscript1
Submitted batch job 773944
Our job was successfully submitted and was assigned the job number 773944. As before, hello.py prints a message, the time, and the host name. This time, however, hello.py ran on one of the compute nodes, and your terminal is not connected to that node.
After the job completes, you will see a new file, slurm-???????.out:
[fuji@cypress1 SerialJob]$ ls
hello.py slurm-773944.out slurmscript1 slurmscript2
[fuji@cypress1 SerialJob]$ cat slurm-773944.out
Hello, world!
2018-08-22T12:51:34.436170
cypress01-117
The text that would have been printed on the screen went to the file slurm-???????.out instead. This is the default file name. You can change it by setting:
#SBATCH --output=Hi.out ### File in which to store job output
#SBATCH --error=Hi.err ### File in which to store job error messages
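The question marks in the default file name stand for the job number reported by sbatch. As a minimal Python 3 sketch (both helper names are hypothetical), the default name can be derived from the "Submitted batch job" message like this:

```python
import re


def job_id_from_sbatch(message):
    """Extract the job number from sbatch's 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", message)
    if match is None:
        raise ValueError("no job number found in: %r" % message)
    return int(match.group(1))


def default_output_file(job_id):
    """Build the default output file name, slurm-<job number>.out."""
    return "slurm-%d.out" % job_id


job_id = job_id_from_sbatch("Submitted batch job 773944")
print(default_output_file(job_id))  # slurm-773944.out
```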
Look at slurmscript2:
[fuji@cypress1 SerialJob]$ cat slurmscript2
#!/bin/bash
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # partition
#SBATCH --job-name=pythonLong # Job Name
#SBATCH --time=00:01:00 # WallTime
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads)
module load anaconda
python hello.py
sleep 3600
The major difference from slurmscript1 is the last line, sleep 3600, which makes Bash wait for 3600 seconds at that point.
Submit the job,
[fuji@cypress1 SerialJob]$ sbatch slurmscript2
Submitted batch job 773951
The squeue command gives you a list of jobs running/queued on Cypress. The squeue command also tells us what node our job is being run on. To single out your own job you can use the "-u" option flag to specify your user name.
[fuji@cypress1 SerialJob]$ squeue -u fuji
JOBID QOS NAME USER ST TIME NO NODELIST(REASON)
773951 worksh pythonLong fuji R 0:07 1 cypress01-117
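If you want to read the squeue listing from a script, a minimal Python 3 sketch such as the one below may help. The function name parse_squeue is hypothetical, and the parsing assumes the column layout shown above, with whitespace-separated fields and no spaces inside the job name; other squeue output formats would need different handling.

```python
def parse_squeue(text):
    """Turn squeue's header line and job lines into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    columns = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # The last column receives whatever remains on the line.
        fields = line.split(None, len(columns) - 1)
        jobs.append(dict(zip(columns, fields)))
    return jobs


sample = """JOBID QOS NAME USER ST TIME NO NODELIST(REASON)
773951 worksh pythonLong fuji R 0:07 1 cypress01-117
"""

for job in parse_squeue(sample):
    print(job["JOBID"], job["ST"], job["NODELIST(REASON)"])
```

In the ST column, R means the job is running; a job that is still waiting in the queue is shown as PD (pending).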
To stop the job, use the scancel command:
[fuji@cypress1 SerialJob]$ scancel 773951
[fuji@cypress1 SerialJob]$ squeue -u fuji
JOBID QOS NAME USER ST TIME NO NODELIST(REASON)
You will see a new file again.
[fuji@cypress1 SerialJob]$ ls
hello.py slurm-773944.out slurm-773951.out slurmscript1 slurmscript2
The new file contains:
[fuji@cypress1 SerialJob]$ cat slurm-773951.out
Hello, world!
2018-08-22T13:17:00.965433
cypress01-117
slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***
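To check from a script whether a job's output file records a cancellation, one could search for the message shown above. The Python 3 sketch below is modeled only on that single message; the helper name cancellation is hypothetical, and other SLURM versions or other reasons for termination may word the line differently.

```python
import re

# Pattern modeled on the one message shown in this tutorial.
CANCELLED = re.compile(r"\*\*\* JOB (\d+) CANCELLED AT (\S+) \*\*\*")


def cancellation(output_text):
    """Return (job number, timestamp) if the text records a cancellation."""
    match = CANCELLED.search(output_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


output = """Hello, world!
2018-08-22T13:17:00.965433
cypress01-117
slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***
"""

print(cancellation(output))
```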