= Work with SLURM on Cypress =
If you haven't done so yet, download the samples with:

{{{git clone https://hidekiCCS:@bitbucket.org/hidekiCCS/hpc-workshop.git}}}

== Introduction to Managed Cluster Computing ==
On your desktop you would open a terminal, compile the code with your favorite C compiler, and execute it. You can do this without worry because you are the only person using your computer and you know what demands are being made on your CPU and memory when you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It is the job of the resource manager to choreograph this sharing: it accepts a description of your program and the resources it requires, searches the available hardware for resources that meet your requirements, and makes sure that no one else is given those resources while you are using them.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vSlffILDUxxzh_QpD4M7P5-bY_tCkYNjA9xIYWuUUqz_HBBczQ18o5AWA9OZ5_w5Q0bwQJbdgmUCuMJ/pub?w=594&h=209)]]

Occasionally the manager will be unable to find the resources you need because other users are using them. In those instances your job will be "queued"; that is, the manager will wait until the needed resources become available before running your job. This will also occur if the total resources you request for all your jobs exceed the limits set by the cluster administrator. These limits ensure that all users have equal access to the cluster.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vQL7pibkwB5EK2z6d2I9wIu28baQt8Mu3U4FCfwOttWncEwurGa8r-sP2wQxNA1no0j_ik3bVV5s0X8/pub?w=480&h=360)]]

== Serial Job Submission ==
In the 'workshop' directory:
{{{
[fuji@cypress1 ~]$ cd workshop
[fuji@cypress1 workshop]$ ls
BlasLapack  Eigen3        HeatMass    JobArray1  JobDependencies  MPI     PETSc  precision  Python  ScaLapack  SimpleExample  TestCodes  uBLAS
CUDA        FlowInCavity  hybridTest  JobArray2  Matlab           OpenMP  PI     PSE        R       SerialJob  SLU40          TextFiles  VTK
}}}

In the '!SerialJob' directory:
{{{
[fuji@cypress1 workshop]$ cd SerialJob
[fuji@cypress1 SerialJob]$ ls
hello.py  slurmscript1  slurmscript2
}}}

When your code runs on a single core only, your job script should request a single core. The Python code 'hello.py' is such a single-core program:
{{{#!python
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print('Hello, world!')
print(now.isoformat())
print(socket.gethostname())
}}}

Since this runs for a short time, you can try running it on the login node.
{{{
[fuji@cypress1 SerialJob]$ python ./hello.py
Hello, world!
2018-08-22T11:46:05.394952
cypress1
}}}
This code prints a message, the current time, and the host name on the screen.

Look at 'slurmscript1':
{{{
[fuji@cypress1 SerialJob]$ more slurmscript1
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python hello.py
}}}

Notice that the SLURM script begins with '''#!/bin/bash'''. This line tells Linux which shell interpreter should run the script. In this example we use Bash (Bourne Again Shell).
The choice of interpreter (and therefore of syntax) is up to the user, but every SLURM script should begin with an interpreter line like this one.

For more on Bash shell scripting, see
[https://en.wikibooks.org/wiki/Bash_Shell_Scripting]

In a Bash script, '''#''' and everything after it on the line is a comment.
So all the '''#SBATCH''' lines in the script above are comments to Bash,
but they are directives to the '''SLURM''' job scheduler.
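
To make the dual role of these lines concrete, here is a minimal Python sketch that picks the '''#SBATCH''' options out of a script's text. It is only an illustration under simple assumptions (one '''--name=value''' option per line); the helper name '''sbatch_directives''' is hypothetical, and this is not how SLURM itself reads the script.

{{{#!python
# The first lines of slurmscript1, as shown above.
SCRIPT = """\
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime

module load anaconda
python hello.py
"""

def sbatch_directives(script_text):
    """Collect name/value pairs from lines that start with '#SBATCH'."""
    options = {}
    for line in script_text.splitlines():
        if not line.startswith("#SBATCH"):
            continue  # ordinary commands and the '#!/bin/bash' line are skipped
        # Drop the '#SBATCH' prefix and any trailing '# ...' comment.
        body = line[len("#SBATCH"):].split("#", 1)[0].strip()
        name, _, value = body.partition("=")
        options[name.lstrip("-")] = value
    return options

print(sbatch_directives(SCRIPT))
}}}

Bash skips every one of these lines as a comment, while the sketch (like the scheduler) treats them as settings.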

==== qos, partition ====
These two lines set the quality of service (QOS) and the partition.
{{{
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
}}}
The default partition is '''defq'''. In '''defq''', you can choose either '''normal''' or '''long''' for '''qos'''.
||||||||= '''QOS limits''' =||
|| '''QOS name''' || '''maximum job size (node-hours)''' || '''maximum walltime per job''' || '''maximum nodes per user''' ||
|| normal      || N/A ||24 hours || 18 ||
|| long        || 168 ||168 hours ||  8 ||

The differences between '''normal''' and '''long''' are the number of nodes you can request and how long your job can run.
The details will be explained in Parallel Jobs below.
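
As a rough way to reason about the table above, the check below encodes its limits in Python. This is only a sketch: it assumes that a job's size in node-hours is the number of nodes multiplied by the walltime in hours, and the names '''QOS_LIMITS''' and '''fits_qos''' are made up for illustration. SLURM enforces the real limits itself.

{{{#!python
# Limits copied from the QOS table; None stands for "N/A" (no limit listed).
QOS_LIMITS = {
    "normal": {"max_node_hours": None, "max_walltime_hours": 24, "max_nodes": 18},
    "long": {"max_node_hours": 168, "max_walltime_hours": 168, "max_nodes": 8},
}

def fits_qos(qos, nodes, walltime_hours):
    """Return True if one job request stays within the limits listed for qos."""
    limits = QOS_LIMITS[qos]
    if nodes > limits["max_nodes"]:
        return False
    if walltime_hours > limits["max_walltime_hours"]:
        return False
    max_node_hours = limits["max_node_hours"]
    # Assumption: node-hours = nodes * walltime in hours.
    if max_node_hours is not None and nodes * walltime_hours > max_node_hours:
        return False
    return True

print(fits_qos("normal", 18, 24))  # True: within 18 nodes and 24 hours
print(fits_qos("long", 8, 168))    # False: 8 * 168 node-hours exceeds 168
print(fits_qos("long", 1, 168))    # True: 1 * 168 node-hours equals the limit
}}}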

If you are using a workshop account, you can use only the '''workshop''' QOS and partition.

==== job-name ====
{{{
#SBATCH --job-name=python       # Job Name
}}}
This sets the job name, which you can choose freely.

==== time ====
{{{
#SBATCH --time=00:01:00         # WallTime
}}}
The maximum walltime is specified by '''#SBATCH --time=T''', where T has the format h:m:s.
Normally, a job is expected to finish before the specified maximum walltime.
Once the job reaches the maximum walltime, it is terminated regardless of whether its processes are still running.
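
When choosing T, it can help to convert between seconds and the h:m:s form. The following Python sketch does that for the h:m:s format described here only; the function names are hypothetical helpers, not part of SLURM.

{{{#!python
def walltime_to_seconds(walltime):
    """Convert an 'h:m:s' walltime string such as '00:01:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def seconds_to_walltime(total_seconds):
    """Format a number of seconds as 'hh:mm:ss' for use with --time."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)

print(walltime_to_seconds("00:01:00"))  # 60
print(seconds_to_walltime(5400))        # 01:30:00
}}}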

==== Resource Request ====
{{{
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)
}}}

The resource request '''#SBATCH --nodes=N''' determines how many compute nodes the scheduler allocates to a job; only 1 node is allocated for this job.

'''#SBATCH --ntasks-per-node=n''' determines the number of tasks for MPI jobs. The details will be explained in Parallel Jobs below.

'''#SBATCH --cpus-per-task=c'''  determines the number of cores/threads for a task. The details will be explained in Parallel Jobs below.

This script requests one core on one node.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vSlffILDUxxzh_QpD4M7P5-bY_tCkYNjA9xIYWuUUqz_HBBczQ18o5AWA9OZ5_w5Q0bwQJbdgmUCuMJ/pub?w=594&h=209)]]

There are 124 nodes on the Cypress system. Each node has 20 cores.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vQR7ztCNSIQhIjyW28FyYaQn92XC4Zq_vZzoPwALkywmXoyRl8qC2MEpT1t68zMopZv2yHNt2unMf-i/pub?w=155&h=134)]]
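
The arithmetic behind a resource request can be sketched in Python as follows. The sketch uses the figures quoted in this section (124 nodes, 20 cores per node) and assumes that each task occupies '''cpus-per-task''' cores on its node; the function name '''cores_requested''' is hypothetical.

{{{#!python
TOTAL_NODES = 124     # nodes on Cypress, as stated in this section
CORES_PER_NODE = 20   # cores on each node

def cores_requested(nodes, ntasks_per_node, cpus_per_task):
    """Return the total cores a request asks for, or raise if it cannot fit."""
    cores_on_each_node = ntasks_per_node * cpus_per_task
    if nodes > TOTAL_NODES:
        raise ValueError("more nodes requested than the cluster has")
    if cores_on_each_node > CORES_PER_NODE:
        raise ValueError("more cores requested per node than a node has")
    return nodes * cores_on_each_node

# The serial script: 1 node, 1 task per node, 1 core per task.
print(cores_requested(1, 1, 1))  # 1
}}}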

=== Submit a job ===
Let's run our program on the cluster. 
To submit our script to SLURM, we invoke the '''sbatch''' command. 
{{{
[fuji@cypress1 SerialJob]$ sbatch slurmscript1
Submitted batch job 773944
}}}

Our job was successfully submitted and was assigned the job number 773944.
As before, the Python code ''hello.py'' prints a message, the current time, and the host name.
This time, however, ''hello.py'' ran on one of the compute nodes, which is not connected to your terminal, so nothing appears on your screen.
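
If you want to submit from a Python program rather than by hand, the job number can be read from the message that '''sbatch''' prints. The sketch below assumes Python 3 and the "Submitted batch job N" message shown above; '''parse_job_id''' and '''submit''' are hypothetical helper names, and '''submit''' only works on a machine where '''sbatch''' is available.

{{{#!python
import re
import subprocess

def parse_job_id(sbatch_output):
    """Extract the job number from a 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: {!r}".format(sbatch_output))
    return int(match.group(1))

def submit(script_path):
    """Run 'sbatch script_path' and return the job number (requires SLURM)."""
    result = subprocess.run(
        ["sbatch", script_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise if sbatch exits with an error
    )
    return parse_job_id(result.stdout)

print(parse_job_id("Submitted batch job 773944"))  # 773944
}}}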

After the job completes, you will see a new file, slurm-???????.out,
{{{
[fuji@cypress1 SerialJob]$ ls
hello.py  slurm-773944.out  slurmscript1  slurmscript2
}}}
that contains
{{{
[fuji@cypress1 SerialJob]$ cat slurm-773944.out
Hello, world!
2018-08-22T12:51:34.436170
cypress01-117
}}}
The output that would have been printed on the screen went to the file slurm-???????.out, where ??????? is the job number. This is the default file name. You can change it by setting:
{{{
#SBATCH --output=Hi.out       ### File in which to store job output
#SBATCH --error=Hi.err        ### File in which to store job error messages
}}}
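
SLURM also recognizes filename patterns in these options. For example, '''%j''' is replaced by the job number, so the default name corresponds to the pattern slurm-%j.out. The Python sketch below only mimics that single substitution for illustration; '''expand_pattern''' is a hypothetical name, not a SLURM function.

{{{#!python
def expand_pattern(pattern, job_id):
    """Mimic how %j in an output file name is replaced by the job number."""
    return pattern.replace("%j", str(job_id))

# The default pattern, and a custom one as in '#SBATCH --output=Hi-%j.out'.
print(expand_pattern("slurm-%j.out", 773944))  # slurm-773944.out
print(expand_pattern("Hi-%j.out", 773944))     # Hi-773944.out
}}}

Including the job number in the name keeps a later run from writing to the same output file as an earlier one.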

=== Cancel Jobs ===
Look at '''slurmscript2''':
{{{
[fuji@cypress1 SerialJob]$ cat slurmscript2
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=pythonLong       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python hello.py

sleep 3600
}}}

The major difference from '''slurmscript1''' is the last line, '''sleep 3600''', which makes Bash wait for 3600 seconds at this point.

Submit the job,
{{{
[fuji@cypress1 SerialJob]$ sbatch slurmscript2
Submitted batch job 773951
}}}

The '''squeue''' command gives you a list of the jobs running or queued on Cypress.
It also tells you which node your job is running on.
To list only your own jobs, use the "-u" option with your user name.

{{{
[fuji@cypress1 SerialJob]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773951 worksh         pythonLong     fuji  R       0:07  1 cypress01-117
}}}
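
To read such a listing from a program, the columns can be split on whitespace. This Python sketch assumes the column layout shown above and job names without spaces; '''parse_squeue''' is a hypothetical helper name. In the ST column, R means running and PD means pending.

{{{#!python
# The squeue output shown above, used here as sample input.
SAMPLE = """\
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773951 worksh         pythonLong     fuji  R       0:07  1 cypress01-117
"""

def parse_squeue(text):
    """Turn squeue's table into a list of dicts keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Limit the split so a multi-word last column stays in one field.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs

for job in parse_squeue(SAMPLE):
    print(job["JOBID"], job["ST"], job["NODELIST(REASON)"])
}}}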

To stop the job, use the '''scancel''' command with the job number:
{{{
[fuji@cypress1 SerialJob]$ scancel 773951
[fuji@cypress1 SerialJob]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
}}}

You will see a new file again.
{{{
[fuji@cypress1 SerialJob]$ ls
hello.py  slurm-773944.out  slurm-773951.out  slurmscript1  slurmscript2
}}}

The new file contains
{{{
[fuji@cypress1 SerialJob]$ cat slurm-773951.out
Hello, world!
2018-08-22T13:17:00.965433
cypress01-117
slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***
}}}
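
A script that post-processes job output may want to know whether a job was cancelled. The Python sketch below looks for the message shown above; it assumes that exact message format, and '''find_cancellation''' is a hypothetical helper name.

{{{#!python
import re

# Matches lines such as '*** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***'.
CANCEL_RE = re.compile(r"JOB (\d+) CANCELLED AT (\S+)")

def find_cancellation(output_text):
    """Return (job_id, timestamp) if the text records a cancellation, else None."""
    match = CANCEL_RE.search(output_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

line = "slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***"
print(find_cancellation(line))  # (773951, '2018-08-22T13:17:25')
}}}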