[[PageOutline]]

= GNU Parallel =

GNU parallel is a shell tool for executing many small, independent tasks in parallel on multi-core platforms (compute nodes). A job can be a single command or a small script that has to be run for each line of the input. GNU parallel splits the input and pipes it into commands that run concurrently.

== Adding GNU Parallel to the Environment ==

To use GNU parallel on the command line, load its module:
{{{#!bash
module load parallel
}}}

== GNU Parallel Syntax ==

 * Reading commands to be run in parallel from an input file:
{{{
parallel [OPTIONS] < CMDFILE
}}}
 * Reading command arguments on the command line:
{{{
parallel [OPTIONS] COMMAND [ARGUMENTS] ::: ARGLIST
}}}
 * Reading command arguments from an input file:
{{{
parallel [OPTIONS] COMMAND [ARGUMENTS] :::: ARGFILE
}}}

See the [https://www.gnu.org/software/parallel/parallel_tutorial.html GNU Parallel tutorial] for details.

== Many Task Computing ==

For example, suppose you have many scripts to run:
{{{
[fuji@cypress1 JobArray2]$ ls
hello2.py    script01.sh  script03.sh  script05.sh  script07.sh  script09.sh  slurmscript2
script00.sh  script02.sh  script04.sh  script06.sh  script08.sh  slurmscript1
}}}
To run all of them distributed over 4 cores:
{{{
[fuji@cypress1 JobArray2]$ parallel -j 4 sh ::: script??.sh
}}}

=== Run GNU Parallel in a Slurm Script ===

==== Single Node ====

The job script below requests 1 node with 20 cores (one whole node). Assuming each task is multi-threaded, '''OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK''' sets the number of threads per task, and '''-j $SLURM_NTASKS''' sets the number of concurrently running tasks.
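With this allocation, Slurm exports the variables the script relies on, and the task count times the thread count fills the whole node. A minimal sketch with hypothetical values matching the request below (5 tasks per node, 4 CPUs per task):

```shell
# Hypothetical values Slurm would set for --ntasks-per-node=5 --cpus-per-task=4 on 1 node.
SLURM_NTASKS=5
SLURM_CPUS_PER_TASK=4

# Each task uses this many OpenMP threads, as in the job script.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Total cores busy when all task slots are running at once:
echo "cores in use: $(( SLURM_NTASKS * OMP_NUM_THREADS ))"   # prints "cores in use: 20"
```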
{{{
#!/bin/bash
#SBATCH --partition=defq            # Partition
#SBATCH --qos=normal                # Quality of Service
#SBATCH --job-name=GNU_Parallel     # Job Name
#SBATCH --time=00:10:00             # WallTime
#SBATCH --nodes=1                   # Number of Nodes
#SBATCH --ntasks-per-node=5         # Number of tasks
#SBATCH --cpus-per-task=4           # Number of processors per task (OpenMP threads)
#SBATCH --gres=mic:0                # Number of Co-Processors

module load parallel

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

parallel --record-env
parallel --joblog log \
         -j $SLURM_NTASKS \
         --workdir $SLURM_SUBMIT_DIR \
         --env OMP_NUM_THREADS \
         sh ./run_hostname.sh {} ::: `seq 1 100`
}}}

The example task script '''run_hostname.sh''' is:
{{{
#!/bin/bash
hostname
echo $1
sleep 1
}}}

==== Multiple Nodes ====

The job script below requests 4 nodes with 20 cores each. Assuming each task is multi-threaded, '''OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK''' sets the number of threads per task. '''-j $TASKS_PER_NODE''' sets the number of concurrently running tasks per node, where '''TASKS_PER_NODE=`echo $SLURM_NTASKS / $SLURM_NNODES | bc`'''. '''scontrol show hostname $SLURM_NODELIST > $MACHINEFILE''' writes the list of allocated nodes to a file, which is passed to GNU parallel with '''--slf $MACHINEFILE'''.
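The per-node task count is just integer division of the total task count by the node count. A minimal sketch, with hypothetical values matching this job (4 nodes, 5 tasks each); POSIX arithmetic expansion gives the same result as the `bc` pipeline without the external dependency:

```shell
# Hypothetical values mirroring the job script: Slurm would set these.
SLURM_NTASKS=20
SLURM_NNODES=4

# Same computation as TASKS_PER_NODE=`echo $SLURM_NTASKS / $SLURM_NNODES | bc`,
# done with shell arithmetic (integer division).
TASKS_PER_NODE=$(( SLURM_NTASKS / SLURM_NNODES ))
echo "TASKS_PER_NODE=$TASKS_PER_NODE"   # prints "TASKS_PER_NODE=5"
```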
{{{
#!/bin/bash
#SBATCH --partition=defq            # Partition
#SBATCH --qos=normal                # Quality of Service
#SBATCH --job-name=GNU_Parallel     # Job Name
#SBATCH --time=00:10:00             # WallTime
#SBATCH --nodes=4                   # Number of Nodes
#SBATCH --ntasks-per-node=5         # Number of tasks
#SBATCH --cpus-per-task=4           # Number of processors per task (OpenMP threads)
#SBATCH --gres=mic:0                # Number of Co-Processors

module load parallel

MACHINEFILE="machinefile"
scontrol show hostname $SLURM_NODELIST > $MACHINEFILE
cat $MACHINEFILE

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

TASKS_PER_NODE=`echo $SLURM_NTASKS / $SLURM_NNODES | bc`
echo "TASKS_PER_NODE=" $TASKS_PER_NODE

parallel --record-env
parallel --joblog log \
         -j $TASKS_PER_NODE \
         --slf $MACHINEFILE \
         --workdir $SLURM_SUBMIT_DIR \
         --sshdelay 0.1 \
         --env OMP_NUM_THREADS \
         sh ./run_hostname.sh {} ::: `seq 1 100`

echo "took $SECONDS sec"
}}}
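After the job finishes, the '''--joblog''' file (named `log` in the scripts above) is a tab-separated table with one row per task, which makes it easy to check for failures. A sketch, assuming the joblog column layout of recent GNU parallel releases (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command); the sample log lines here are fabricated for illustration, since a real run writes the file itself:

```shell
# Fabricated sample joblog in the format GNU parallel writes with --joblog.
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'  > log
printf '1\t:\t1.0\t0.5\t0\t10\t0\t0\techo 1\n'                                       >> log
printf '2\t:\t1.0\t0.5\t0\t10\t1\t0\tfalse\n'                                        >> log

# Column 7 (Exitval) holds each task's exit status; NR > 1 skips the header row.
awk -F'\t' 'NR > 1 && $7 != 0 { failed++ }
            END { printf "failed tasks: %d\n", failed + 0 }' log
```

Tasks with a non-zero Exitval can also be re-run in place by repeating the original `parallel` command with `--resume-failed` added.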