Version 3 (modified by fuji, 2 years ago)

Many Task Computing

This page presents example scripts for many-task computing.

Job Array + Many-Task Computing

The Cypress job scheduler allows a maximum of 18 concurrently running jobs for normal qos, and 8 jobs for long qos. The limit counts jobs, not cores: a job that requests a single core still counts as one job.

Assume that we have 100 single-core tasks and that each task will run for more than 24 hours. You might consider using a Job Array with long qos to submit 100 jobs. However, the Cypress job scheduler runs at most 8 long-qos jobs at a time, so the other 92 jobs would wait in the queue. Packing several tasks into each job avoids this.
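
The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the original page: every task takes about the same time, and a queued job starts as soon as a slot is free. The function name rounds_needed and the idea of a scheduling "round" are hypothetical helpers for this sketch.

```python
import math

def rounds_needed(num_tasks, tasks_per_job, max_running_jobs):
    """Return (number of jobs, number of scheduling rounds) for a packing choice.

    A "round" is one batch of jobs that can run at the same time under the
    concurrent-job limit (a simplification: all tasks are assumed equally long).
    """
    num_jobs = math.ceil(num_tasks / tasks_per_job)
    rounds = math.ceil(num_jobs / max_running_jobs)
    return num_jobs, rounds

# One task per job under the long-qos limit of 8 running jobs
print(rounds_needed(100, 1, 8))    # (100, 13)

# 20 tasks packed into each job (one task per core on a 20-core node)
print(rounds_needed(100, 20, 8))   # (5, 1)
```

With one task per job, the 100 jobs need about 13 rounds of more than 24 hours each. With 20 tasks per job, the 5 jobs fit under the limit and can all run at once.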

The example script below submits a Job Array of 5 array-tasks, and each array-task runs 20 sub-tasks, one per core.

#!/bin/bash
#SBATCH --qos=long		# Quality of Service
#SBATCH --job-name=ManyTaskJob  # Job Name
#SBATCH --time=30:00:00		# WallTime
#SBATCH --nodes=1 		# Number of Nodes
#SBATCH --ntasks-per-node=20 	# Number of tasks
#SBATCH --cpus-per-task=1 	# Number of processors per task
#SBATCH --gres=mic:0  		# Number of Co-Processors
#SBATCH --array=0-80:20         # Array of IDs=0,20,40,60,80

# our custom function
cust_func(){
  echo "Do something $1 task"
  sleep 10
}
# For loop $SLURM_NTASKS_PER_NODE times
date
hostname
# $i = 1,2,3,..$SLURM_NTASKS_PER_NODE
for i in $(seq $SLURM_NTASKS_PER_NODE)
do
        # $TASK_ID=1,2,...100
	TASK_ID=$((SLURM_ARRAY_TASK_ID + i))
	cust_func $TASK_ID > log${TASK_ID}.out & # Put a function in the background
done

## Put all cust_func in the background and bash
## would wait until those are completed
## before displaying 'done' message
wait
echo "done"
date
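
To check the task numbering used in the script, the mapping from array IDs to task IDs can be reproduced outside of SLURM. The sketch below mirrors TASK_ID=$((SLURM_ARRAY_TASK_ID + i)) in Python; the function name task_ids is a hypothetical helper, and the values 0-80:20 and 20 are copied from the #SBATCH lines above.

```python
def task_ids(array_task_id, tasks_per_node):
    """Task IDs handled by one array task: array_task_id + i for i = 1..tasks_per_node."""
    return [array_task_id + i for i in range(1, tasks_per_node + 1)]

# --array=0-80:20 expands to the array IDs 0, 20, 40, 60, 80
array_ids = list(range(0, 81, 20))

# Collect the IDs produced by all array tasks
all_ids = [t for a in array_ids for t in task_ids(a, 20)]

for a in array_ids:
    ids = task_ids(a, 20)
    print("array task %2d -> tasks %3d..%3d" % (a, ids[0], ids[-1]))

# Every task from 1 to 100 is covered exactly once
print(all_ids == list(range(1, 101)))
```

The step of the array (20) has to match --ntasks-per-node. If the two differ, some task IDs are skipped or run twice.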

Many MPI jobs in a single job

If you have multiple MPI jobs that must run concurrently, Many-task computing may be the way to go.

The example below requests 6 nodes and runs 3 MPI sub-jobs inside that single job.

#!/bin/bash
#SBATCH --partition=defq	# Partition
#SBATCH --qos=normal		# Quality of Service
#SBATCH --job-name=PilotJob   # Job Name
#SBATCH --time=00:10:00		# WallTime
#SBATCH --nodes=6 		# Number of Nodes
#SBATCH --ntasks-per-node=20 	# Number of tasks (MPI processes) per node
#SBATCH --cpus-per-task=1 	# Number of processors per task (OpenMP threads)
#SBATCH --gres=mic:0  		# Number of Co-Processors

module load intel-psxe

# NUMBER OF SUB-JOBS
NUM_OF_SUBJOBS=3

# Make hostlist
HOSTLIST=${SLURM_JOB_ID}_HOSTS
mpirun hostname -s | sort > $HOSTLIST
#
python jobLauncher.py $NUM_OF_SUBJOBS $HOSTLIST

'jobLauncher.py' manages sub-jobs.

# -*- coding: utf-8 -*-
"""
Created on Fri Jan 22 16:33:57 2016

@author: fuji
"""

import sys
import os
import time
import subprocess

dirname = os.path.dirname(os.path.abspath(sys.argv[0]))
numOfSubJobs = int(sys.argv[1])
nodeFile = sys.argv[2]
#
# Get nodes
with open(nodeFile, 'r') as f:
    nodes = f.readlines()
#
numOfNodes = len(nodes)
#print "N", numOfNodes, nodes
if (numOfNodes < numOfSubJobs) or (numOfSubJobs < 1):
    os.abort()
#
# Divide the nodes among the sub-jobs
numOfNodesSubJob = [0] * numOfSubJobs
for id in range(numOfSubJobs):
    numOfNodesSubJob[id] = numOfNodes // numOfSubJobs  # integer division
    if (numOfNodes % numOfSubJobs != 0):
        if (id < numOfNodes % numOfSubJobs):
            numOfNodesSubJob[id] += 1
    #print "n", numOfNodesSubJob[id]
#
# Allocate Nodes
idx = 0
nodesSubJob = [None] * numOfSubJobs
for id in range(numOfSubJobs):
    nodesSubJob[id] = [None] * numOfNodesSubJob[id]
    for n in range(numOfNodesSubJob[id]):
        nodesSubJob[id][n] = nodes[idx]
        idx += 1
    #print nodesSubJob[id]
#
# Create Nodes Files
nodeFileName = []
for id in range(numOfSubJobs):
    nodeFileName.append("%s_%04d.nod" % (nodeFile,id))
    with open(nodeFileName[id],'wt') as outp:
        for node in nodesSubJob[id]:
            outp.write(node)
#
# Launch SubJobs
proc = []
for id in range(numOfSubJobs):
    commandToLaunchSubJobs = []
    commandToLaunchSubJobs.append(dirname + "/launch_subjob.sh")
    commandToLaunchSubJobs.append(nodeFileName[id])
    #
    #print commandToLaunchSubJobs
    p = subprocess.Popen(commandToLaunchSubJobs,
                         shell=False,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    proc.append(p)

# Wait Until All subjobs done
while(True):
    runningtasks = 0
    for id in range(numOfSubJobs):
        if proc[id].poll() is None:  # None means the sub-job is still running
            runningtasks += 1
    if runningtasks == 0:
        break
    time.sleep(5) # Checks every 5 seconds
#
# Show outputs
for id in range(numOfSubJobs):
    comm = proc[id].communicate()
    sout = comm[0]
    serr = comm[1]
    #
    # Single-argument print calls run under both Python 2 and Python 3
    print("SubJob %d" % id)
    print(sout.decode())
    print(serr.decode())
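
The node-allocation step of the launcher can be tried on its own, without a job allocation. The sketch below is a compact restatement of that step, not the script itself: the function name split_nodes and the host names are hypothetical. It gives each sub-job the same number of entries and hands any remainder to the first sub-jobs, one each.

```python
def split_nodes(nodes, num_subjobs):
    """Split a list of host entries into num_subjobs contiguous groups.

    Group sizes differ by at most one; the first (len(nodes) % num_subjobs)
    groups receive the extra entry.
    """
    if num_subjobs < 1 or len(nodes) < num_subjobs:
        raise ValueError("need at least one host entry per sub-job")
    base, extra = divmod(len(nodes), num_subjobs)
    groups = []
    start = 0
    for j in range(num_subjobs):
        size = base + (1 if j < extra else 0)
        groups.append(nodes[start:start + size])
        start += size
    return groups

# Hypothetical host names, 7 entries split among 3 sub-jobs
hosts = ["node%03d" % n for n in range(1, 8)]
for j, group in enumerate(split_nodes(hosts, 3)):
    print(j, group)
```

If the host list holds one line per MPI process rather than one line per node, the same split applies to process slots. Because the list is sorted, each group still covers whole nodes whenever the number of nodes is a multiple of the number of sub-jobs.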

The 'launch_subjob.sh' script launches a sub-job.

#!/bin/bash
export HOST_LIST=$1

# Run Sub-job
mpirun -hostfile $HOST_LIST hostname -s
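
The launch-and-collect pattern can be tested on any machine before running it with MPI. The sketch below is an assumption-laden stand-in, not the launcher above: run_concurrently is a hypothetical helper, and the sub-jobs are replaced by trivial Python one-liners instead of launch_subjob.sh. It starts all commands first and then calls communicate() on each, which waits for the process and reads its output at the same time.

```python
import subprocess
import sys

def run_concurrently(commands):
    """Start every command at once, then wait for each one and collect its output.

    Returns a list of (exit code, stdout text, stderr text), in launch order.
    """
    procs = [subprocess.Popen(cmd,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
             for cmd in commands]
    results = []
    for p in procs:
        out, err = p.communicate()  # waits for p and drains both pipes
        results.append((p.returncode, out.decode(), err.decode()))
    return results

# Stand-in commands; in a real job each entry would be
# [path to launch_subjob.sh, node file for that sub-job]
commands = [[sys.executable, "-c", "print('sub-job %d')" % i] for i in range(3)]

for i, (code, out, err) in enumerate(run_concurrently(commands)):
    print("SubJob", i, "exit code", code, out.strip())
```

Reading the pipes while waiting may matter for sub-jobs that print a lot: a process whose output pipe fills up blocks until something reads from it.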