[[PageOutline]]
= HPC Workshop Spring 2026 =
= Module 7 of 8 - Job Parallelism (Under construction) =
(Content subject to change prior to the workshop)
== What is Job Parallelism? ==
Job parallelism refers to jobs in which two or more processing units (processors or cores) execute the job's instructions simultaneously rather than sequentially.

Your job may need more than one processor to finish in a timely manner - or possibly to finish at all.
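As a rough illustration of the difference between sequential and simultaneous execution, the following Python sketch runs the same CPU-bound function first one input at a time and then across two worker processes using the standard-library `multiprocessing` module. The function `work`, the input sizes, and the worker count are hypothetical stand-ins chosen for this example, not part of any Cypress workflow.

```python
import time
from multiprocessing import Pool


def work(n):
    """CPU-bound stand-in for a real computation: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_sequential(inputs):
    """Process the inputs one after another on a single core."""
    return [work(n) for n in inputs]


def run_parallel(inputs, processes=2):
    """Process the inputs simultaneously in separate worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(work, inputs)


if __name__ == "__main__":
    inputs = [200_000] * 4  # hypothetical workload sizes

    start = time.perf_counter()
    sequential = run_sequential(inputs)
    middle = time.perf_counter()
    parallel = run_parallel(inputs, processes=2)
    end = time.perf_counter()

    # Both strategies must produce the same answers; only the timing differs.
    print("results match:", sequential == parallel)
    print(f"sequential: {middle - start:.3f} s, parallel: {end - middle:.3f} s")
```

For very small workloads the cost of starting worker processes can outweigh the gain, so the parallel timing is not guaranteed to be lower.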

== Why use an HPC Cluster? ==
* '''tasks take too long'''
 * When a task becomes computationally heavy, the work is typically moved from a local laptop or desktop to a more powerful system.
 * Your computation may execute more efficiently if the code supports multithreading or multiprocessing. 
 
* '''one server is not enough'''
 * When a single computer can’t handle the required computation or analysis, the work is carried out on larger groups of servers.
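Code that supports multithreading or multiprocessing usually needs to know how many cores it may use. The sketch below assumes the job runs under the Slurm scheduler (this section does not name the scheduler, so treat that as an assumption) and reads the `SLURM_CPUS_PER_TASK` and `SLURM_JOB_ID` environment variables that Slurm sets. The helper name `allocated_cpus` is hypothetical.

```python
import os


def allocated_cpus(env=None):
    """Return the number of CPU cores this process should use.

    Assumes a Slurm-managed job; outside a job it falls back to the
    number of cores the operating system reports.
    """
    env = os.environ if env is None else env

    # Set by Slurm when the job requests --cpus-per-task.
    value = env.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)

    # Inside a job with no explicit request, Slurm's default is one CPU per task.
    if "SLURM_JOB_ID" in env:
        return 1

    # Not running under the scheduler, for example on a laptop.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Using {allocated_cpus()} worker(s)")
```

Sizing a worker pool from the allocation rather than from the node's total core count helps avoid starting more workers than the job was granted.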

=== Tools for implementing various levels of job parallelism on Cypress ===

See [wiki:IntroToMulti-Processing2025August Module 2 of 8 - Introduction to Multi-processing] for more information on tools available on Cypress for creating and preparing your jobs at the various levels of parallelism - programming, single job, and multi-job.

=== Before running your job ===

Before you run your job, consider the following so that it runs as efficiently as possible on Cypress.

==== Review your software provider's information ====

See [wiki:Workshops/IntroToMulti-Processing2025August#CodesforMulti-CoresMulti-Nodes.Offloading Codes for Multi-Cores, Multi-Nodes. Offloading].

==== Choices for application programming tools ====

 Refer to the following table to determine which programming model to use based on the type of algorithm your job requires.

 ||= Algorithm Type =||= Programming Model =||= Hardware Used =||= Examples =||
 ||Single Instruction Multiple Data (SIMD) ||Compiler vectorization       ||Intel Advanced Vector Extensions (AVX), 256-bit vector processor ||See [https://wiki.hpc.tulane.edu/trac/wiki/cypress#MathLibraries Math Libraries] ||
 ||Multithreaded (shared memory) ||OpenMP       ||1 Node, >=2 cores ||See [wiki:cypress/Programming/OpenMp OpenMP] ||
 ||Problem domain decomposition  ||MPI          ||>=2 Nodes ||See [wiki:cypress/Programming/Mpi MPI]||
 ||Massively Parallel, Single Instruction Multiple Threads (SIMT) ||#pragma offload (GPU kernels not available on Cypress)  ||Coprocessors - !XeonPhi (GPUs not available on Cypress)     ||See [wiki:cypress/XeonPhi XeonPhi], [wiki:Workshops/cypress/OffloadingWithOpenMP Offloading to Accelerator] ||
 ||Hybrid Parallel ||MPI + OpenMP ||>=2 Nodes     ||See [wiki:cypress/using#HybridJobs Hybrid Jobs] job script||
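To make the "problem domain decomposition" row more concrete, here is a minimal pure-Python sketch of a one-dimensional block decomposition: each rank is assigned one contiguous slice of the problem. The function name `block_range` is hypothetical. In a real MPI program the rank and the number of ranks would come from the MPI library (for example `MPI.COMM_WORLD.Get_rank()` and `Get_size()` in `mpi4py`) instead of a loop.

```python
def block_range(n_items, n_ranks, rank):
    """Return (start, stop) of the contiguous block owned by `rank`.

    Items are spread as evenly as possible; the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Hypothetical problem: 10 grid cells shared among 4 ranks.
    for rank in range(4):
        start, stop = block_range(10, 4, rank)
        print(f"rank {rank} owns cells {start}..{stop - 1}")
```

The blocks cover every item exactly once, so each rank can work on its own slice and exchange only boundary data with its neighbors.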

==== Choices for Job scripting ====
 * Many independent tasks

 If your computational workload can be split easily, or perhaps with some minimal or one-time effort, into many independent tasks requiring minimal communication, see [wiki:IntroToMulti-Processing2025August#RunningManySerialParallelJobs Running Many Serial/Parallel Jobs] for more information.

 * Many dependent tasks

 Otherwise, see [wiki:cypress/Programming/Mpi MPI] if your computational workload includes too many tasks to run on a single node '''and''' the tasks require a significant level of communication with one another '''during''' the computation.
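For the many-independent-tasks case, one common approach is a scheduler job array in which every array task runs the same script on a different share of the inputs. The sketch below assumes Slurm job arrays with indices starting at 0 (for example `--array=0-3`) and reads the `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` environment variables. The linked page may describe a different mechanism, and the helper name and input file names are hypothetical.

```python
import os


def inputs_for_task(inputs, task_id, task_count):
    """Return the round-robin share of `inputs` handled by one array task."""
    if task_count <= 0 or not 0 <= task_id < task_count:
        raise ValueError("task_id must satisfy 0 <= task_id < task_count")
    return inputs[task_id::task_count]


if __name__ == "__main__":
    # Slurm sets these for each task of a job array; the defaults let the
    # script also run as a single task outside the scheduler.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    # Hypothetical input files, one independent unit of work each.
    inputs = [f"sample_{i:03d}.dat" for i in range(10)]

    for name in inputs_for_task(inputs, task_id, task_count):
        print(f"task {task_id} would process {name}")
```

Because the tasks never communicate, each one can start whenever cores become available, which is what makes this kind of workload easy to spread across a cluster.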